GIS-SP-Group / RCA

R package for robust clustering of single cell RNA sequencing data
MIT License

Trouble reproducing the RCA results from the Nature Genetics publication #8

Open Tongtong-Wang opened 5 years ago

Tongtong-Wang commented 5 years ago

Hi Li,

Thanks for such a great paper and package. I'm having trouble reproducing the CAF PCA plot presented in the paper (Fig. 5e). Following the description “We therefore used RCA in self-projection mode to cluster CAFs and normal mucosa fibroblasts and identified three clusters of fibroblast cells”, I have only been able to generate the plot shown in the attachment. I subset the FPKM values to the cells labelled as fibroblasts, then ran dataConstruct, geneFilt, cellNormalize, dataTransform, and featureConstruct with method = "SelfProjection".

Would you be able to help with this?

What I've done so far:

library(WGCNA)
library(flashClust)
library(gplots)
library(preprocessCore)
library(RCA)
library(dplyr)
library(ggplot2)
options(stringsAsFactors = FALSE)

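# Read the normal-mucosa and tumor FPKM matrices; tag each cell column with its tissue of origin.
dat = data.table::fread("GSE81861_CRC_NM_all_cells_FPKM.csv")
colnames(dat)[2:ncol(dat)] <- paste0(colnames(dat)[2:ncol(dat)],"__N")
dat_T = data.table::fread("GSE81861_CRC_tumor_all_cells_FPKM.csv")
colnames(dat_T)[2:ncol(dat_T)] <- paste0(colnames(dat_T)[2:ncol(dat_T)],"__T")

# Join the two matrices on the shared first column (V1, the row identifiers).
dat <- left_join(dat, dat_T, by = "V1")
# Keep V1 plus the columns whose names carry the fibroblast label.
dat_caf <- dat[,c(1,grep("_Fibroblast_",colnames(dat)))]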

rn <- dat_caf$V1

fpkm_data = dat_caf %>%
  as.data.frame() %>%
  magrittr::set_rownames(rn) %>%
  dplyr::select(-V1)

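# Split each column name on "__"; field 3 is used as the plotting color,
# field 4 is the N/T suffix added above, and field 1 is used as the patient label.
color_to_use0 = colnames(fpkm_data)
color_to_use0 <- strsplit(color_to_use0,"__")
color_to_use <- paste("",lapply(color_to_use0,"[",3),sep="")

tissue <- paste("",lapply(color_to_use0,"[",4),sep="")
patient <- paste("",lapply(color_to_use0,"[",1),sep="")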

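# Clear any object left over from a previous run (only if it exists, to avoid a warning).
if (exists("data_obj")) rm(data_obj)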

data_obj = dataConstruct(fpkm_data)
data_obj = geneFilt(obj_in = data_obj)
data_obj = cellNormalize(data_obj)
data_obj = dataTransform(data_obj,"log10")
data_obj = featureConstruct(data_obj,method = "SelfProjection")

data_obj = cellClust(data_obj, deepSplit_wgcna = 4)

RCAPlot(data_obj,cluster_color_labels = color_to_use)
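For reference, here is a minimal sketch of how the labels above are parsed from the column names. It assumes each name has three "__"-separated fields followed by the `__N`/`__T` suffix added earlier. The column names and the `field` helper below are made up for illustration and are not taken from the real files.

```r
# Hypothetical column names, for illustration only (not from the GEO files).
example_cols <- c("cellA__Fibroblast__#AAAAAA__N",
                  "cellB__Fibroblast__#BBBBBB__T")

# Split each name on the literal double-underscore separator.
parts <- strsplit(example_cols, "__", fixed = TRUE)

# Helper (hypothetical): extract the i-th field of every column name.
field <- function(i) vapply(parts, function(p) p[i], character(1))

labels <- data.frame(
  first  = field(1),  # used as `patient` in the code above
  type   = field(2),
  color  = field(3),  # used as `color_to_use` in the code above
  tissue = field(4),  # the N/T suffix
  stringsAsFactors = FALSE
)

# Cross-tabulate to confirm that both normal (N) and tumor (T) cells are present.
print(table(labels$tissue))
```

Running the same tabulation on `tissue` from the real data would confirm that both normal and tumor fibroblasts made it into `fpkm_data` before clustering.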

`R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C LC_TIME=English_Australia.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.1.0 dplyr_0.8.0.1 RCA_1.0 preprocessCore_1.44.0 gplots_3.0.1.1
[6] flashClust_1.01-2 WGCNA_1.66 fastcluster_1.1.25 dynamicTreeCut_1.63-1

loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.54.0 robust_0.4-18
[4] fit.models_0.5-14 bit64_0.9-7 doParallel_1.0.14
[7] RColorBrewer_1.1-2 GenomeInfoDb_1.18.2 tools_3.5.2
[10] backports_1.1.3 R6_2.4.0 vipor_0.4.5
[13] HDF5Array_1.10.1 rpart_4.1-13 KernSmooth_2.23-15
[16] Hmisc_4.2-0 DBI_1.0.0 lazyeval_0.2.1
[19] BiocGenerics_0.28.0 colorspace_1.4-0 nnet_7.3-12
[22] withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3
[25] bit_1.1-14 compiler_3.5.2 Biobase_2.42.0
[28] BiocNeighbors_1.0.0 htmlTable_1.13.1 DelayedArray_0.8.0
[31] labeling_0.3 caTools_1.17.1.2 scales_1.0.0
[34] checkmate_1.9.1 DEoptimR_1.0-8 mvtnorm_1.0-10
[37] robustbase_0.93-3 stringr_1.4.0 digest_0.6.18
[40] foreign_0.8-71 XVector_0.22.0 scater_1.10.1
[43] rrcov_1.4-7 base64enc_0.1-3 pkgconfig_2.0.2
[46] htmltools_0.3.6 limma_3.38.3 readxl_1.3.0
[49] htmlwidgets_1.3 rlang_0.3.1 rstudioapi_0.9.0
[52] RSQLite_2.1.1 impute_1.56.0 DelayedMatrixStats_1.4.0
[55] BiocParallel_1.16.6 gtools_3.8.1 acepack_1.4.1
[58] RCurl_1.95-4.12 magrittr_1.5 GenomeInfoDbData_1.2.0
[61] GO.db_3.7.0 Formula_1.2-3 Matrix_1.2-16
[64] ggbeeswarm_0.6.0 Rhdf5lib_1.4.2 Rcpp_1.0.0
[67] munsell_0.5.0 S4Vectors_0.20.1 viridis_0.5.1
[70] edgeR_3.24.3 stringi_1.3.1 yaml_2.2.0
[73] zlibbioc_1.28.0 MASS_7.3-51.1 SummarizedExperiment_1.12.0
[76] rhdf5_2.26.2 plyr_1.8.4 grid_3.5.2
[79] blob_1.1.1 parallel_3.5.2 gdata_2.18.0
[82] crayon_1.3.4 lattice_0.20-38 splines_3.5.2
[85] locfit_1.5-9.1 knitr_1.22 pillar_1.3.1
[88] igraph_1.2.4 GenomicRanges_1.34.0 reshape2_1.4.3
[91] codetools_0.2-16 stats4_3.5.2 glue_1.3.0
[94] latticeExtra_0.6-28 scran_1.10.2 data.table_1.12.0
[97] foreach_1.4.4 cellranger_1.1.0 gtable_0.2.0
[100] purrr_0.3.1 tidyr_0.8.3 assertthat_0.2.0
[103] xfun_0.5 viridisLite_0.3.0 survival_2.43-3
[106] pcaPP_1.9-73 SingleCellExperiment_1.4.1 tibble_2.0.1
[109] iterators_1.0.10 beeswarm_0.2.3 AnnotationDbi_1.44.0
[112] memoise_1.1.0 IRanges_2.16.0 cluster_2.0.7-1
[115] statmod_1.4.30`