Closed nbxszby416 closed 3 years ago
Sorry, I have solved the first problem, but the second problem still exists.
I'd really appreciate it if you could help.
Hi, can you be more specific about which pbmc dataset you used that gave different results?
I used the v1 PBMC dataset (Cell Ranger 1.1.0, from 10x Genomics), which is the combination of these 10 pure types (CD14+ Monocytes, CD19+ B cells, CD34+ cells, CD4+ Helper T cells, CD4+/CD25+ Regulatory T cells, CD4+/CD45RA+/CD25− Naive T cells, CD4+/CD45RO+ Memory T cells, CD56+ Natural killer cells, CD8+ Cytotoxic T cells and CD8+/CD45RA+ Naive cytotoxic T cells).
Sorry for the interruption, but I found that I have not solved the first problem after all. I wanted to change a small part of the tfidf function and test the result, but I get: Error: is(object = cds, class2 = "CellDataSet") is not TRUE. It seems that it doesn't work even when I source the original "train_cell_classifier.R" file. What should I do?
Thanks for your work!!
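One possibility, not confirmed anywhere in this thread, is that `source()` left a copy of the function in the global environment, where it shadows the installed package version. The sketch below uses base R only and a stand-in function (`toupper` plays the role of the edited package function; it is purely illustrative) to show how such a shadowing copy can be detected and removed.

```r
# Stand-in for an edited function loaded with source(): source() defines
# objects in the global environment by default, which assign() mimics here.
assign("toupper", function(x) "patched copy", envir = globalenv())

# TRUE means a global copy exists and will be found before the packaged one.
shadowed_before <- exists("toupper", envir = globalenv(), inherits = FALSE)

# Removing the global copy makes R fall back to the packaged function.
rm("toupper", envir = globalenv())
shadowed_after <- exists("toupper", envir = globalenv(), inherits = FALSE)

# The packaged function is back in effect.
result <- toupper("ok")
```

For the real case, the same check would be `exists("train_cell_classifier", envir = globalenv(), inherits = FALSE)`. Calling `garnett::train_cell_classifier(...)` with the namespace prefix, or restarting the R session, also bypasses any global copy.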
Hi, it seems the first problem arises because the function only accepts a "CellDataSet", which my Monocle3 object is not. I switched to Monocle and ran my test without problems!
I believe this is resolved. If not, please reopen.
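The failing assertion is an `is()` check against a class name. A minimal sketch of why an object of class `cell_data_set` does not pass a check for `CellDataSet` follows. The two classes defined here are stand-ins that only borrow the class names from the thread; they are not the real Monocle or Monocle3 classes, and the slot `n_cells` is hypothetical.

```r
library(methods)

# Stand-in S4 classes that only mimic the two class names in the thread.
setClass("CellDataSet", representation(n_cells = "numeric"))
setClass("cell_data_set", representation(n_cells = "numeric"))

# An object of the newer-style class, as reported by printing the cds.
cds <- new("cell_data_set", n_cells = 94655)

# The same kind of test as in the error message: the names are unrelated
# classes, so the first check fails and the second passes.
is_old_class <- is(cds, "CellDataSet")
is_new_class <- is(cds, "cell_data_set")
```

On a real object, `class(cds)` shows which of the two containers it is, which tells you whether the installed version of the classifier code can be expected to accept it.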
Hi, I have 2 questions here:
1> I tried to use the function “train_cell_classifier” to train my classifier, but I got this error: Error: is(object = cds, class2 = "CellDataSet") is not TRUE. It really confuses me, since “check_markers”, “classify_cells”, etc. all work, and “train_cell_classifier” actually DID work before I changed parts of “train_cell_classifier.R” and sourced it. I have since sourced the original file again, but it still doesn’t work…
My sessionInfo() is:
"R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] org.Hs.eg.db_3.12.0 AnnotationDbi_1.52.0
[3] garnett_0.2.17 monocle3_0.2.3.0
[5] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
[7] GenomicRanges_1.42.0 GenomeInfoDb_1.26.2
[9] IRanges_2.24.1 S4Vectors_0.28.1
[11] MatrixGenerics_1.2.0 matrixStats_0.57.0
[13] Biobase_2.50.0 BiocGenerics_0.36.0

loaded via a namespace (and not attached):
[1] fs_1.5.0 bitops_1.0-6 usethis_2.0.0
[4] devtools_2.3.2 bit64_4.0.5 doParallel_1.0.16
[7] rprojroot_2.0.2 tools_4.0.3 R6_2.5.0
[10] DBI_1.1.0 colorspace_2.0-0 withr_2.3.0
[13] prettyunits_1.1.1 processx_3.4.5 tidyselect_1.1.0
[16] gridExtra_2.3 curl_4.3 bit_4.0.4
[19] compiler_4.0.3 glmnet_4.0-2 cli_2.2.0
[22] formatR_1.7 desc_1.2.0 DelayedArray_0.16.0
[25] labeling_0.4.2 scales_1.1.1 callr_3.5.1
[28] stringr_1.4.0 digest_0.6.27 rmarkdown_2.6
[31] XVector_0.30.0 pkgconfig_2.0.3 htmltools_0.5.0
[34] sessioninfo_1.1.1 rlang_0.4.10 rstudioapi_0.13
[37] RSQLite_2.2.1 shape_1.4.5 generics_0.1.0
[40] farver_2.0.3 dplyr_1.0.2 RCurl_1.98-1.2
[43] magrittr_2.0.1 GenomeInfoDbData_1.2.4 futile.logger_1.4.3
[46] Matrix_1.3-0 Rcpp_1.0.5 munsell_0.5.0
[49] fansi_0.4.1 viridis_0.5.1 lifecycle_0.2.0
[52] stringi_1.5.3 yaml_2.2.1 zlibbioc_1.36.0
[55] pkgbuild_1.2.0 plyr_1.8.6 grid_4.0.3
[58] blob_1.2.1 ggrepel_0.9.0 forcats_0.5.0
[61] crayon_1.3.4 lattice_0.20-41 splines_4.0.3
[64] ps_1.5.0 knitr_1.30 pillar_1.4.7
[67] igraph_1.2.6 pkgload_1.1.0 reshape2_1.4.4
[70] codetools_0.2-18 futile.options_1.0.1 glue_1.4.2
[73] evaluate_0.14 remotes_2.2.0 lambda.r_1.2.4
[76] BiocManager_1.30.10 vctrs_0.3.6 foreach_1.5.1
[79] testthat_3.0.1 gtable_0.3.0 purrr_0.3.4
[82] assertthat_0.2.1 ggplot2_3.3.3 xfun_0.19
[85] survival_3.2-7 viridisLite_0.3.0 tibble_3.0.4
[88] rly_1.6.2 iterators_1.0.13 tinytex_0.28
[91] memoise_1.1.0 ellipsis_0.3.1
"
And my cds is:
"class: cell_data_set
dim: 6 94655
metadata(1): cds_version
assays(1): counts
rownames(6): ENSG00000243485 ENSG00000237613 ... ENSG00000239945 ENSG00000237683
rowData names(2): name0 gene_short_name
colnames(94655): AAACATACAAAACG-1_1 AAACATACACGACT-1_1 ... TTTGCATGCGTTGA-1_10 TTTGCATGTGTCCC-1_10
colData names(5): TSNE.1 TSNE.2 FACS_type Size_Factor sample
reducedDimNames(0):
altExpNames(0):
"
2> I ran the whole workflow on the 10X PBMC data, combining the 10 datasets (as described in the paper), but the cell_type (prediction) column shows >50% “Unknown” (most of which are CD4 T cells and CD8 T cells). I used the marker file as provided, and I don’t know why, since only 26% of cells are unclassified in the paper. Is there anything wrong with my code? I deleted “Dendritic cells” from my marker file since there are no dendritic cells in this dataset. (I also tried the version with “Dendritic cells”, but it didn’t help.) No markers have ambiguity >0.5. The cluster-extended type prediction is OK.
"
fold_name <- c("regulatory_t", "naive_cytotoxic", "memory_t", "cd14_monocytes", "cytotoxic_t", "b_cells", "cd4_t_helper", "cd34", "cd56_nk", "naive_t")
cell_name <- c("CD4 T cells", "CD8 T cells", "CD4 T cells", "Monocytes", "CD8 T cells", "B cells", "CD4 T cells", "CD34+", "NK cells", "CD4 T cells")
matrix1 <- Matrix::readMM(paste0("/xx/", fold_name[1], "/matrix.mtx"))
# pdata
pdata1 <- read.csv(paste0("/xx/", fold_name[1], "/projection.csv"), header=TRUE, sep=",")
rownames(pdata1) <- pdata1[,1]
pdata1 <- pdata1[,-1]
pdata1 <- data.frame(pdata1, FACS_type=c(cell_name[1]))
# fdata
fdata1 <- read.table(paste0("/xx/", fold_name[1], "/genes.tsv"), header=FALSE, sep="\t")
rownames(fdata1) <- fdata1[,1]
# rename
row.names(matrix1) <- row.names(fdata1)
colnames(matrix1) <- row.names(pdata1)
names(fdata1) <- c("name0", "gene_short_name")
# cds
cds1 <- new_cell_data_set(as(matrix1, "dgCMatrix"), cell_metadata = pdata1, gene_metadata = fdata1)
x <- list(cds1)
for (i in 2:10){
  matrix <- Matrix::readMM(paste0("/xx/", fold_name[i], "/matrix.mtx"))
  # pdata
  pdata <- read.csv(paste0("/xx/", fold_name[i], "/projection.csv"), header=TRUE, sep=",")
  rownames(pdata) <- pdata[,1]
  pdata <- pdata[,-1]
  pdata <- data.frame(pdata, FACS_type=c(cell_name[i]))
  # fdata
  fdata <- read.table(paste0("/xx/", fold_name[i], "/genes.tsv"), header=FALSE, sep="\t")
  rownames(fdata) <- fdata[,1]
  # rename
  row.names(matrix) <- row.names(fdata)
  colnames(matrix) <- row.names(pdata)
  names(fdata) <- c("name0", "gene_short_name")
  # cds
  cds0 <- new_cell_data_set(as(matrix, "dgCMatrix"), cell_metadata = pdata, gene_metadata = fdata)
  x <- c(x, cds0)
}
cds <- combine_cds(x)
library(org.Hs.eg.db)
marker_file <- "/xx/pbmc_markers.txt"
pbmc_classifier <- train_cell_classifier(cds = cds,
                                         marker_file = marker_file,
                                         db = org.Hs.eg.db,
                                         cds_gene_id_type = "ENSEMBL",
                                         num_unknown = 500,
                                         marker_file_gene_id_type = "SYMBOL")
cds <- classify_cells(cds, pbmc_classifier,
                      db = org.Hs.eg.db,
                      cluster_extend = TRUE,
                      cds_gene_id_type = "ENSEMBL")
table(pData(cds)$cell_type)

        B cells           CD34+     CD4 T cells     CD8 T cells Dendritic cells
           9391            3646            8171            7690             653
      Monocytes        NK cells         T cells         Unknown
           2560            5370           12873           44301
"
Thank you for your hard work!
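To compare against the figure quoted from the paper, it can help to turn the posted counts into fractions. A small base-R sketch using only the numbers from the table above (the variable names are illustrative):

```r
# Counts copied from the table(pData(cds)$cell_type) output in the post.
counts <- c("B cells" = 9391, "CD34+" = 3646, "CD4 T cells" = 8171,
            "CD8 T cells" = 7690, "Dendritic cells" = 653,
            "Monocytes" = 2560, "NK cells" = 5370,
            "T cells" = 12873, "Unknown" = 44301)

# The total should equal the number of cells in the cds (94655 columns).
total <- sum(counts)

# Fraction of cells per predicted label, largest first.
fractions <- sort(counts / total, decreasing = TRUE)
unknown_fraction <- fractions[["Unknown"]]

print(round(fractions, 3))
```

With these counts the "Unknown" share comes out at about 47% of the 94655 cells. To see which sorted populations the unknowns come from, a cross-tabulation such as `table(pData(cds)$FACS_type, pData(cds)$cell_type)` on the real object should work, assuming both columns are present as in the post.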