DavisLaboratory / standR

Spatial transcriptomics analyses and decoding in R
https://davislaboratory.github.io/standR/

standR Workflow for Cancer Transcriptome Atlas (CTA) #27

Open ErickMUO opened 9 months ago

ErickMUO commented 9 months ago

Hey standR team!

I am reaching out with a question regarding the application of the standR workflow to the Cancer Transcriptome Atlas (CTA). I am encountering challenges while attempting to utilize the readGeoMx import function, and I believe it may be related to the specific characteristics of the CTA dataset.

The issue revolves around the presence of several probes for the same gene within the CTA, leading to errors in the import process. I am seeking guidance on whether the standR workflow is compatible with the CTA dataset, especially considering the multiple probes for individual genes.

Is there any way I can circumvent this issue?

Thank you in advance for your time and assistance.

spe <- readGeoMx(countFile, sampleAnnoFile, featureAnnoFile, NegProbeName = "Negative Probe")

Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning messages:
1: Setting row names on a tibble is deprecated.
2: non-unique values when setting 'row.names': ‘A2M’, ‘ABCB1’, ‘ABCF1’, ‘ABL1’, ‘ACOT12’, ‘ACSF3’, ‘ACTA2’, ‘ACTB’, ‘ACTR3B’, ‘ACVR1B’, ‘ACVR1C’, ‘ACVR2A’, ‘ACY1’, ‘ADA’, ‘ADAM12’, ‘ADGRE1’, ‘ADGRE5’, ‘ADH1A/B/C’, ‘ADH4’, ‘ADH6’, ‘ADM’, ‘AFDN’, ‘AICDA’, ‘AIRE’, ‘AKAP1’, ‘AKR1C4’, ‘AKT1’, ‘AKT2’, ‘AKT3’, ‘ALCAM’, ‘ALDOA’, ‘ALDOC’, ‘ALK’, ‘ALKBH2’, ‘ALKBH3’, ‘AMBP’, ‘AMER1’, ‘AMH’, ‘ANGPT1’, ‘ANGPT2’, ‘ANGPTL4’, ‘ANKRD28’, ‘ANLN’, ‘ANP32B’, ‘ANXA1’, ‘APC’, ‘APH1B’, ‘API5’, ‘APLNR’, ‘APOA1’, ‘APOA2’, ‘APOA4’, ‘APOB’, ‘APOC2’, ‘APOC3’, ‘APOE’, ‘APOL6’, ‘APOM’, ‘APP’, ‘AQP9’, ‘AR’, ‘AREG’, ‘ARG1’, ‘ARG2’, ‘ARID1A’, ‘ARID1B’, ‘ARID2’, ‘ARMH3’, ‘ARNT’, ‘ARNT2’, ‘ASCL1’, ‘ASL’, ‘ASNS’, ‘ASPA’, ‘ASPG’, [... truncated]

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 14.2.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] readxl_1.4.3 lubridate_1.9.3 forcats_1.0.0
[4] stringr_1.5.1 purrr_1.0.2 readr_2.1.4
[7] tidyr_1.3.0 tidyverse_2.0.0 standR_1.5.4
[10] tibble_3.2.1 ggforce_0.4.1 dplyr_1.1.3
[13] GeoMxWorkflows_1.8.0 GeomxTools_3.5.0 NanoStringNCTools_1.10.0
[16] ggplot2_3.4.4 S4Vectors_0.40.1 Biobase_2.62.0
[19] BiocGenerics_0.48.1

loaded via a namespace (and not attached):
[1] IRanges_2.36.0 vroom_1.6.4 progress_1.2.2
[4] pacman_0.5.1 vsn_3.70.0 goftest_1.2-3
[7] Biostrings_2.70.1 vctrs_0.6.4 spatstat.random_3.2-1
[10] digest_0.6.33 png_0.1-8 ggrepel_0.9.4
[13] deldir_1.0-9 parallelly_1.36.0 MASS_7.3-60
[16] reshape_0.8.9 reshape2_1.4.4 httpuv_1.6.12
[19] foreach_1.5.2 qvalue_2.34.0 withr_2.5.2
[22] xfun_0.41 ggfun_0.1.3 ellipsis_0.3.2
[25] survival_3.5-7 memoise_2.0.1 ggbeeswarm_0.7.2
[28] clusterProfiler_4.10.0 gson_0.1.0 systemfonts_1.0.5
[31] tidytree_0.4.5 zoo_1.8-12 pbapply_1.7-2
[34] GGally_2.1.2 prettyunits_1.2.0 KEGGREST_1.42.0
[37] promises_1.2.1 httr_1.4.7 restfulr_0.0.15
[40] globals_0.16.2 fitdistrplus_1.1-11 rstudioapi_0.15.0
[43] miniUI_0.1.1.1 generics_0.1.3 DOSE_3.28.0
[46] ggalluvial_0.12.5 reactome.db_1.86.0 curl_5.1.0
[49] zlibbioc_1.48.0 ggraph_2.1.0 polyclip_1.10-6
[52] GenomeInfoDbData_1.2.11 ExperimentHub_2.10.0 SparseArray_1.2.2
[55] interactiveDisplayBase_1.40.0 xtable_1.8-4 evaluate_0.23
[58] S4Arrays_1.2.0 BiocFileCache_2.10.1 preprocessCore_1.64.0
[61] hms_1.1.3 GenomicRanges_1.54.1 irlba_2.3.5.1
[64] colorspace_2.1-0 filelock_1.0.2 ROCR_1.0-11
[67] reticulate_1.34.0 spatstat.data_3.0-3 magrittr_2.0.3
[70] lmtest_0.9-40 later_1.3.1 viridis_0.6.4
[73] ggtree_3.10.0 lattice_0.22-5 spatstat.geom_3.2-7
[76] future.apply_1.11.0 scattermore_1.2 XML_3.99-0.15
[79] shadowtext_0.1.2 cowplot_1.1.1 matrixStats_1.1.0
[82] RcppAnnoy_0.0.21 pillar_1.9.0 nlme_3.1-163
[85] iterators_1.0.14 compiler_4.3.0 RSpectra_0.16-1
[88] stringi_1.8.1 minqa_1.2.6 tensor_1.5
[91] SummarizedExperiment_1.32.0 GenomicAlignments_1.38.0 MPO.db_0.99.7
[94] plyr_1.8.9 crayon_1.5.2 abind_1.4-5
[97] BiocIO_1.12.0 gridGraphics_0.5-1 locfit_1.5-9.8
[100] sp_2.1-1 graphlayouts_1.0.2 bit_4.0.5
[103] fastmatch_1.1-4 codetools_0.2-19 openssl_2.1.1
[106] plotly_4.10.3 mime_0.12 ff_4.0.9
[109] splines_4.3.0 Rcpp_1.0.11 fastDummies_1.7.3
[112] dbplyr_2.4.0 sparseMatrixStats_1.14.0 HDO.db_0.99.1
[115] cellranger_1.1.0 knitr_1.45 blob_1.2.4
[118] utf8_1.2.4 BiocVersion_3.18.0 lme4_1.1-35.1
[121] fs_1.6.3 listenv_0.9.0 oligo_1.66.0
[124] DelayedMatrixStats_1.24.0 ggplotify_0.1.2 Matrix_1.6-3
[127] statmod_1.5.0 tzdb_0.4.0 tweenr_2.0.2
[130] pkgconfig_2.0.3 pheatmap_1.0.12 tools_4.3.0
[133] cachem_1.0.8 RSQLite_2.3.3 numDeriv_2016.8-1.1
[136] viridisLite_0.4.2 DBI_1.1.3 celldex_1.12.0
[139] graphite_1.48.0 rmarkdown_2.25 fastmap_1.1.1
[142] scales_1.2.1 grid_4.3.0 outliers_0.15
[145] ica_1.0-3 Seurat_5.0.0 Rsamtools_2.18.0
[148] AnnotationHub_3.10.0 patchwork_1.1.3 BiocManager_1.30.22
[151] dotCall64_1.1-0 graph_1.80.0 RANN_2.6.1
[154] farver_2.1.1 tidygraph_1.2.3 scatterpie_0.2.1
[157] yaml_2.3.7 MatrixGenerics_1.14.0 ggthemes_4.2.4
[160] rtracklayer_1.62.0 cli_3.6.1 leiden_0.4.3
[163] lifecycle_1.0.4 askpass_1.2.0 uwot_0.1.16
[166] BiocParallel_1.36.0 MeSHDbi_1.38.0 timechange_0.2.0
[169] gtable_0.3.4 rjson_0.2.21 umap_0.2.10.0
[172] ggridges_0.5.4 progressr_0.14.0 parallel_4.3.0
[175] ape_5.7-1 limma_3.58.1 jsonlite_1.8.7
[178] RcppHNSW_0.5.0 affxparser_1.74.0 bitops_1.0-7
[181] progeny_1.24.0 HPO.db_0.99.2 bit64_4.0.5
[184] depmap_1.16.0 Rtsne_0.16 yulab.utils_0.1.0
[187] ReactomePA_1.46.0 spatstat.utils_3.0-4 SeuratObject_5.0.0
[190] GOSemSim_2.28.0 lazyeval_0.2.2 shiny_1.7.5.1
[193] htmltools_0.5.7 affy_1.80.0 enrichplot_1.22.0
[196] GO.db_3.18.0 sctransform_0.4.1 rappdirs_0.3.3
[199] glue_1.6.2 spam_2.10-0 XVector_0.42.0
[202] RCurl_1.98-1.13 treeio_1.26.0 gridExtra_2.3
[205] EnvStats_2.8.1 boot_1.3-28.1 igraph_1.5.1
[208] R6_2.5.1 SingleCellExperiment_1.24.0 DESeq2_1.42.0
[211] ggiraph_0.8.7 cluster_2.1.4 aplot_0.2.2
[214] GenomeInfoDb_1.38.1 nloptr_2.0.3 DelayedArray_0.28.0
[217] tidyselect_1.2.0 vipor_0.4.5 xml2_1.3.5
[220] oligoClasses_1.64.0 AnnotationDbi_1.64.1 future_1.33.0
[223] munsell_0.5.0 KernSmooth_2.23-22 BiocStyle_2.30.0
[226] affyio_1.72.0 data.table_1.14.8 htmlwidgets_1.6.2
[229] fgsea_1.28.0 RColorBrewer_1.1-3 biomaRt_2.58.0
[232] rlang_1.1.2 spatstat.sparse_3.0-3 spatstat.explore_3.2-5
[235] lmerTest_3.1-3 uuid_1.1-1 fansi_1.0.5
[238] beeswarm_0.4.0

ningbioinfo commented 9 months ago

Hi @ErickMUO, as long as you're using the "probeQC count" as the count input, the negative probes can be removed from the data in most cases, since they were already used to generate the "probeQC count"; this is essentially what the function does. Unfortunately, in the current version the best workaround is to rename the negative probes beforehand and pass their names to the function. I'll mark this as an improvement so that the function can recognise this type of error and handle it for users automatically (with an on/off switch, of course). This will probably happen in the next couple of weeks.
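
One way to read the renaming workaround is sketched below in base R. It is only an illustration under assumptions: the toy table, the column name "TargetName", the ROI columns, and the "NegProbe_" prefix are all made up here and are not taken from the CTA export or from standR itself.

```r
# Toy count table standing in for the real count file.
# Column names and values are assumptions for illustration only.
counts <- data.frame(
  TargetName = c("A2M", "ABCB1", "Negative Probe",
                 "Negative Probe", "Negative Probe"),
  ROI_1 = c(10, 25, 3, 4, 2),
  ROI_2 = c(12, 30, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Rows that carry the shared negative-probe label.
is_neg <- counts$TargetName == "Negative Probe"

# Give each negative probe its own name so the labels no longer repeat.
# The "NegProbe_" prefix is a hypothetical choice.
counts$TargetName[is_neg] <- paste0("NegProbe_", seq_len(sum(is_neg)))

# Keep the new names so they can be handed to the import step later.
neg_names <- counts$TargetName[is_neg]

# anyDuplicated() returns 0 when every name is unique.
stopifnot(anyDuplicated(counts$TargetName) == 0)
```

The renamed table could then be written back out and used as the count input, with `neg_names` supplied through `NegProbeName`. Whether `NegProbeName` accepts more than one name is not confirmed in this thread, so `?readGeoMx` is worth checking first. If the gene names themselves are still duplicated, as in the warning above, this step alone would not make the row names unique.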

ErickMUO commented 9 months ago

That sounds fantastic! I'll give the recommendations a try and also explore the updated version. Thank you!