Closed ArvinZaker closed 1 year ago
GDC no longer provides an "HTSeq - Counts" workflow, only "STAR - Counts".
For TCGA data, you should be using the unstranded assay.
You can use the unstranded raw counts column.
dataPrep <- TCGAanalyze_Preprocessing(
  object = dataPrep,
  cor.cut = 0.6,
  datatype = "unstranded"
)
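To check which values the datatype argument will accept, it may help to list the assay names on the object returned by GDCprepare, which is a SummarizedExperiment. The sketch below uses a small toy object as a stand-in for the real download: the names `se` and `counts` and the matrix values are made up for illustration, and only two of the assay names from the error message are reproduced.

```r
library(SummarizedExperiment)

# Toy stand-in for the GDCprepare() output: 3 genes x 2 samples.
# The values are invented; only the assay names mirror the real object.
counts <- matrix(
  1:6,
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("s1", "s2"))
)
se <- SummarizedExperiment(
  assays = list(unstranded = counts, tpm_unstrand = counts / 10)
)

# List the assays that are available on the object
assayNames(se)

# Check the requested datatype before using it, then pull the raw counts
datatype <- "unstranded"
stopifnot(datatype %in% assayNames(se))
raw_counts <- assay(se, datatype)
dim(raw_counts)
```

On the real object, the same `assayNames()` call should list the names shown in the error message, and any of them can be tried as the datatype value.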
On Thu, Jul 21, 2022 at 2:33 PM ArvinZaker @.***> wrote:
Hello,
I was following the tutorial on the TCGAbiolinks website (https://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/analysis.html#HTSeq_data:_Downstream_analysis_BRCA) to download the HTSeq data from the TCGA Prostate Adenocarcinoma dataset.
The code I ran was the same as in the tutorial, except for the project, which was changed to TCGA-PRAD:
library(TCGAbiolinks)

query <- GDCquery(
  project = "TCGA-PRAD",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts"
)

samplesDown <- getResults(query, cols = c("cases"))

dataSmTP <- TCGAquery_SampleTypes(
  barcode = samplesDown,
  typesample = "TP"
)

dataSmNT <- TCGAquery_SampleTypes(
  barcode = samplesDown,
  typesample = "NT"
)

dataSmTP_short <- dataSmTP[1:10]
dataSmNT_short <- dataSmNT[1:10]

query.selected.samples <- GDCquery(
  project = "TCGA-PRAD",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts",
  barcode = c(dataSmTP_short, dataSmNT_short)
)

GDCdownload(query = query.selected.samples)

dataPrep <- GDCprepare(
  query = query.selected.samples,
  save = TRUE
)

dataPrep <- TCGAanalyze_Preprocessing(
  object = dataPrep,
  cor.cut = 0.6,
  datatype = "HTSeq - Counts"
)
Executing the TCGAanalyze_Preprocessing function results in an error that mentions that HTSeq data was not provided:
Error in TCGAanalyze_Preprocessing(object = dataPrep, cor.cut = 0.6, datatype = "HTSeq - Counts") :
  HTSeq - Counts not found in the assay list: unstranded, stranded_first, stranded_second, tpm_unstrand, fpkm_unstrand, fpkm_uq_unstrand
  Please set the correct datatype argument.
I would like to know what changes are needed to correct this error and extract the HTSeq - count data.
Here is the sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS: /usr/lib/libblas.so.3.10.1
LAPACK: /usr/lib/liblapack.so.3.10.1

locale:
[1] LC_CTYPE=en_US.UTF-8 [2] LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 [6] LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 [8] LC_NAME=C [9] LC_ADDRESS=C
[10] LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 [12] LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices
[4] utils datasets methods
[7] base

other attached packages:
[1] TCGAbiolinks_2.24.3

loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 [2] lattice_0.20-45 [3] tidyr_1.2.0 [4] prettyunits_1.1.1 [5] png_0.1-7
[6] Biostrings_2.64.0 [7] assertthat_0.2.1 [8] digest_0.6.29 [9] utf8_1.2.2 [10] BiocFileCache_2.4.0
[11] plyr_1.8.7 [12] R6_2.5.1 [13] GenomeInfoDb_1.32.2 [14] stats4_4.2.1 [15] RSQLite_2.2.15
[16] httr_1.4.3 [17] ggplot2_3.3.6 [18] pillar_1.8.0 [19] zlibbioc_1.42.0 [20] rlang_1.0.4
[21] progress_1.2.2 [22] curl_4.3.2 [23] data.table_1.14.2 [24] rstudioapi_0.13 [25] blob_1.2.3
[26] S4Vectors_0.34.0 [27] Matrix_1.4-1 [28] downloader_0.4 [29] readr_2.1.2 [30] stringr_1.4.0
[31] RCurl_1.98-1.7 [32] bit_4.0.4 [33] biomaRt_2.52.0 [34] munsell_0.5.0 [35] DelayedArray_0.22.0
[36] xfun_0.31 [37] compiler_4.2.1 [38] pkgconfig_2.0.3 [39] BiocGenerics_0.42.0 [40] tidyselect_1.1.2
[41] SummarizedExperiment_1.26.1 [42] KEGGREST_1.36.3 [43] tibble_3.1.7 [44] GenomeInfoDbData_1.2.8 [45] IRanges_2.30.0
[46] matrixStats_0.62.0 [47] XML_3.99-0.10 [48] fansi_1.0.3 [49] dbplyr_2.2.1 [50] crayon_1.5.1
[51] dplyr_1.0.9 [52] tzdb_0.3.0 [53] rappdirs_0.3.3 [54] bitops_1.0-7 [55] grid_4.2.1
[56] jsonlite_1.8.0 [57] gtable_0.3.0 [58] lifecycle_1.0.1 [59] DBI_1.1.3 [60] magrittr_2.0.3
[61] scales_1.2.0 [62] cli_3.3.0 [63] TCGAbiolinksGUI.data_1.16.0 [64] stringi_1.7.8 [65] cachem_1.0.6
[66] XVector_0.36.0 [67] xml2_1.3.3 [68] filelock_1.0.2 [69] ellipsis_0.3.2 [70] generics_0.1.3
[71] vctrs_0.4.1 [72] tools_4.2.1 [73] bit64_4.0.5 [74] Biobase_2.56.0 [75] glue_1.6.2
[76] purrr_0.3.4 [77] hms_1.1.1 [78] MatrixGenerics_1.8.1 [79] fastmap_1.1.0 [80] AnnotationDbi_1.58.0
[81] colorspace_2.0-3 [82] GenomicRanges_1.48.0 [83] rvest_1.0.2 [84] memoise_2.0.1 [85] knitr_1.39
Thank you for the information! The issue is resolved!