Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/

HTSeq - Counts removed from the data release #100

Closed. LiNk-NY closed 2 years ago

LiNk-NY commented 2 years ago

I think the alternative is to use STAR - Counts, though some files require a token. See the manifest chunk in https://github.com/Bioconductor/GenomicDataCommons/blob/master/vignettes/overview.Rmd
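
On the token point, here is a minimal sketch of restricting the query to files that do not need one. It assumes the `access` field of the `files` endpoint takes the value 'open' for such files; please check this against the current vignette. The helper name `open_star_counts_manifest` is made up for illustration and is not part of the package. Wrapping the query in a function means nothing is sent to the GDC API until you call it.

```r
# Sketch only: needs the GenomicDataCommons package and network access to the GDC API.
# `open_star_counts_manifest` is a hypothetical helper name, not a package function.
open_star_counts_manifest <- function() {
    GenomicDataCommons::files() |>
        GenomicDataCommons::filter(cases.project.project_id == 'TCGA-BRCA') |>
        GenomicDataCommons::filter(type == 'gene_expression') |>
        GenomicDataCommons::filter(analysis.workflow_type == 'STAR - Counts') |>
        # Assumption: open-access files carry access == 'open'
        GenomicDataCommons::filter(access == 'open') |>
        GenomicDataCommons::manifest()
}

# Example call (queries the GDC API):
# head(open_star_counts_manifest())
```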

seandavi commented 2 years ago

Thanks for closing.

bhagwataditya commented 1 year ago

I stumbled into something similar. Replacing HTSeq - Counts with STAR - Counts makes it work for me.

library(magrittr)  # provides the %<>% assignment pipe
y <- GenomicDataCommons::files()
y %<>% GenomicDataCommons::filter(cases.project.project_id == 'TCGA-BRCA')
y %<>% GenomicDataCommons::filter(type == 'gene_expression')
# y %<>% GenomicDataCommons::filter(analysis.workflow_type == 'HTSeq - Counts')  # this no longer works
y %<>% GenomicDataCommons::filter(analysis.workflow_type == 'STAR - Counts')     # but this does, thank you

LiNk-NY commented 1 year ago

Hi Aditya (@bhagwataditya), HTSeq - Counts have been removed. Please see the data release notes.

> And what would be these? STAR - Counts also returns no results for me.

Did you try the example code in the vignette?

library(GenomicDataCommons)

files() |>
    filter(cases.project.project_id == 'TCGA-BRCA') |>
    filter(type == 'gene_expression') |>
    filter(analysis.workflow_type == 'STAR - Counts') |>
    manifest() |>
    head()
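
If a query like this comes back empty, it may help to see which filter drops the results. Below is a minimal sketch using `count()` on the query before and after the workflow filter. The helper name `count_star_files` is invented for this example, and the numbers it returns depend on the current data release.

```r
# Sketch only: needs the GenomicDataCommons package and network access to the GDC API.
# `count_star_files` is a hypothetical helper name, not a package function.
count_star_files <- function() {
    # Query without the workflow filter
    q <- GenomicDataCommons::files() |>
        GenomicDataCommons::filter(cases.project.project_id == 'TCGA-BRCA') |>
        GenomicDataCommons::filter(type == 'gene_expression')
    # Same query with the workflow filter added
    q_star <- GenomicDataCommons::filter(q, analysis.workflow_type == 'STAR - Counts')
    c(
        all_expression = GenomicDataCommons::count(q),
        star_counts = GenomicDataCommons::count(q_star)
    )
}

# Example call (queries the GDC API):
# count_star_files()
```

A zero in `star_counts` alongside a positive `all_expression` would point at the workflow type string rather than the project or type filters.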

bhagwataditya commented 1 year ago

Thank you very much, Marcel! That works indeed; I have updated the example for others' reference. Small question: is there a quick function that shows the fields on which filtering can be performed? Another question: which method has replaced HTSeq for counting aligned reads?

LiNk-NY commented 1 year ago

Hi Aditya (@bhagwataditya), please use https://support.bioconductor.org for software usage questions.

You can use available_values to list the values that a given field takes:

> available_values("files", "experimental_strategy")
 [1] "WXS"                         "RNA-Seq"                    
 [3] "Targeted Sequencing"         "Genotyping Array"           
 [5] "miRNA-Seq"                   "Methylation Array"          
 [7] "WGS"                         "Tissue Slide"               
 [9] "Diagnostic Slide"            "Reverse Phase Protein Array"
[11] "ATAC-Seq"                    "scRNA-Seq"                  
[13] "_missing" 
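
For the field names themselves (as opposed to the values of one field), a minimal sketch using `available_fields()`, with an optional pattern to narrow the list. The helper name `filterable_fields` is made up here and is not part of the package.

```r
# Sketch only: needs the GenomicDataCommons package and network access to the GDC API.
# `filterable_fields` is a hypothetical helper name, not a package function.
filterable_fields <- function(entity = "files", pattern = NULL) {
    # All field names known for this endpoint
    fields <- GenomicDataCommons::available_fields(entity)
    # Optionally keep only the names matching a regular expression
    if (is.null(pattern)) fields else grep(pattern, fields, value = TRUE)
}

# Example calls (each one queries the GDC API):
# filterable_fields("files", "workflow")
# GenomicDataCommons::available_values("files", "analysis.workflow_type")
```

The second example call lists the workflow types currently present, which is a quick way to check whether a string such as 'STAR - Counts' still matches anything.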

See the STAR paper here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/

bhagwataditya commented 1 year ago

Thank you very much!