BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html
CPTAC-3 GDCprepare error. missing sample.submitter_id #573

Open fkgruber opened 1 year ago

fkgruber commented 1 year ago

I ran:

library(TCGAbiolinks)
query_cnv <- GDCquery(
    project = "CPTAC-3",
    data.category = "Copy Number Variation",
    data.type = "Gene Level Copy Number"
)
GDCdownload(query_cnv)
cnvdf <- GDCprepare(query = query_cnv)

and got the error:

Error in ans[npos] <- rep(no, length.out = len)[npos] :
  replacement has length zero
In addition: Warning message:
In rep(no, length.out = len) : 'x' is NULL so the result will be NULL

Debugging GDCprepare, I found that the error comes from this line:

 cases <- ifelse(grepl("TCGA|TARGET|CGCI-HTMCP-CC", query$results[[1]]$project %>% 
    unlist()), query$results[[1]]$cases, query$results[[1]]$sample.submitter_id)

because there is no sample.submitter_id on the query object:

query_cnv$results[[1]] %>% names
 [1] "id"                        "data_format"              
 [3] "cases"                     "access"                   
 [5] "file_name"                 "submitter_id"             
 [7] "data_category"             "type"                     
 [9] "platform"                  "file_size"                
[11] "created_datetime"          "md5sum"                   
[13] "updated_datetime"          "file_id"                  
[15] "data_type"                 "state"                    
[17] "experimental_strategy"     "version"                  
[19] "data_release"              "project"                  
[21] "analysis_id"               "analysis_state"           
[23] "analysis_submitter_id"     "analysis_workflow_link"   
[25] "analysis_workflow_type"    "analysis_workflow_version"
[27] "sample_type"              
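
The failure can be reproduced without TCGAbiolinks. The sketch below uses a made-up two-row data frame as a stand-in for query$results[[1]] (the values are illustrative, not real GDC output) to show that ifelse() fails as soon as its `no` argument is NULL. The guard at the end, which falls back to the `cases` column, is only an assumption about what a reasonable workaround might look like, not the package's actual fix.

```r
# Toy stand-in for query$results[[1]]: it has `project` and `cases`
# but, as in the issue, no `sample.submitter_id` column.
results <- data.frame(
  project = c("CPTAC-3", "CPTAC-3"),
  cases = c("C3L-00001", "C3L-00002"),  # made-up identifiers
  stringsAsFactors = FALSE
)

is_tcga_like <- grepl("TCGA|TARGET|CGCI-HTMCP-CC", results$project)

# results$sample.submitter_id is NULL, so ifelse() has nothing to put in
# the FALSE positions and stops with "replacement has length zero".
failed <- tryCatch(
  suppressWarnings(
    ifelse(is_tcga_like, results$cases, results$sample.submitter_id)
  ),
  error = function(e) conditionMessage(e)
)
print(failed)

# Hypothetical guard (an assumption, not the maintainers' patch):
# use sample.submitter_id only when the column exists.
fallback_ids <- if ("sample.submitter_id" %in% names(results)) {
  results$sample.submitter_id
} else {
  results$cases
}
cases <- ifelse(is_tcga_like, results$cases, fallback_ids)
print(cases)
```

Whether `cases` is the right identifier for non-TCGA projects depends on what GDCprepare does with it downstream, so treat the fallback as a diagnostic aid only.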

git-jrwang commented 10 months ago

Same error here with CPTAC-2 when I run:

snp_Query_Data <- GDCquery(
    project = "CPTAC-2",
    data.category = "Simple Nucleotide Variation",
    data.type = "Masked Somatic Mutation",
    access = "open"
)

GDCdownload(
    query = snp_Query_Data,
    method = "api",
    directory = DataDir,
    files.per.chunk = 50
)

snp_data_CPTAC2 <- GDCprepare(
    query = snp_Query_Data,
    directory = DataDir,
    save = TRUE,
    save.filename = "CPTAC2_SNP_data.rda"
)

ollemonrasl commented 9 months ago

Got the same error. Does anyone know how to fix it?

ChiaraCaprioli commented 7 months ago

Hi,

Thanks a lot for this convenient package. I'm experiencing the same error with project BEATAML1.0-COHORT, even after reinstalling TCGAbiolinks with the latest updates:

devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks", ref = "master")
library(TCGAbiolinks)

query <- GDCquery(
  project = "BEATAML1.0-COHORT", 
  data.category = "Simple Nucleotide Variation", 
  access = "open",
  data.type = "Masked Somatic Mutation", 
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)

Error in ans[npos] <- rep(no, length.out = len)[npos] :
  replacement has length zero
In addition: Warning message:
In rep(no, length.out = len) : 'x' is NULL so the result will be NULL

tiagochst commented 7 months ago

Thank you for the bug report.

Could you try this version: devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks", ref = "devel")

Best regards, Tiago Chedraoui Silva

ChiaraCaprioli commented 7 months ago

Thank you for the prompt response. It throws the same error.

tiagochst commented 7 months ago

Could you please check which version is loaded? It is working on my side: https://rpubs.com/tiagochst/BEATAML_COHORT_MAF
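
One way to check this is to compare the version installed on disk with the version of the namespace already loaded in the running session. The sketch below is only an illustration: `describe_package` is a hypothetical helper, not part of TCGAbiolinks, and it relies solely on base R functions. If the two versions differ, the session is probably still using the copy loaded before reinstalling, and restarting R should pick up the new one.

```r
# Hypothetical helper: report the installed and the currently loaded
# version of a package without loading it as a side effect.
describe_package <- function(pkg) {
  is_installed <- nzchar(system.file(package = pkg))
  is_loaded <- pkg %in% loadedNamespaces()
  list(
    installed = if (is_installed) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    },
    loaded = if (is_loaded) {
      unname(getNamespaceVersion(pkg))
    } else {
      NA_character_
    }
  )
}

info <- describe_package("TCGAbiolinks")
print(info)

if (!is.na(info$installed) && !is.na(info$loaded) &&
    info$installed != info$loaded) {
  message("Loaded version differs from the installed one; restart R.")
}
```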

ChiaraCaprioli commented 7 months ago

After double-checking, it works, thank you. However, another error is raised when GDCprepare tries to access MAF files to associate mutation status with expression data:

project <- "BEATAML1.0-COHORT"

query_exp_project <- GDCquery(
  project = project,
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts"
)

GDCdownload(
  query = query_exp_project,
  directory = path_main
)

exp_project <- GDCprepare(
  query_exp_project,
  save = TRUE,
  directory = path_main,
  save.filename = file.path(destdir, paste0(project, "_gex.RData")),
  add.gistic2.mut = "SRSF2" # add info on SRSF2 mutational status
)

Starting to add information to samples
 => Add clinical information to samples
Error in dplyr::bind_cols():
! Can't recycle ..1 (size 0) to match ..2 (size 2).
Run rlang::last_trace() to see where the error occurred.

I understand this is unrelated to the initial bug; however, it seems specific to the BeatAML project, because it does not happen with TCGA-LAML data.