BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

gdcprepare error: ‘==’ only defined for equally-sized data frames #485

Closed aastha-v closed 2 years ago

aastha-v commented 2 years ago

Hello!

I'm trying to work with hg38 RPPA data thus:

library(TCGAbiolinks)

# Query RPPA protein expression data for the colon and rectal cohorts
query_rppa = GDCquery(
  project = c("TCGA-COAD", "TCGA-READ"),
  data.category = "Proteome Profiling",
  experimental.strategy = "Reverse Phase Protein Array",
  platform = "RPPA",
  data.type = "Protein Expression Quantification"
)

GDCdownload(query_rppa)

data_rppa = GDCprepare(query_rppa)

I am able to download 37 MB of data. However, when I call GDCprepare, I get the following error:

[Screenshot: gdcprepare_error]

Could you help me figure this out, please? Thanks!

tiagochst commented 2 years ago

The code breaks because 17 files contain fewer proteins (247) than the other 478 files, which each contain 487.
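A mismatch like this could be spotted with a quick check along the following lines. This is only a minimal sketch on toy data frames, not the real GDC files: the list `files`, the names `file_a` to `file_c`, and the column names are illustrative assumptions. Counting rows per file exposes the outliers, and comparing two data frames of different sizes reproduces the kind of error quoted in the issue title.

```r
# Toy stand-ins for per-file RPPA tables (assumed layout, not the real data).
files <- list(
  file_a = data.frame(peptide_target = c("P1", "P2", "P3"),
                      protein_expression = c(0.1, 0.2, 0.3)),
  file_b = data.frame(peptide_target = c("P1", "P2", "P3"),
                      protein_expression = c(0.4, 0.5, 0.6)),
  file_c = data.frame(peptide_target = c("P1", "P3"),
                      protein_expression = c(0.7, 0.8))
)

# Number of proteins (rows) in each file.
n_proteins <- vapply(files, nrow, integer(1))

# How many files share each row count.
count_table <- table(n_proteins)

# Files whose row count differs from the most common one.
most_common <- as.integer(names(which.max(count_table)))
outliers <- names(n_proteins)[n_proteins != most_common]

# Comparing data frames of different sizes raises an error;
# capture its message instead of stopping the script.
err <- tryCatch(
  files$file_a == files$file_c,
  error = function(e) conditionMessage(e)
)

print(count_table)
print(outliers)
print(err)
```

Here `outliers` names the one toy file with fewer rows, and `err` holds the message raised by the failed comparison.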

I have to investigate whether it is a data problem. If it is not, I need to change the code to fill in NAs for the missing protein levels.
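One way to fill in NAs for missing protein levels can be sketched in base R as follows. This is an illustration on toy data and not necessarily how the package implements the fix: the sample names and the columns `peptide_target` and `protein_expression` are assumptions. Each sample is aligned to the union of all proteins, so a protein absent from a file becomes NA rather than causing a size mismatch.

```r
# Toy per-sample tables (assumed layout); sample_c lacks protein "P2".
files <- list(
  sample_a = data.frame(peptide_target = c("P1", "P2", "P3"),
                        protein_expression = c(0.1, 0.2, 0.3)),
  sample_b = data.frame(peptide_target = c("P1", "P2", "P3"),
                        protein_expression = c(0.4, 0.5, 0.6)),
  sample_c = data.frame(peptide_target = c("P1", "P3"),
                        protein_expression = c(0.7, 0.8))
)

# Union of all proteins seen in any file.
all_targets <- sort(unique(unlist(
  lapply(files, function(df) as.character(df$peptide_target))
)))

# For each sample, look up every protein in the union.
# match() returns NA for an absent protein, and indexing with NA yields NA.
expr_matrix <- vapply(
  files,
  function(df) {
    df$protein_expression[match(all_targets, as.character(df$peptide_target))]
  },
  numeric(length(all_targets))
)
rownames(expr_matrix) <- all_targets

print(expr_matrix)
```

The result has one row per protein and one column per sample, with NA in the `sample_c` column for `P2`.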

tiagochst commented 2 years ago

For a couple of RPPA samples, the set IDs for Set166 are missing.

tiagochst commented 2 years ago

@aastha-v I updated the package to handle that case. I am not sure why those files differ for some samples.

You can update the package from GitHub with the following R command:

BiocManager::install("BioinformaticsFMRP/TCGAbiolinks")

aastha-v commented 2 years ago

Thanks a lot! The update converts missing values to NAs, and the data can now be loaded with GDCprepare.