BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

discrepancy in number of samples #458

Open Xiaofei-git opened 3 years ago

Xiaofei-git commented 3 years ago

Dear community,

I tried to download data using "TCGAquery_recount2", but I found that the number of samples differs depending on which TCGAbiolinks functions I use. Why does this happen? Does "TCGAquery_recount2" download a different version of the data? (I originally posted this issue at https://support.bioconductor.org/p/9136385/#9136498.) Thanks a lot!

With "TCGAquery_recount2", I get 601 samples (542 tumor and 58 normal) for TCGA-LUAD, whereas "GDCquery", "GDCdownload", and "GDCprepare" give 594 samples (535 tumor and 59 normal) for the same project. The two sets have 594 samples in common, and "TCGAquery_recount2" returns 7 additional tumor samples.

When I use "TCGAquery_recount2" to download the GTEx data for lung tissue, I get 374 samples, but a query on the GTEx website returns 419. Only 313 samples are common to the two.
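
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how two lists of sample IDs could be compared once they have been exported from each source. It is written in Python for illustration and is not part of TCGAbiolinks. The function names `compare_samples` and `sample_group` and the barcodes in the example are hypothetical. It assumes TCGA-style barcodes, where the first 16 characters identify the sample and characters 14-15 hold the sample type code (01-09 tumor, 10-19 normal).

```python
def compare_samples(ids_a, ids_b, prefix_len=16):
    """Compare two collections of sample IDs.

    IDs are truncated to `prefix_len` characters first, so that both
    sources are compared at the same barcode level (assumption: the
    first 16 characters of a TCGA barcode identify the sample).
    """
    a = {s[:prefix_len] for s in ids_a}
    b = {s[:prefix_len] for s in ids_b}
    return {
        "common": sorted(a & b),   # present in both sources
        "only_a": sorted(a - b),   # present only in the first source
        "only_b": sorted(b - a),   # present only in the second source
    }


def sample_group(barcode):
    """Classify a TCGA barcode by its sample type code (characters 14-15)."""
    code = int(barcode[13:15])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"


# Hypothetical barcodes, for illustration only (not real samples).
recount2_ids = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0002-01A-11R-0000-07",
    "TCGA-AA-0003-11A-11R-0000-07",
]
gdc_ids = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0003-11A-11R-0000-07",
]

result = compare_samples(recount2_ids, gdc_ids)
for name, ids in result.items():
    print(name, len(ids), [(i, sample_group(i)) for i in ids])
```

Listing the IDs in `only_a` and `only_b` together with their tumor/normal group should show exactly which samples account for the difference between the two sources.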