BioinformaticsFMRP / TCGAbiolinks


Number of samples from TCGAbiolinks is higher than on the GDC portal #282

Closed ghost closed 5 years ago

ghost commented 5 years ago

Hello, I downloaded the "TCGA-BRCA" methylation data for the 'Illumina Human Methylation 450' platform.

When I checked the data downloaded with TCGAbiolinks, it contained 895 files and 791 cases. However, when I checked the GDC portal website, it showed 892 files and 789 cases (see the two attached images).

Why do these numbers differ? If you know the reason, please let me know.


tiagochst commented 5 years ago

Did you download the data from the legacy data portal or from the harmonized database?

The legacy data portal has 895 files.

[screenshot: 2018-12-10 at 3:23:06 pm]

The harmonized database has fewer samples. There may be several reasons for this; for example, some samples did not pass the QC thresholds of the new pipelines.
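
To see exactly which samples account for the gap, one option is to compare the sample barcodes from the two sources directly. The snippet below is only a minimal sketch: it assumes you have already exported the barcodes from each source into plain Python lists, and the barcodes and helper names (`case_id`, `summarize`, `only_in_first`) are made up for illustration. It relies on the TCGA barcode convention that the first three dash-separated fields identify the case, which is also why the file count can exceed the case count (one case may contribute several samples).

```python
def case_id(barcode: str) -> str:
    """Return the case part of a TCGA barcode (first three dash-separated fields)."""
    return "-".join(barcode.split("-")[:3])


def summarize(barcodes):
    """Count files (one barcode per file) and distinct cases in a listing."""
    return {
        "files": len(barcodes),
        "cases": len({case_id(b) for b in barcodes}),
    }


def only_in_first(first, second):
    """Return the barcodes present in `first` but missing from `second`, sorted."""
    return sorted(set(first) - set(second))


# Hypothetical barcodes, invented for this example (not real TCGA-BRCA samples).
legacy = [
    "TCGA-XX-0001-01A-11D-A10A-05",  # tumor sample, case 0001
    "TCGA-XX-0001-11A-11D-A10A-05",  # normal sample, same case
    "TCGA-XX-0002-01A-11D-A10A-05",
    "TCGA-XX-0003-01A-11D-A10A-05",
]
harmonized = legacy[:3]  # pretend the last sample was dropped by the new pipeline

print(summarize(legacy))
print(summarize(harmonized))
print(only_in_first(legacy, harmonized))
```

Running the same comparison on the real barcode lists should show which files are present in the legacy data but absent from the harmonized database.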

[screenshot: 2018-12-10 at 3:24:29 pm]

ghost commented 5 years ago

Yes, I downloaded the legacy data. I understand now. Thank you for your reply!