BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Mislabeled sample ID for ASCAT data #523

Closed ShixiangWang closed 2 years ago

ShixiangWang commented 2 years ago

Hi,

Recently, I found a problem with my downloaded data. I had assumed the ASCAT data came only from blood samples (germline CNV), but I found that some sample IDs are mislabeled.

This is an example.

https://portal.gdc.cancer.gov/files/a1aab1b6-4b75-41fe-90aa-195670aee2cb This file was generated from a tumor-normal (T-N) pair of samples. Typically we label such data with the tumor ID, e.g. TCGA-VM-A8CH-01A-12D-A36N-01. However, the data.frame returned for the ASCAT data is labeled with the normal sample ID. I downloaded all TCGA ASCAT results, and it turns out that half of them carry the sample ID of the normal control.
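
For anyone who wants to check their own downloads, here is a minimal Python sketch (not part of TCGAbiolinks) that tells tumor barcodes from normal ones. It relies on the sample-type code in the fourth field of a TCGA barcode: 01-09 are tumor types, 10-19 are normal types, and 20-29 are controls. The function names are hypothetical, and the second barcode in the usage lines is made up purely for illustration.

```python
def sample_type_code(barcode: str) -> int:
    """Return the two-digit sample-type code from a TCGA barcode."""
    parts = barcode.split("-")
    # A sample-level barcode needs at least 4 fields: TCGA-TSS-participant-sample
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    # The fourth field is the sample type (2 digits) followed by a vial letter
    return int(parts[3][:2])


def classify(barcode: str) -> str:
    """Classify a barcode as 'tumor', 'normal', 'control', or 'unknown'."""
    code = sample_type_code(barcode)
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


# Barcode quoted in this issue
print(classify("TCGA-VM-A8CH-01A-12D-A36N-01"))  # tumor
# Hypothetical barcode, made up to illustrate a blood-derived normal (code 10)
print(classify("TCGA-XX-0000-10A-01D-0000-01"))  # normal
```

Applying `classify` to the sample ID column of each downloaded table would show how many are labeled with a normal ID instead of a tumor ID.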


tiagochst commented 2 years ago

@ShixiangWang Which version of the package are you using?

Could you provide the query for the specific cases?

I made some changes last week to handle mixed samples, and it will now output both IDs of the tumor-normal pair.

It is quite complicated to make TCGAbiolinks universal for all projects; there is always an exception.

You can find an example using the latest version here: https://rpubs.com/tiagochst/issue_523

Would that solve the issue?

ShixiangWang commented 2 years ago

@tiagochst Thanks. I will try the latest version and see how it works.

ShixiangWang commented 2 years ago

@tiagochst I checked your RPubs doc; it should work now. Thanks for the illustration.