Closed priti88 closed 6 years ago
Hi @priti88, can you identify the bug in the importer? Afaik it is quite common to find patients with both 01 and 06 samples.
@pieterlukasse I am not sure. It might not be an importer bug. @angelicaochoa is looking into this further.
Hi @pieterlukasse, @sheridancbio and I have been investigating why we're seeing sample TCGA-HR-A2OG-01 when the GDC portal only reports TCGA-HR-A2OG-06 for patient TCGA-HR-A2OG.
All genomic data for patient TCGA-HR-A2OG is linked to sample id TCGA-HR-A2OG-06 except for CNA and mutations. I did a quick survey of the data we received from Firehose and found that the mutations are in fact linked to sample TCGA-HR-A2OG-01, which is why we see two samples for patient TCGA-HR-A2OG.
@sheridancbio may have more to update soon.
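The kind of survey described above can be sketched in a few lines of Python. This is only an illustration, not the importer's code: the function name `samples_by_patient`, the data type labels, and the example input are all hypothetical, and the input is assumed to be a plain mapping from data type to the sample barcodes found in that data file.

```python
from collections import defaultdict


def samples_by_patient(sample_ids_by_data_type):
    """Build patient barcode -> sample barcode -> set of data types.

    The patient barcode is taken to be the first three dash-separated
    fields of a TCGA sample barcode (e.g. TCGA-HR-A2OG).
    """
    index = defaultdict(lambda: defaultdict(set))
    for data_type, sample_ids in sample_ids_by_data_type.items():
        for sample_id in sample_ids:
            patient_id = "-".join(sample_id.split("-")[:3])
            index[patient_id][sample_id].add(data_type)
    return index


# Hypothetical input mirroring the situation described in this thread.
observed = {
    "mutations": ["TCGA-HR-A2OG-01"],
    "rna_seq": ["TCGA-HR-A2OG-06"],
    "protein": ["TCGA-HR-A2OG-06"],
}

index = samples_by_patient(observed)

# Keep only patients that end up with more than one sample barcode.
multi_sample_patients = {
    patient: {sample: sorted(types) for sample, types in samples.items()}
    for patient, samples in index.items()
    if len(samples) > 1
}
print(multi_sample_patients)
```

Any patient that appears in `multi_sample_patients` has data files pointing at different sample barcodes and is worth cross-checking against the GDC portal.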
Hello angelicaochoa,
Thank you for explaining what is happening.
1. If I understood correctly, for this patient the CNA and mutation data come from TCGA-HR-A2OG-01 (which would mean a primary melanoma, because in other samples the -01 code means primary tumor), while the other data, such as RNAseq and protein expression, come from TCGA-HR-A2OG-06 (which would mean a metastatic melanoma sample, because in other samples the -06 code means metastasis).
So that would mean this patient has two samples.
But the pathology report makes it look like the same sample (a metastasis report filed under the primary tumor).
2. I saw similar cases in the TCGA melanoma dataset in cBioPortal.
The patients with two samples are:
TCGA-D3-A1Q6 TCGA-D3-A1QA TCGA-D9-A1X3 TCGA-ER-A19T TCGA-ER-A2NF TCGA-HR-A2OG TCGA-HR-A2OH TCGA-XV-AB01
I would like to know how you did this, so that I can handle the same problem in other datasets in the future.
3. In the downloaded clinical data, sample TCGA-HR-A2OG-01 for this patient is not labeled as a primary sample. The same is true for the other patients with two samples.
Is it possible to correct this?
Thank you very much for your time.
The identifiers differ because a sample mix-up was discovered last year. The cancer sample was originally thought to be a primary cancer but is actually a metastasis, so the 01 code was changed to a 06 code. The problem arises because cBioPortal depends on the old datasets from before the TCGA Data Portal closed down and reappeared as the Genomic Data Commons. Note that GDC also has the updated barcodes for these patients with noted discrepancies.
TCGA also discourages identifying samples by barcode and encourages the use of UUIDs, which don't change if a sample mix-up is identified after the barcode has been publicly issued and become widely used.
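To make the barcode convention concrete, here is a minimal Python sketch that reads the sample type code from a TCGA barcode. The function name `sample_type` is hypothetical, and the lookup table only covers the three codes that come up in this thread (01, 06, 07), not the full TCGA code table.

```python
# Subset of the TCGA sample type codes relevant to this thread.
SAMPLE_TYPES = {
    "01": "Primary Solid Tumor",
    "06": "Metastatic",
    "07": "Additional Metastatic",
}


def sample_type(barcode):
    """Return the sample type encoded in the fourth field of a TCGA barcode."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    # The fourth field starts with the two-digit sample type code and
    # may be followed by a vial letter (e.g. "01A").
    code = parts[3][:2]
    return SAMPLE_TYPES.get(code, f"Unknown sample type code {code}")


print(sample_type("TCGA-HR-A2OG-01"))
print(sample_type("TCGA-HR-A2OG-06"))
```

Because the code is part of the barcode itself, reclassifying a sample from primary to metastatic changes its identifier, which is exactly why the UUID is the more stable key.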
Thank you DarioS for your advice. I have another question. See angelicaochoa's comment above:
"...All genomic data for patient TCGA-HR-A2OG is linked to sample id TCGA-HR-A2OG-06 except for CNA and mutations. I did a quick survey of the data we received from Firehose and found that the mutations are in fact linked to sample TCGA-HR-A2OG-01, which is why we see two samples for patient TCGA-HR-A2OG."
My question is:
For these cases (in the TCGA melanoma dataset), could you tell me, using cBioPortal, whether each study (CNA, mutation, RNAseq, and protein expression) comes from the primary or the metastatic sample, the way she did above? (I'm a clinician, and it is very easy for me to extract the information this way.)
The patients with two samples are:
TCGA-D3-A1Q6 TCGA-D3-A1QA TCGA-D9-A1X3 TCGA-ER-A19T TCGA-ER-A2NF TCGA-HR-A2OG TCGA-HR-A2OH TCGA-XV-AB01
Hi @aindacochea,
Through the GDC portal site (https://portal.gdc.cancer.gov/projects/TCGA-SKCM) I was able to map the following:
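TCGA-D3-A1Q6
- TCGA-D3-A1Q6-06 (Met)
TCGA-D3-A1QA
- TCGA-D3-A1QA-07 (Met)
- TCGA-D3-A1QA-06 (Met)
TCGA-D9-A1X3
- TCGA-D9-A1X3-06 (Met)
TCGA-ER-A19T
- TCGA-ER-A19T-01 (Primary)
- TCGA-ER-A19T-06 (Met)
TCGA-ER-A2NF
- TCGA-ER-A2NF-01 (Primary)
- TCGA-ER-A2NF-06 (Met)
TCGA-HR-A2OG
- TCGA-HR-A2OG-06 (Met)
TCGA-HR-A2OH
- TCGA-HR-A2OH-06 (Met)
TCGA-XV-AB01
- TCGA-XV-AB01-06 (Met)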
Thank you very much, Angelica.
Recalling your previous answer: that patient had two samples, and each sample had different molecular analyses (one sample had the mutation profile, while the other had the other genomic tests, RNA expression, and protein expression).
"All genomic data for patient TCGA-HR-A2OG is linked to sample id TCGA-HR-A2OG-06 except for CNA and mutations. I did a quick survey of the data we received from Firehose and found that the mutations are in fact linked to sample TCGA-HR-A2OG-01, which is why we see two samples for patient TCGA-HR-A2OG."
Could you tell me, for the patients with two samples (in the TCGA melanoma dataset), whether each study (CNA, mutation, RNAseq, and protein expression) comes from the primary or the metastatic sample, the way you did for patient TCGA-HR-A2OG?
Thank you,
Alberto
2017-11-07 17:04 GMT+01:00 angelicaochoa notifications@github.com:
Hi @aindacochea,
Through the GDC portal site I was able to map the following:
TCGA-D3-A1Q6
- TCGA-D3-A1Q6-06 (Met)
TCGA-D3-A1QA
- TCGA-D3-A1QA-07 (Met)
- TCGA-D3-A1QA-06 (Met)
TCGA-D9-A1X3
- TCGA-D9-A1X3-06 (Met)
TCGA-ER-A19T
- TCGA-ER-A19T-01 (Primary)
- TCGA-ER-A19T-06 (Met)
TCGA-ER-A2NF
- TCGA-ER-A2NF-01 (Primary)
- TCGA-ER-A2NF-06 (Met)
TCGA-HR-A2OG
- TCGA-HR-A2OG-06 (Met)
TCGA-HR-A2OH
- TCGA-HR-A2OH-06 (Met)
TCGA-XV-AB01
- TCGA-XV-AB01-06 (Met)
@aindacochea As @DarioS mentioned, there was a sample mix-up. In the case of TCGA-HR-A2OG, for example, the sample identifier was changed from -01 to -06, so the data for both -01 and -06 belong to the same sample (-06).
I believe the same applies to the patients I listed above that have only a single sample identifier. I cross-referenced this information with what is available in the GDC portal's TCGA-SKCM project.
For the following cases I would resolve all sample identifiers to the sample identifier provided in this list:
TCGA-D3-A1Q6
TCGA-D9-A1X3
TCGA-HR-A2OG
TCGA-HR-A2OH
TCGA-XV-AB01
The other patients listed in my previous comment do in fact have more than one sample. The ones listed here, however, can each be resolved to a single sample identifier. For example, for patient TCGA-D9-A1X3, resolve all sample identifiers to TCGA-D9-A1X3-06.
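The resolution rule above could be applied to a list of sample identifiers with a small lookup, sketched here in Python. The names `RESOLVED_SAMPLE` and `resolve_sample_id` are hypothetical, and the -06 targets are an assumption based on the GDC mapping quoted earlier in this thread; verify them against the GDC portal before relying on them.

```python
# Assumed resolution targets for the five single-sample patients,
# taken from the GDC mapping quoted in this thread (not verified here).
RESOLVED_SAMPLE = {
    "TCGA-D3-A1Q6": "TCGA-D3-A1Q6-06",
    "TCGA-D9-A1X3": "TCGA-D9-A1X3-06",
    "TCGA-HR-A2OG": "TCGA-HR-A2OG-06",
    "TCGA-HR-A2OH": "TCGA-HR-A2OH-06",
    "TCGA-XV-AB01": "TCGA-XV-AB01-06",
}


def resolve_sample_id(sample_id):
    """Map a sample barcode to its resolved barcode, if the patient is listed.

    Patients not in RESOLVED_SAMPLE (including those that really do have
    more than one sample) are returned unchanged.
    """
    patient_id = "-".join(sample_id.split("-")[:3])
    return RESOLVED_SAMPLE.get(patient_id, sample_id)


sample_ids = ["TCGA-HR-A2OG-01", "TCGA-HR-A2OG-06", "TCGA-ER-A19T-01"]
print([resolve_sample_id(s) for s in sample_ids])
```

Keying the lookup on the patient barcode means any stale suffix for a listed patient collapses to the one resolved identifier, while patients such as TCGA-ER-A19T keep their separate primary and metastatic samples.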
An error in the importer is creating duplicated samples with different IDs. Bug reported by a user: https://groups.google.com/forum/#!topic/cbioportal/dmMG9QrNDCw