Hi!

I'm querying all TCGA-BRCA samples for SNV (Simple Nucleotide Variation) and CNV (Copy Number Variation) with the following code:

```r
library(TCGAbiolinks)  # provides GDCquery() and getResults()

query.snv <- GDCquery(
  project = "TCGA-BRCA",
  data.category = "Simple Nucleotide Variation",
  experimental.strategy = "WXS",
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking",
  data.type = "Masked Somatic Mutation",
  data.format = "MAF"
)

query.cnv <- GDCquery(
  project = "TCGA-BRCA",
  data.category = "Copy Number Variation",
  data.type = "Gene Level Copy Number"
)
```

Everything seems to run fine. However, I don't understand why the sample types are listed in pairs.

For SNV, `table(getResults(query.snv)$sample_type)` gives the following counts:

| sample_type | Count |
|---|---|
| Metastatic,Blood Derived Normal | 4 |
| Primary Tumor,Blood Derived Normal | 902 |
| Solid Tissue Normal,Metastatic | 2 |
| Solid Tissue Normal,Primary Tumor | 80 |

And for CNV, `table(getResults(query.cnv)$sample_type)` yields:

| sample_type | Count |
|---|---|
| Metastatic,Blood Derived Normal | 4 |
| Primary Tumor,Blood Derived Normal | 991 |
| Solid Tissue Normal,Metastatic | 3 |
| Solid Tissue Normal,Primary Tumor | 86 |
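Each paired label appears to combine one tumor-side and one normal-side sample type. A minimal base-R sketch to split them apart, using the SNV counts listed above typed in by hand (on real data you would apply the same steps to `getResults(query.snv)$sample_type` directly). It assumes every normal-side type contains the word "Normal", which holds for the four labels shown, and it classifies by name rather than position because the normal sample comes first in some pairs and second in others.

```r
# SNV counts copied by hand from the output above (illustration only;
# in practice use getResults(query.snv)$sample_type directly)
snv_counts <- c(
  "Metastatic,Blood Derived Normal"    = 4,
  "Primary Tumor,Blood Derived Normal" = 902,
  "Solid Tissue Normal,Metastatic"     = 2,
  "Solid Tissue Normal,Primary Tumor"  = 80
)

# One label per file, mimicking the sample_type column of getResults()
sample_type <- rep(names(snv_counts), times = snv_counts)

# Split each "A,B" label into its two sample types and trim stray spaces
parts <- lapply(strsplit(sample_type, ",", fixed = TRUE), trimws)

# Classify each component by name, not position (assumption: normal-side
# types are exactly those whose name contains "Normal")
is_normal <- function(x) grepl("Normal", x, fixed = TRUE)
tumor  <- vapply(parts, function(p) p[!is_normal(p)][1], character(1))
normal <- vapply(parts, function(p) p[is_normal(p)][1], character(1))

# Cross-tabulate tumor-side vs. normal-side sample types
tumor_vs_normal <- table(tumor, normal)
print(tumor_vs_normal)
```

The same steps apply unchanged to the CNV labels, since they use the same four pairings.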
I can (kind of?) see why this would be the case for SNV, since mutation info comes from tumor-normal aliquot pairs. I went to the GDC Data Portal and downloaded a single .MAF file from a random case, and there seems to be data only in the `Tumor_Seq_Allele1`/`Tumor_Seq_Allele2` columns (notice how the `Match_Norm_Seq_Allele1`/`Match_Norm_Seq_Allele2` columns are empty). I'm not sure why this is the case.
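To check the empty-column observation programmatically instead of by eye, here is a minimal base-R sketch. It assumes the usual MAF layout: tab-separated, `#`-prefixed header lines, and the standard `Tumor_Seq_Allele1/2` and `Match_Norm_Seq_Allele1/2` columns. Since no real file is bundled here, it writes a tiny hypothetical MAF with made-up genes, barcodes and alleles to a temporary file; for the downloaded file, point `maf_path` at it instead (`read.delim()` can read a gzip-compressed `.maf.gz` directly).

```r
# Hypothetical stand-in for a downloaded GDC MAF: made-up genes, barcodes
# and alleles, with the Match_Norm_Seq_Allele columns left blank
maf_path <- tempfile(fileext = ".maf")
writeLines(c(
  "#version gdc-1.0.0",
  paste("Hugo_Symbol", "Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode",
        "Tumor_Seq_Allele1", "Tumor_Seq_Allele2",
        "Match_Norm_Seq_Allele1", "Match_Norm_Seq_Allele2", sep = "\t"),
  paste("GENE1", "TUMOR-A", "NORMAL-A", "C", "T", "", "", sep = "\t"),
  paste("GENE2", "TUMOR-A", "NORMAL-A", "G", "A", "", "", sep = "\t")
), maf_path)

# Skip the '#' header lines; read every column as character so allele
# codes such as "T" are never converted to logical
maf <- read.delim(maf_path, comment.char = "#", colClasses = "character")

allele_cols <- c("Tumor_Seq_Allele1", "Tumor_Seq_Allele2",
                 "Match_Norm_Seq_Allele1", "Match_Norm_Seq_Allele2")

# TRUE for columns that are NA or blank in every row
all_empty <- vapply(maf[allele_cols],
                    function(x) all(is.na(x) | x == ""),
                    logical(1))
print(all_empty)
```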
The same happens with the CNV data. On the GDC Data Portal, there's one file per case, and each file has two associated samples: one from tumor tissue and one from normal tissue. However, the .TSV file itself contains no reference to tumor or normal samples.
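A similar check for the CNV file is to scan its column names for anything sample-related. The sketch below assumes the gene-level copy number TSV has columns such as `gene_id`, `gene_name`, `chromosome`, `start`, `end`, `copy_number`, `min_copy_number` and `max_copy_number`; that layout is an assumption, so verify it with `names()` on the real download. The two rows written to a temporary file are hypothetical.

```r
# Hypothetical stand-in for a GDC gene-level copy number TSV
# (column layout assumed, rows made up for illustration)
cnv_path <- tempfile(fileext = ".tsv")
cnv_demo <- data.frame(
  gene_id         = c("GENE_ID_1", "GENE_ID_2"),
  gene_name       = c("GENE1", "GENE2"),
  chromosome      = c("chr1", "chr2"),
  start           = c(1000L, 5000L),
  end             = c(2000L, 9000L),
  copy_number     = c(2L, 3L),
  min_copy_number = c(2L, 3L),
  max_copy_number = c(2L, 3L)
)
write.table(cnv_demo, cnv_path, sep = "\t", quote = FALSE, row.names = FALSE)

cnv <- read.delim(cnv_path)

# Any column whose name mentions tumor/normal tissue, samples or barcodes
sample_cols <- grep("tumou?r|normal|sample|barcode", names(cnv),
                    ignore.case = TRUE, value = TRUE)
print(sample_cols)  # character(0) for the layout assumed here
```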
What does this mean? Is the mutation data for the tumor tissue, the normal tissue or both?
Many thanks!