Closed 14109022 closed 12 months ago
Hi,
You should use the sample identifier (the first 16 characters of the TCGA barcode) to match the two datasets. An example is shown below. Just check which types of mutation you want to consider.
library(TCGAbiolinks)

# 1. Somatic mutations (MAF) for TCGA-LUAD
query_maf <- GDCquery(
  project = "TCGA-LUAD",
  data.category = "Simple Nucleotide Variation",
  access = "open",
  data.type = "Masked Somatic Mutation",
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query_maf)
maf <- GDCprepare(query_maf)

# Keep TP53 variants and inspect which kinds are present
mutations_tp53 <- maf |> dplyr::filter(Hugo_Symbol == "TP53")
table(mutations_tp53$VARIANT_CLASS)
table(mutations_tp53$IMPACT)

# Full aliquot barcodes; the first 16 characters identify the sample
head(maf$Tumor_Sample_Barcode)

# 2. Gene expression (STAR counts) for TCGA-LUAD
query_rna <- GDCquery(
  project = "TCGA-LUAD",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts"
)
GDCdownload(query = query_rna, files.per.chunk = 30)
data <- GDCprepare(query = query_rna)

# 3. Add mutation information to the SummarizedExperiment (SE)
data$mutation_tp53 <- data$sample %in% substr(mutations_tp53$Tumor_Sample_Barcode, 1, 16)
table(data$mutation_tp53)
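To see why the 16-character sample barcode is the matching key, here is a small self-contained sketch in base R. Everything in it is made up for illustration: the barcodes are not real TCGA-LUAD samples, and `variant_classification` is a toy vector standing in for the `Variant_Classification` column of a MAF file. It also shows one possible way to drop variant types you do not want to count (silent variants here), which is a choice you would need to make for your own analysis.

```r
# Toy aliquot barcodes (invented, not real TCGA data).
# A full aliquot barcode has 28 characters; the first 16 identify the sample.
maf_barcodes <- c(
  "TCGA-AA-0001-01A-11D-A000-08",
  "TCGA-AA-0001-01A-11D-A000-08",  # same sample, second TP53 variant
  "TCGA-AA-0003-01A-11D-A000-08"
)
# Toy stand-in for the MAF column Variant_Classification
variant_classification <- c("Missense_Mutation", "Silent", "Nonsense_Mutation")

# Toy 16-character sample barcodes, as found in the expression data
rna_samples <- c(
  "TCGA-AA-0001-01A",
  "TCGA-AA-0002-01A",
  "TCGA-AA-0003-01A",
  "TCGA-AA-0001-11A"  # normal tissue from the same patient as 0001-01A
)

# Keep only the variant types of interest (assumption: ignore silent variants)
keep <- variant_classification != "Silent"

# Truncate the aliquot barcodes to the sample level and de-duplicate
mutated_samples <- unique(substr(maf_barcodes[keep], 1, 16))

# Flag each expression sample as mutant or wild type
is_mutant <- rna_samples %in% mutated_samples
tp53_status <- ifelse(is_mutant, "Mutant", "WT")
print(data.frame(sample = rna_samples, tp53_status = tp53_status))
```

Note that the normal-tissue sample `TCGA-AA-0001-11A` stays wild type here. Matching on only the first 12 characters (the patient level) would have flagged it as mutant as well, because it shares a patient with a mutated tumour sample.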
This way makes much more sense. Thank you so much, I really really appreciate it!
Hi,
I'm really new to RNA-seq/Bioinformatics/TCGAbiolinks so I do apologise if this seems like a silly question! I am trying to analyse gene expression data of the TCGA-LUAD project, and more specifically trying to see gene expression differences in WT-TP53 and Mutant-TP53 TCGA-LUAD patients.
My initial approach was to get the Transcriptome Profiling RNA-seq data and the Simple Nucleotide Variation data, and to use one of the IDs to match up patients with and without TP53 mutations. However, I have not been able to find any common identifier between the two objects.
Is there an alternative method for what I am trying to do, or is this not possible at all?
Thank you so much in advance