@Jasonmbg I think you should read the documentation of the oncoplot function. It says that the annotationDat argument is used to provide custom clinical data, and that the clinicalFeatures argument specifies which columns are drawn.
Once you provide data to the annotationDat argument of oncoplot, the function will not use the clinical data stored in the MAF object. The problem is that the function can only process one data frame for annotation, while you provide two. I suggest you merge them into one, with something like MAF@clinicalData <- merge(MAF@clinicalData, pdat.cluster), and then use the oncoplot function.
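A minimal base R sketch of the merge idea suggested above. The two data frames are toy stand-ins (the sample names S1 to S4 and the group labels are made up for illustration); only the column names Tumor_Sample_Barcode and groupsHC come from this thread.

```r
# Toy stand-in for the clinical table stored in the MAF object.
clin <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the expression-derived annotations (pdat.clusters).
pdat <- data.frame(
  Tumor_Sample_Barcode = c("S2", "S3", "S4"),
  groupsHC = c("EC1", "EC2", "EC1"),
  stringsAsFactors = FALSE
)

# Left join on the shared key: every sample in clin is kept,
# and samples without an annotation get NA in groupsHC.
merged <- merge(clin, pdat, by = "Tumor_Sample_Barcode", all.x = TRUE)
print(merged)
```

Naming the key explicitly with by = avoids merge() silently joining on every column the two tables happen to share.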
Dear @ShixiangWang, thank you for your quick answer and comment. Of course I read the description of the oncoplot function first, but because of my specific goal described above, I'm struggling to set this up correctly, and also to keep only the patients shared between the transcriptomic data (pdat.clusters) and the plotted clinical samples of the corresponding MAF file. Moreover, I tried your suggestion, but it still returned an error:
maf.COAD@clinical.data
Tumor_Sample_Barcode
1: TCGA-AA-A010-01A-01D-A17O-10
2: TCGA-CA-6717-01A-11D-1835-10
3: TCGA-AZ-4315-01A-01D-1408-10
4: TCGA-AA-3984-01A-02D-1981-10
5: TCGA-AA-A00N-01A-02D-A17O-10
---
395: TCGA-F4-6704-01A-11D-1835-10
396: TCGA-AA-3972-01A-01W-0995-10
397: TCGA-A6-5664-01A-21D-1835-10
398: TCGA-CA-5255-01A-11D-1835-10
399: TCGA-AZ-4323-01A-21D-1835-10
head(pdat.clusters)
subtype_MSI_status subtype_expression_subtype
TCGA-3L-AA1B-01A-11R-A37K-07 NotAvailable NotAvailable
TCGA-T9-A92H-01A-11R-A37K-07 NotAvailable NotAvailable
TCGA-CA-6716-01A-11R-1839-07 NotAvailable NotAvailable
TCGA-CM-6680-01A-11R-1839-07 NotAvailable NotAvailable
TCGA-D5-6928-01A-11R-1928-07 NotAvailable NotAvailable
TCGA-A6-A565-01A-31R-A28H-07 NotAvailable NotAvailable
subtype_histological_type groupsHC
TCGA-3L-AA1B-01A-11R-A37K-07 NotAvailable EC1
TCGA-T9-A92H-01A-11R-A37K-07 NotAvailable EC1
TCGA-CA-6716-01A-11R-1839-07 NotAvailable EC1
TCGA-CM-6680-01A-11R-1839-07 NotAvailable EC1
TCGA-D5-6928-01A-11R-1928-07 NotAvailable EC1
TCGA-A6-A565-01A-31R-A28H-07 NotAvailable EC1
Tumor_Sample_Barcode
TCGA-3L-AA1B-01A-11R-A37K-07 TCGA-3L-AA1B-01A-11R-A37K-07
TCGA-T9-A92H-01A-11R-A37K-07 TCGA-T9-A92H-01A-11R-A37K-07
TCGA-CA-6716-01A-11R-1839-07 TCGA-CA-6716-01A-11R-1839-07
TCGA-CM-6680-01A-11R-1839-07 TCGA-CM-6680-01A-11R-1839-07
TCGA-D5-6928-01A-11R-1928-07 TCGA-D5-6928-01A-11R-1928-07
TCGA-A6-A565-01A-31R-A28H-07 TCGA-A6-A565-01A-31R-A28H-07
maf.COAD@clinicalData <- merge(maf.COAD@clinicalData, pdat.clusters)
Error in merge(maf.COAD@clinicalData, pdat.clusters) :
no slot of name "clinicalData" for this object of class "MAF"
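The error above says the object has no slot called clinicalData, while the printout earlier in this comment accesses clinical.data. A quick way to check which slots an S4 object really has is slotNames(). The sketch below uses a hypothetical toy class (ToyMAF is not part of maftools) so that it runs without any package installed.

```r
library(methods)

# Hypothetical toy S4 class, used only to illustrate slot inspection.
setClass("ToyMAF", representation(data = "data.frame",
                                  clinical.data = "data.frame"))

obj <- new("ToyMAF",
           data = data.frame(),
           clinical.data = data.frame(Tumor_Sample_Barcode = "S1"))

# List the slots that actually exist before using @.
available <- slotNames(obj)
print(available)

# A misspelled slot name is simply not in the list.
has_wrong <- "clinicalData" %in% available
has_right <- "clinical.data" %in% available
```

On a real MAF object the same call, slotNames(maf.COAD), would list the valid slot names.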
Hi Jason,
Sorry for the late reply. Yes, you can do it. You will have to pass your annotation table (from RNA-seq) while reading your MAF file. Something like this:
#Download the MAF to a local file
maf.COAD <- GDCquery_Maf("COAD", pipelines = "muse", save.csv = T, directory = "./")
#Pass pdat.clusters as clinical data
coad = read.maf(maf = "70cb1255-ec99-4c08-b482-415f8375be3f/TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf",
                clinicalData = pdat.clusters) #These are your annotations
#Draw, using the MAF object returned by read.maf
oncoplot(maf = coad, genes = selected.signature, clinicalFeatures = "groupsHC")
If you want to do some enrichment analysis within the groupsHC groups:
coad.ce = clinicalEnrichment(maf = coad, clinicalFeature = "groupsHC")
plotEnrichmentResults(enrich_res = coad.ce, pVal = 0.05)
Let me know if you have any issues. Also, please post your sessionInfo; it would make it easier to know whether you're using the latest version.
P.S.: Thanks @ShixiangWang, you were close.
@PoisonAlien Great!
@Jasonmbg Sorry, what I posted was not real code.
Maybe you can use the following code to merge the feature data:
maf.COAD@clinical.data = dplyr::full_join(maf.COAD@clinical.data, pdat.clusters, by="Tumor_Sample_Barcode") %>% data.table::data.table()
I tested this code with the example MAF bundled with maftools.
Dear @PoisonAlien ,
thank you for your suggestions and advice. However, when I ran your code chunk:
maf.COAD <- GDCquery_Maf("COAD", pipelines = "muse", save.csv = T, directory = "./")
File created: .//TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf.csv
Warning message:
Unknown or uninitialised column: 'sample'.
coad = read.maf(maf = "TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf", clinicalData = pdat.clusters)
reading maf..
Error in data.table::fread(input = maf, sep = "\t", stringsAsFactors = FALSE, :
File 'TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf' does not exist; getwd()=='C:/Users/stathis/Desktop/COAD.Mutations'. Include correct full path, or one or more spaces to consider the input a system command.
So, what do you think of this error? Is it something to do with Windows?
getwd()
[1] "C:/Users/stathis/Desktop/COAD.Mutations"
list.files()
[1] "pdata.COAD.clusters.reordered.txt"
[2] "TCGA-COAD"
[3] "TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf.csv"
@ShixiangWang thank you for your time. It still returned an error:
Error in dplyr::full_join(maf.COAD@clinical.data, pdat.clusters, by = "Tumor_Sample_Barcode") :
trying to get slot "clinical.data" from an object (class "tbl_df") that is not an S4 object
Hi, point read.maf to the downloaded file. I guess it's inside the "TCGA-COAD" directory?
Dear PoisonAlien,
the TCGA-COAD directory has:
(TCGA-COAD>harmonized>Simple_Nucleotide_Variation>Masked_Somatic_Mutation>70cb1255-ec99-4c08-b482-415f8375be3f>TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf)
but that file shows up as a WinRAR archive.
There is also a corresponding csv directly in the working directory, outside that folder, named:
"TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf.csv"
as I printed above:
list.files()
[1] "pdata.COAD.clusters.reordered.txt"
[2] "TCGA-COAD"
[3] "TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf.csv"
Do you get this error for both files (the one inside the TCGA-COAD directory and the csv file)? Because your error says the file was not found, maybe you entered the wrong file name?
coad = read.maf(maf = "TCGA-COAD/harmonized/Simple_Nucleotide_Variation/Masked_Somatic_Mutation/70cb1255-ec99-4c08-b482-415f8375be3f/TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf")
#or
coad = read.maf(maf = "TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf.csv")
Make sure the file names and paths are correct.
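One way to rule out a mistyped path is to let R find the file and confirm it exists before calling read.maf. This is a sketch in base R only; it builds a throwaway directory tree under tempdir() (the folder and file names here are placeholders, not the real download) so it can run anywhere.

```r
# Build a small throwaway directory tree that mimics a nested download.
root <- file.path(tempdir(), "TCGA-COAD-demo")
nested <- file.path(root, "harmonized", "Masked_Somatic_Mutation")
dir.create(nested, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(nested, "example.somatic.maf"))

# Search recursively for files ending in .maf or .maf.gz and
# return their full paths, so nothing has to be typed by hand.
hits <- list.files(root, pattern = "\\.maf(\\.gz)?$",
                   recursive = TRUE, full.names = TRUE)

# Confirm that exactly one candidate exists before using it.
stopifnot(length(hits) == 1, file.exists(hits))
maf_path <- hits[1]
print(maf_path)
```

With the real download, root would be "TCGA-COAD" and maf_path could be passed as the maf argument of read.maf.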
@PoisonAlien, I tried changing the working directory in RStudio (perhaps it was an RStudio bug) and reran with the corresponding .csv file, but now a different error appeared:
coad = read.maf(maf = "TCGA.COAD.muse.70cb1255-ec99-4c08-b482-415f8375be3f.DR-10.0.somatic.maf.csv", clinicalData = pdat.clusters)
reading maf..
Error in validateMaf(maf = maf, isTCGA = isTCGA, rdup = removeDuplicatedVariants, :
missing required fields from MAF: Chromosome
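One possible explanation, not confirmed in this thread: the earlier traceback shows the file being read with sep = "\t", so a comma-separated .csv would be parsed as a single wide column and no column named Chromosome would be found. The base R sketch below reproduces that effect on a tiny made-up file (the gene name and values are placeholders).

```r
# Write a tiny comma-separated file with MAF-like column names.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Hugo_Symbol,Chromosome,Start_Position",
             "GENE1,1,100"), csv_path)

# Reading it as tab-delimited yields one column holding the whole line.
as_tab <- read.delim(csv_path, sep = "\t", stringsAsFactors = FALSE)

# Reading it as comma-delimited recovers the three columns.
as_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

tab_has_chr <- "Chromosome" %in% names(as_tab)
csv_has_chr <- "Chromosome" %in% names(as_csv)
print(c(tab = tab_has_chr, csv = csv_has_chr))
```

If this is the cause, reading the original tab-delimited .maf file instead of the .csv export would avoid the problem.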
Omg, so many errors!
Can you give it a last try and read the compressed file within the TCGA-COAD directory?
If you still get an error, can you share your MAF file and pdata file? It's hard to guess the issue without a raw, reproducible file. You can email me at anandmt3@gmail.com if you don't want to attach it here.
Dear @PoisonAlien, unfortunately I will have to contact you by email, because various issues continue to appear.
Dear Anand,
I would like to ask you some important questions regarding your great R package maftools, and especially regarding the oncoplot function. I tried to proceed by following the vignette, but without success.
In detail, through the R package TCGAbiolinks I applied a small gene signature (12 genes) to the COAD TCGA RNA-Seq dataset. I used it for consensus clustering, which showed some interesting grouped survival patterns.
Now, my collaborators have asked me whether I can find any mutational patterns for the 12 genes and, if feasible, plot them for the same samples that resulted from the clustering analysis, with the exact cluster membership (the groupsHC column below).
Thus, my questions and issues are the following:
A) My process for downloading mutational data:
So, my issue is that my data frame above, from the RNA-Seq transcriptomic dataset, clearly includes more patients (456) than the clinical mutation data (399).
Thus, my goal is, if feasible, to create an oncoplot for the patients present in both types of data, plot them with the selected genes, and also annotate them with the groupsHC column from above, even if some data are missing.
On this premise, I naively tried:
So, in your opinion, how could this be fixed?
Thank you in advance,
Efstathios-Iason
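The goal described in this message, keeping only the patients present in both data sets, can be sketched in base R. Note that the RNA-Seq and MAF barcodes shown earlier in the thread differ in their trailing fields (for example ...-11R-...-07 versus ...-11D-...-10), so a join on the full barcode would match nothing. The sketch assumes that matching on the first 12 characters (the TCGA patient identifier) is acceptable; the barcodes and group labels below are made up for illustration.

```r
# Made-up RNA-Seq annotations keyed by full RNA barcodes.
rna <- data.frame(
  Tumor_Sample_Barcode = c("TCGA-XX-0001-01A-11R-0000-07",
                           "TCGA-XX-0002-01A-11R-0000-07",
                           "TCGA-XX-0003-01A-11R-0000-07"),
  groupsHC = c("EC1", "EC2", "EC1"),
  stringsAsFactors = FALSE
)

# Made-up MAF sample barcodes (DNA aliquots of partly the same patients).
maf_samples <- c("TCGA-XX-0002-01A-11D-0000-10",
                 "TCGA-XX-0003-01A-01D-0000-10",
                 "TCGA-XX-0004-01A-11D-0000-10")

# The first 12 characters of a TCGA barcode identify the patient.
patient_id <- function(barcode) substr(barcode, 1, 12)

rna$patient <- patient_id(rna$Tumor_Sample_Barcode)
maf_tab <- data.frame(Tumor_Sample_Barcode = maf_samples,
                      patient = patient_id(maf_samples),
                      stringsAsFactors = FALSE)

# Patients present in both data sets.
shared <- intersect(rna$patient, maf_tab$patient)

# Annotation table keyed by the MAF barcodes, restricted to shared patients.
annot <- merge(maf_tab[maf_tab$patient %in% shared, ],
               rna[, c("patient", "groupsHC")],
               by = "patient")
print(annot)
```

The resulting table keeps the MAF barcodes in Tumor_Sample_Barcode, so it could be passed as clinicalData to read.maf as suggested earlier in the thread. If a patient has more than one sample in either data set, this merge would produce duplicate rows that need to be resolved first.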