BioinformaticsFMRP / TCGAbiolinks


Downloading Mutation data (hg19) for a cancer type #372

Open beginner984 opened 4 years ago

beginner984 commented 4 years ago


I am trying to download mutation data (hg19) for the ESCA cancer type in MAF format. I ran the queries below, but the second one (with file.type) gives an error. Can you help me, please?

> query.maf.hg19 <- GDCquery(project = "TCGA-ESCA", 
+                            data.category = "Simple nucleotide variation", 
+                            data.type = "Simple somatic mutation",
+                            access = "open", 
+                            legacy = TRUE)
o GDCquery: Searching in GDC database
Genome of reference: hg19
oo Accessing GDC. This might take a while...
ooo Project: TCGA-ESCA
oo Filtering results
ooo By access
ooo By data.type
oo Checking data
ooo Check if there are duplicated cases
ooo Check if there results for the query
o Preparing output
> View(query.maf.hg19[[1]][[1]])
> query.maf.hg19 <- GDCquery(project = "TCGA-ESCA", 
+                            data.category = "Simple nucleotide variation", 
+                            data.type = "Simple somatic mutation",
+                            access = "open", 
+                            file.type = "bcgsc.ca_ESCA.IlluminaHiSeq_DNASeq.1.somatic.maf  ",
+                            legacy = TRUE)
o GDCquery: Searching in GDC database
Genome of reference: hg19
oo Accessing GDC. This might take a while...
ooo Project: TCGA-ESCA
oo Filtering results
ooo By access
ooo By data.type
ooo By file.type

|Files                                                                  |
|bcgsc.ca_ESCA.IlluminaHiSeq_DNASeq.1.somatic.maf                       |
|genome.wustl.edu_ESCA.IlluminaHiSeq_DNASeq_automated.1.1.0.somatic.maf |
|gsc_ESCA_pairs.aggregated.capture.tcga.uuid.automated.somatic.maf      |
|hgsc.bcm.edu_ESCA.IlluminaGA_DNASeq.1.somatic.maf                      |
|ucsc.edu_ESCA.IlluminaGA_DNASeq_automated.Level_2.1.0.0.somatic.maf    |
|NA                                                                     |
|NA                                                                     |
|NA                                                                     |
|NA                                                                     |
|NA                                                                     |
Error in GDCquery(project = "TCGA-ESCA", data.category = "Simple nucleotide variation",  : 
  We were not able to filter using this file type. Examples of available files are above. Please check the vignette for possible entries

tiagochst commented 4 years ago

I'll check your code soon. But maybe you want to check this for hg19 mutations.

tiagochst commented 4 years ago


> query.maf.hg19 <- GDCquery(project = "TCGA-ESCA", 
+                            data.category = "Simple nucleotide variation", 
+                            data.type = "Simple somatic mutation",
+                            access = "open", 
+                            file.type = "bcgsc.ca_ESCA.IlluminaHiSeq_DNASeq.1.somatic.maf  ",
+                            legacy = TRUE)

Your file.type string has trailing whitespace (two spaces after ".maf", before the closing quote). If you remove them, the query should work:

query.maf.hg19 <- GDCquery(project = "TCGA-ESCA", 
                           data.category = "Simple nucleotide variation", 
                           data.type = "Simple somatic mutation",
                           access = "open", 
                           file.type = "bcgsc.ca_ESCA.IlluminaHiSeq_DNASeq.1.somatic.maf",
                           legacy = TRUE)

beginner984 commented 4 years ago

Sorry, what do you mean by file.type having empty characters at the end?

Thank you

tiagochst commented 4 years ago

[Image: Screenshot from 2019-12-09 12-00-25]
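
For readers who cannot see the screenshot, the difference can be reproduced in plain base R. This is only a minimal sketch: the variable names `typed`, `listed`, and `cleaned` are made up for illustration, and it says nothing about how GDCquery matches file.type internally. It only shows that the two strings are not the same and that `trimws()` removes the extra spaces.

```r
# The value passed as file.type in the failing query:
# note the two spaces between ".maf" and the closing quote.
typed <- "bcgsc.ca_ESCA.IlluminaHiSeq_DNASeq.1.somatic.maf  "

# The file name as it appears in the list printed with the error.
listed <- "bcgsc.ca_ESCA.IlluminaHiSeq_DNASeq.1.somatic.maf"

# The trailing spaces are part of the string, so the two values differ.
identical(typed, listed)
nchar(typed) - nchar(listed)

# trimws() (base R) strips leading and trailing whitespace.
cleaned <- trimws(typed)
identical(cleaned, listed)
```

Passing `cleaned` (or simply retyping the name without the spaces) as file.type gives the corrected call shown earlier in the thread.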