Closed grantn5 closed 4 months ago
Hi,
Thanks for the issue. Good point indeed. I will make the changes.
Hi,
Thanks again for the issue. I have pushed a fix for handling NAs. You can just add three more columns to your CNV_df (Chromosome, Start_Position, and End_Position) and it should work. Also, I have added a keepNA argument to subsetMaf that should either remove or keep rows with NAs post-subsetting for ranges.
Example with the data from the vignette:
#path to TCGA LAML MAF file
laml.maf = system.file('extdata', 'tcga_laml.maf.gz', package = 'maftools')
#clinical information containing survival information and histology. This is optional
laml.clin = system.file('extdata', 'tcga_laml_annot.tsv', package = 'maftools')
laml = read.maf(maf = laml.maf,
                clinicalData = laml.clin,
                verbose = FALSE)
set.seed(seed = 1024)
barcodes = as.character(getSampleSummary(x = laml)[,Tumor_Sample_Barcode])
#Random 20 samples
dummy.samples = sample(x = barcodes,
                       size = 20,
                       replace = FALSE)
#Generate random CN status for above samples
cn.status = sample(
x = c('ShallowAmp', 'DeepDel', 'Del', 'Amp'),
size = length(dummy.samples),
replace = TRUE
)
custom.cn.data = data.frame(
Gene = "DNMT3A",
Sample_name = dummy.samples,
CN = cn.status,
stringsAsFactors = FALSE
)
#Adding start and end position to cn data
custom.cn.data$Start_Position = 25450743
custom.cn.data$End_Position = 25565459
head(custom.cn.data)
Gene Sample_name CN Start_Position End_Position
1 DNMT3A TCGA-AB-2898 ShallowAmp 25450743 25565459
2 DNMT3A TCGA-AB-2879 Del 25450743 25565459
3 DNMT3A TCGA-AB-2920 Amp 25450743 25565459
4 DNMT3A TCGA-AB-2866 Del 25450743 25565459
5 DNMT3A TCGA-AB-2892 Del 25450743 25565459
6 DNMT3A TCGA-AB-2863 ShallowAmp 25450743 25565459
# MAF with cndata including start and end position
laml.plus.cn.withLoci = read.maf(maf = laml.maf,
                                 cnTable = custom.cn.data,
                                 verbose = FALSE)
# MAF with cndata minus the start and end position
laml.plus.cn.noLoci = read.maf(maf = laml.maf,
                               cnTable = custom.cn.data[,c("Gene", "Sample_name", "CN")],
                               verbose = FALSE)
#Subset for ranges
maftools::subsetMaf(maf = laml.plus.cn.withLoci, ranges = data.frame(chromosome = 2, start = 25450743, end = 25565459))
54 variants within provided ranges
-Processing clinical data
An object of class MAF
ID summary Mean Median
<char> <char> <num> <num>
1: NCBI_Build 37 NA NA
2: Center genome.wustl.edu NA NA
3: Samples 48 NA NA
4: nGenes 1 NA NA
5: Frame_Shift_Del 4 0.083 0
6: Missense_Mutation 39 0.812 1
7: Nonsense_Mutation 5 0.104 0
8: Splice_Site 6 0.125 0
9: total 54 1.125 1
#When loci info is not available, it throws a warning.
maftools::subsetMaf(maf = laml.plus.cn.noLoci, ranges = data.frame(chromosome = 2, start = 25450743, end = 25565459))
54 variants within provided ranges
-Processing clinical data
An object of class MAF
ID summary Mean Median
<char> <char> <num> <num>
1: NCBI_Build 37 NA NA
2: Center genome.wustl.edu NA NA
3: Samples 48 NA NA
4: nGenes 1 NA NA
5: Frame_Shift_Del 4 0.083 0
6: Missense_Mutation 39 0.812 1
7: Nonsense_Mutation 5 0.104 0
8: Splice_Site 6 0.125 0
9: total 54 1.125 1
Warning message:
In maftools::subsetMaf(maf = laml.plus.cn.noLoci, ranges = data.frame(chromosome = 2, :
Removed 20 rows with no loci info.
#Keep variants with missing loci
maftools::subsetMaf(maf = laml.plus.cn.noLoci, ranges = data.frame(chromosome = 2, start = 25450743, end = 25565459), keepNA = TRUE)
54 variants within provided ranges
-Processing clinical data
An object of class MAF
ID summary Mean Median
<char> <char> <num> <num>
1: NCBI_Build 37 NA NA
2: Center genome.wustl.edu NA NA
3: Samples 64 NA NA
4: nGenes 1 NA NA
5: DeepDel 4 0.062500 0
6: Frame_Shift_Del 4 0.062500 0
7: Missense_Mutation 39 0.609375 1
8: Nonsense_Mutation 5 0.078125 0
9: ShallowAmp 6 0.093750 0
10: Splice_Site 6 0.093750 0
11: total 64 1.000000 1
12: Amp 4 0.062500 0
13: Del 6 0.093750 0
14: CNV_total 10 0.156250 0
Warning message:
In maftools::subsetMaf(maf = laml.plus.cn.noLoci, ranges = data.frame(chromosome = 2, :
Added back 20 rows with no loci info.
You will have to install from GitHub for the changes. Please let me know if this works for you.
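For readers who want to see the keep-or-remove logic in isolation, here is a minimal base-R sketch on a toy table. It is only an illustration of the idea, not the maftools implementation: the function subset_by_range and the toy positions are assumptions made up for this example, and only the DNMT3A range comes from the thread above.

```r
# Toy variant table: two mutations with loci, two copy-number rows without.
# The positions of the two mutations are invented for illustration.
variants <- data.frame(
  Hugo_Symbol = c("DNMT3A", "DNMT3A", "DNMT3A", "DNMT3A"),
  Chromosome = c("2", "2", NA, NA),
  Start_Position = c(25457242, 30000000, NA, NA),
  End_Position = c(25457242, 30000000, NA, NA),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation", "Amp", "Del"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of maftools): keep rows overlapping the range,
# and optionally add back rows that have no loci information at all.
subset_by_range <- function(df, chromosome, start, end, keepNA = FALSE) {
  has_loci <- !is.na(df$Chromosome) &
    !is.na(df$Start_Position) &
    !is.na(df$End_Position)
  # FALSE & NA evaluates to FALSE, so rows without loci never count as in range
  in_range <- has_loci &
    df$Chromosome == as.character(chromosome) &
    df$Start_Position <= end &
    df$End_Position >= start
  if (keepNA) df[in_range | !has_loci, ] else df[in_range, ]
}

dropped <- subset_by_range(variants, 2, 25450743, 25565459)
kept <- subset_by_range(variants, 2, 25450743, 25565459, keepNA = TRUE)
```

With keepNA = FALSE only the in-range mutation survives; with keepNA = TRUE the two copy-number rows without loci are retained alongside it.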
Hi, that is amazing, thank you so much for sorting this so quickly.
I will install from GitHub and let you know if I run into any issues!
Hi @PoisonAlien, just following up on this. I ran into no issues using the command; however, the function documentation and package wiki are now out of date and need to be updated.
Hi, thank you for testing. I have updated the package documentation and vignette, but the update has not been pushed to Bioconductor yet. I will close the issue - please feel free to reopen if needed.
Describe the issue
If you add a cnTable to a maftools object in the read.maf() function, the resulting object will contain NAs in the Start_Position and End_Position columns for genes that do not have any mutations; they are simply annotated as AMP or DEL in the @data part of the MAF object. This means that when trying to subset the MAF based on a range, you get an error. Is there a solution to avoid this by adding in the start and end of the gene?
Command
Please post your commands and the output (errors or any unexpected output)
I think it would be good to add functionality to the cnTable argument so you can pass Chromosome, Start_Position and End_Position in the cnTable to avoid this error.
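As a rough sketch of what such a table could look like, the loci columns can be attached to a copy-number table in base R with a merge against a per-gene coordinate table. The gene.loci lookup table here is hypothetical (in practice it would come from a gene annotation source); the DNMT3A coordinates, sample names and CN values are the ones that appear elsewhere in this thread.

```r
# Copy-number table in the three-column form (Gene, Sample_name, CN)
custom.cn.data <- data.frame(
  Gene = "DNMT3A",
  Sample_name = c("TCGA-AB-2898", "TCGA-AB-2879"),
  CN = c("ShallowAmp", "Del"),
  stringsAsFactors = FALSE
)

# Hypothetical per-gene coordinate lookup (an assumption for this example)
gene.loci <- data.frame(
  Gene = "DNMT3A",
  Chromosome = "2",
  Start_Position = 25450743,
  End_Position = 25565459,
  stringsAsFactors = FALSE
)

# Left join: every CN row keeps its sample and gains the gene's loci;
# genes missing from gene.loci would get NA loci rather than being dropped
cn.with.loci <- merge(custom.cn.data, gene.loci, by = "Gene", all.x = TRUE)
```

The all.x = TRUE choice keeps copy-number rows for genes that are absent from the lookup table, so nothing is silently lost before the table is inspected.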
Session info
Run sessionInfo() and post the output below