Closed AnandamidaCBD closed 1 year ago
I am sorry for my horrible aesthetic :/ It's my first issue on github
Hi, no worries, and thanks for the issue. I am a bit swamped with work today; I will try to look into it as soon as I can.
Hi,
Have you been able to solve the issue?
Also note that the `vc_nonSyn` argument specifies which variant classifications count as having functional consequences. Your list of variant classifications (`vc_nonSyn`) contains silent variants as well, so essentially you are treating everything as a variant with functional consequences.
I added the following lines to my code and then re-ran my pre-designed maftools functions:
xX_data <- xX@data %>%
  select(-Source_MAF) %>%
  add_count(Hugo_Symbol, Transcript_ID, Tumor_Sample_Barcode, Chromosome, Start_Position, Variant_Classification)
I had this error before:
Error in `as_tibble()`:
! Column name `Source_MAF` must not be duplicated.
Use `.name_repair` to specify repair.
Caused by error in `repaired_names()`:
! Names must be unique.
✖ These names are duplicated:
In summary, the line of code takes the data table from the MAF object xX, removes the "Source_MAF" column, and then adds a new column "n" representing the count of rows for each unique combination of Hugo_Symbol, Transcript_ID, Tumor_Sample_Barcode, Chromosome, Start_Position, and Variant_Classification
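A minimal sketch of what that pipeline does, using a small made-up data frame in place of `xX@data` (the rows and the reduced set of columns are hypothetical and only serve the illustration):

```r
library(dplyr)

# Toy stand-in for the table stored in xX@data (values are invented).
toy <- data.frame(
  Hugo_Symbol            = c("TP53", "TP53", "TTN"),
  Tumor_Sample_Barcode   = c("S1", "S1", "S2"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation", "Silent"),
  Source_MAF             = c("a.maf", "b.maf", "a.maf")
)

counted <- toy %>%
  select(-Source_MAF) %>%   # drop the Source_MAF column
  add_count(Hugo_Symbol, Tumor_Sample_Barcode, Variant_Classification)  # adds column "n"

print(counted)
```

Rows that share the same combination of grouping columns get the same value of `n`, so an `n` greater than 1 flags a variant that appears more than once.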
Taking advantage of this opportunity, I would like to ask you about the "maf.silent" table.
I am a little bit confused about the difference between the "maf.silent" table and the silent variants I may find in the "Variant.classification" table when I specify them with "vc_nonSyn".
What I mean is that when I use "read.maf(vc_nonSyn = c(silent, etc))" and obtain, for instance, 95k silent variants, the number of silent variants shown in the "Variant.Classification" table is significantly lower. However, the total number of variants in the "Data" table is considerably higher when I use "vc_nonSyn = c(silent, etc)", as I would expect.
I would like to thank you for your help and patience in advance!
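One way to look at the two tables side by side is sketched below. It assumes the maftools package is installed and uses the example TCGA LAML file that ships with it rather than the data from this issue; the `@data` and `@maf.silent` slots are read directly from the MAF object.

```r
library(maftools)

# Example MAF bundled with maftools (not the dataset discussed in this issue).
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)  # default vc_nonSyn

# Variants whose classification is in vc_nonSyn end up in @data ...
table(laml@data$Variant_Classification)

# ... and the remaining variants are kept separately in @maf.silent.
table(laml@maf.silent$Variant_Classification)

# Row counts of the two tables.
c(nonSyn = nrow(laml@data), silent = nrow(laml@maf.silent))
```

Comparing the two `table()` outputs shows which classifications were routed to which slot for a given `vc_nonSyn` setting.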
Hi,
I am having a hard time following the issue, but I will still try to answer.
`"Source_MAF" at locations 133 and 134` - it seems the header line is duplicated. How did you concatenate your MAFs (I guess with the `cat` command from the terminal)?
You specify which `Variant Classifications` count as non-synonymous with the `vc_nonSyn` argument. The rest of the mutations will be considered silent. You are including everything as non-synonymous variants with your `vc_nonSyn` definition (for example, `Silent` is in your list). I would recommend just using `read.maf` in default mode and not defining your own list of `vc_nonSyn`.
Apologies for not providing a clear explanation earlier. I managed to resolve the issue I was facing. Initially, I was attempting to extract tables from a MAF object, and after incorporating some new code, I was able to extract the IDs as data frames, which I could then write and save.
One of my main objectives was to obtain silent variants and store them in the MAF file to have a larger number of variants for downstream analysis. I would like to take this opportunity to ask about the MAF.silent table.
MAF_silent <- list.files(path = "~/Borja/data_2/MIBC_WES_data", pattern = "MIBC")
MAF <- list.files(path = "~/Borja/data_2/MIBC_WES_data", pattern = "MIBC")
MAF_silent <- read.maf(maf = MAF_silent, vc_nonSyn = c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site", "Translation_Start_Site", "Nonsense_Mutation",
  "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation", "Silent", "Start_Codon_Del", "Stop_Codon_Ins",
  "3'UTR", "5'UTR", "mature_miRNA_variant", "exon_variant", "non_coding_exon_variant",
  "non_coding_transcript_exon_variant", "non_coding_transcript_variant",
  "nc_transcript_variant", "Amp", "Del"))
> sum(MAF_silent@variant.classification.summary$Silent)
[1] 63792
> sum(MAF_silent@variant.classification.summary$total)
[1] 247231
> MAF <- read.maf(MAF)
-Reading
-Validating
--Removed 63792 duplicated variants
-Silent variants: 74723
-Summarizing
--Possible FLAGS among top ten genes:
TTN
MUC16
SYNE1
HMCN1
-Processing clinical data
--Missing clinical data
-Finished in 20.2s elapsed (27.9s cpu)
> sum(MAF@variant.classification.summary$total)
[1] 172508
When I read the MAF data with the defaults, that is, without defining the vc_nonSyn argument, I noticed discrepancies in the numbers. The total number of variants is higher in MAF_silent, as I expected (247,231 vs. 172,508). However, I am puzzled that MAF_silent reports 63,792 silent variants (Variant.Classification), whereas the same MAF read without the vc_nonSyn argument reports a larger number of silent variants (74,723).
I would greatly appreciate any insights or explanations for this discrepancy. Thank you in advance for your time and assistance!
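A quick arithmetic check on the numbers quoted above, offered as a hedged reading rather than a confirmed diagnosis. The variable names are made up for the illustration; the values are copied from the console output in this thread.

```r
# Numbers reported in this thread.
total_custom  <- 247231  # summary total with the custom vc_nonSyn list
total_default <- 172508  # summary total with read.maf() defaults
silent_msg    <- 74723   # "-Silent variants" message from the default run
silent_only   <- 63792   # Silent column of the custom-run summary

# The gap between the two totals matches the "-Silent variants" message.
total_custom - total_default  # 74723

# Variants counted as silent by default but not classified exactly as "Silent".
silent_msg - silent_only      # 10931
```

This is consistent with the earlier remark that every classification outside `vc_nonSyn` is treated as silent: the 74,723 figure would then cover all such classifications, while 63,792 counts only rows labelled `Silent`. The remaining 10,931 would presumably be other classifications from the custom list (for example `3'UTR` or `5'UTR`); that part is an assumption and can be checked against the per-classification columns of the summary table.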
This issue is stale because it has been open for 60 days with no activity.
Describe the issue
First of all, I want to thank the developers for such a marvellous tool. I should point out that I am really new to bioinformatics, so I apologise if I am making silly mistakes.
I generated merged MAFs (with silent variants) with a shared pipeline. As far as I can see, you just have to specify a vector of variant classifications in the vc_nonSyn argument of the read.maf function. I had multiple MAF files (from different VC) per sample, so I generated a consensus MAF for this dataset. We own this raw data, and I succeeded in creating the CONSENSUS with and without silent variants (the file sizes are different).
However, this is where my issue begins. We received a really large consensus MAF from a different dataset, but I do not have the raw data. I tried to run this command, but the output silent MAF file has the same size as the original "large consensus MAF" (and the downstream signature analysis gives the same results for both MAFs):
Session info
Run `sessionInfo()` and post the output below