Closed Lukas67 closed 2 years ago
Hi Lukas,
read_vcfs_as_granges
tries to change the sequence names using the sequence names of the reference genome (This is done with the seqlevelsStyle
function). Since this doesn't seem to be working in this case, you can set change_seqnames
to FALSE
. This argument is a boolean, so it won't use sequence names from "wuhCor1".
I suspect the second error is indeed related to the first. What seqlevels does your grl have now? Are there any samples with no mutations? Do you also get this error when you try to run a (part of) a single sample? If this also doesn't work then you could maybe send a small part of a vcf so I can check what goes wrong.
Hi FreekManders,
thanks for your quick reply.
changed the read_vcfs_as_granges accordingly.
What seqlevels does your grl have now?: These are the seqlevels grl = vcfs, SarsCoV2 = BSgenome object
seqlevels(grl) [1] "MN908947.3" seqlevels(SarsCoV2) [1] "NC_045512v2"
From my understanding the difference in the seqlevels causes the error below.
context <- mut_context(grl[[1]], ref_genome) Error: The input GRanges and the ref_genome share no seqnames (chromosome names). Do they use the same seqlevelsStyle? An example of how to fix this is show below: You can change the seqlevelStyle with: `seqlevelsStyle(grl) = 'UCSC'
So I changed it to the same seqlevel by:
seqlevels(grl) <- seqlevels(SarsCoV2)
Now the context function runs without any problem.
Are there any samples with no mutations? Since the samples are all covid lineages ranging from delta to omicron the absence of mutations is very unlikely.
Do you also get this error when you try to run a (part of) a single sample? This is exactly what I did with below function. I iterated through the GRangesList and returned it to a list of dataframes. With lapply I iterated again through the dataframe and created a list of plots with all lineages.
mut_occurence_df<- lapply(grl, mut_type_occurrences, ref_genome) mut_occurence_df <- lapply(mut_occurence_df, function(x){row.names(x) <- x[,1]; x}) plot_list<- list(lapply(mut_occurence_df, function(x){plot_spectrum(x, CT = TRUE, indv_points = TRUE, legend = TRUE)}))
Attached you can find a small example of my .vcf file. (I needed to change the file extension to .txt)
Thank you very much for your help.
BR Lukas
Hi Lukas,
I couldn't reproduce the issue. I don't have the SarsCoV2 BSgenome object, so I changed the sequence name of the grl to "chr1" and tried to use it with the human BSgenome object, which didn't give any errors. So there may be an issue with the BSgenome object you created.
Does the traceback of the error show where in the mut_type_occurences
function this error occurs?
Additionally, I noticed your data contains both SNVs, indels, and MBSs. You should split those like this: grl_snv = get_mut_type(grl, "snv", predefined_dbs_mbs = T)
.
Hi FreekMander,
thank you for effort.
I also think that I did something wrong at forging the reference genome. Attached you can find the seed file (extension changed from .seed to .txt), the sequence file is wuhCor1.2bit
Unfortunately there is only the short traceback from above in the R-studio console.
type_occurrences <- mut_type_occurrences(grl_snv, ref_genome) Error in
levels<-
(*tmp*
, value = as.character(levels)) : factor level [11] is duplicated
I also tried it with split mutations.
BR Lukas
I made a SarsCoV2 BSgenome package using the seed file you send and it works fine with your example vcf. mut_type_occurrences(grl, SarsCoV2)
, doesn't give any errors (after seqlevels(grl) <- seqlevels(SarsCoV2)
has been run).
Did you set the ref_genome
variable to be the SarsCoV2 BSgenome package? If not then this might cause an issue?
Does the package work for you on the example data in the vignette?
One thing you could try is to run the mut_type_occurrences
function line-by-line. If you type the function name without brackets behind it and then press enter, R should show you the code. You can then copy the code to a script and run each line of the function separately to determine where the error occurs. Any internal functions that MutationalPatterns uses can be found like this: MutationalPatterns:::type_context
.
Thank you for help. I found the problem for:
type_occurrences <- mut_type_occurrences(grl, ref_genome) Error in levels<-(*tmp*, value = as.character(levels)) : factor level [11] is duplicated
The issue was, that one lineage was duplicated (I had more than one sample for the lineage e.g. AY.43)
With assigning:
seqlevels(grl) <- seqlevels(SarsCoV2) seqlevelsStyle(grl) <- seqlevelsStyle(SarsCoV2)
the program runs flawless.
Thanks for the tip by calling a function without brackets. I will keep this in mind for the next circumstance :)
Hello,
I would like to analyse mutational signatures across covid lineages with the pipeline (https://bioconductor.org/packages/release/bioc/vignettes/MutationalPatterns/inst/doc/Introduction_to_MutationalPatterns.html). After forging the UCSC .2bit file to a BSgenome.object I tried to read vcfs from our sequencing pipeline. However read_vcfs_as_granges returns the following error:
found no sequence renaming map compatible with seqname style "UCSC" for this object Error: The seqlevelStyle of the vcf could not be changed to that of the reference. You can run this function with
change_seqnames = F
andgroup = 'none'
, to prevent this error. However, you then have to make sure that the seqnames (chromosome names) are the same between your vcfs and the reference BSgenome object. (The message of the internal error causing this problem is shown above.)After investigating I fixed this error with the following workaround: grl <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome, change_seqnames = "wuhCor1", group = "none") and seqlevels(grl) <- seqlevels(SarsCoV2)
So I continued the pipeline and came to: type_occurrences <- mut_type_occurrences(grl, ref_genome) where I got the error Error in
levels<-
(*tmp*
, value = as.character(levels)) : factor level [11] is duplicatedOn which factor level the program is referring and how is it possible to change this level? Is the error a symptom of the first error?
I worked around the error by unlisting the GRangesList, but this is merging all objects into one and lineage specific information gets lost.
Thanks in advance and best regards Lukas