Open theosanderson opened 2 years ago
+1
Thanks @theosanderson for the heads-up. Anecdotally, I've seen a few other sets of adjacent (or at least close) mutations that cause trouble in the Omicron branches of the tree, although I haven't got a nice analysis with evidence like @BioWilko's to explain them! I can provide lists of sequences in case anyone would like to take a look.
Hi @LiXingguangBrandonStark -- I haven't used mask_alignment_using_vcf.py nor did I write it (from github history it looks like @conorwalker is the main author), but if you cd to the ProblematicSites_SARS-CoV2/src/ directory and then run
python3 mask_alignment_using_vcf.py
it outputs brief usage instructions:
usage: mask_alignment_using_vcf.py [-h] [-m] [-c] [-b] [-d]
                                   [-n MASK_CHARACTER] [-r REFERENCE_ID] -v
                                   VCF -i INPUT_FASTA -o OUTPUT_FASTA
mask_alignment_using_vcf.py: error: the following arguments are required: -v/--vcf, -i/--input_fasta, -o/--output_fasta
(I use different tools to mask VCF instead of fasta, using the file problematic_sites_sarsCov2.vcf.)
Hi @LiXingguangBrandonStark!
Did you clone this repository? (git clone https://github.com/W-L/ProblematicSites_SARS-CoV2.git) You can then find the vcf for masking sites at ./ProblematicSites_SARS-CoV2/problematic_sites_sarsCov2.vcf and the script to mask alignments in FASTA format at ./ProblematicSites_SARS-CoV2/src/mask_alignment_using_vcf.py, with usage instructions as posted by @AngieHinrichs (Thank you!). If you encounter issues using the files, please feel free to open a new issue.
The vcf has a column named FILTER with a recommendation for each site: either mask it before performing downstream analyses, or otherwise be cautious when interpreting results, because the site may have misleading effects. You can find more info about this in the original post on virological.org. The files in subset_vcf separate the sites from the main vcf into these two categories.
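In case it helps to see the idea in code, here is a minimal sketch of splitting sites by the FILTER column and masking one aligned sequence. It is not how mask_alignment_using_vcf.py is implemented. The function names, the tiny example VCF, and the FILTER values "mask" and "caution" are assumptions made for illustration. It also assumes the sequence is aligned in reference coordinates, so that VCF positions map directly to alignment columns.

```python
import io


def read_sites_by_filter(vcf_lines):
    """Group 1-based positions by the value in the FILTER column (7th VCF column)."""
    sites = {}
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos, filt = int(fields[1]), fields[6]
        sites.setdefault(filt, set()).add(pos)
    return sites


def mask_sequence(seq, positions, mask_char="N"):
    """Replace the given 1-based positions in an aligned sequence with mask_char."""
    chars = list(seq)
    for pos in positions:
        if 1 <= pos <= len(chars):
            chars[pos - 1] = mask_char
    return "".join(chars)


# Made-up two-site VCF; the FILTER values here are assumed, not copied from the repo.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "ref\t2\t.\tC\t.\t.\tmask\t.\n"
    "ref\t5\t.\tG\t.\t.\tcaution\t.\n"
)

sites = read_sites_by_filter(example_vcf)
masked = mask_sequence("ACGTGA", sites.get("mask", set()))
print(masked)  # ANGTGA
```

Only the site flagged for masking is replaced; the site flagged for caution is left in place so it can be inspected downstream.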
I belatedly saw this message from @BioWilko: https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473/17
I agree that it makes sense to add these to the mask - I can see some issues on the UShER tree that result from these (not hundreds, but tens) [@angiehinrichs for info]