Open ammaraziz opened 9 months ago
@gokeson this one is for you!
[ ] Same for ompA gene only
We already have this sorted, right? I mean all the ompA sequences that we use for BLAST.
[ ] Create bed file of recombination regions to mask
I think we should use ClonalFrameML after running kSNP. ClonalFrameML does not need a BED file; it handles the recombinant regions itself. I favour this approach mainly because recombination in Ct is ever-evolving, and that is actually how some new strains evolve. The alternative is Gubbins, which requires a prior alignment to a reference genome with SKA. I don't think this is a bad approach, as all 20 strains would be aligned to our chosen reference genome, most likely strain D, since it is the best-annotated genome.
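```
# (folder) = directory containing the genome FASTA files
MakeKSNP4infile -indir (folder) -outfile myInfile
# Kchooser4 reports a suggested k for this set of genomes
Kchooser4 -in myInfile
kSNP4 -in myInfile -outdir name_of_output_directory -k 13 -core -ML -vcf
# ClonalFrameML <newick tree> <alignment> <output prefix>; kSNP4 writes its files into -outdir
ClonalFrameML name_of_output_directory/tree.SNPs_all.ML.tre name_of_output_directory/SNPs_all_matrix.fasta final_clinical_clonalframe
```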
I think we should avoid computationally heavy tools like ClonalFrameML. How long does it take to run?
ClonalFrameML took 220 minutes on 125 samples, and that was after kSNP had already run for 3 hours on the same samples.
This is where Gubbins may be advantageous. It requires a reference genome, but I think that is okay. Everyone uses strain D for Gubbins, so why not?
And Gubbins is really fast. If we stick with our one-sample-one-tree approach, Gubbins makes sense. Anyone who needs comparative analyses across multiple samples can use kSNP + ClonalFrameML.
- Use kSNP to generate reference-free SNPs
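```
# create and activate the same environment name (the original created "myenvname" but activated "gubbins_env")
mamba create --name gubbins_env -c conda-forge -c bioconda gubbins
mamba activate gubbins_env
# build a reference-based whole-genome alignment with SKA, then run Gubbins on it
generate_ska_alignment.py --reference seq_X.fa --input input.list --out test.aln
run_gubbins.py --prefix gubbins_out test.aln --tree-builder raxml
```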
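Gubbins writes its predicted recombination blocks to `<prefix>.recombination_predictions.gff` (here `gubbins_out.recombination_predictions.gff`). A minimal sketch of turning that into the BED file from the checklist, and of masking those blocks in an alignment, might look like the following. It assumes the feature lines follow standard GFF coordinates (1-based, inclusive, start/end in columns 4 and 5) and converts them to BED's 0-based, half-open convention. The function names are hypothetical helpers, not part of Gubbins.

```python
"""Sketch: Gubbins recombination GFF -> BED intervals -> masked alignment.

Assumptions: feature lines are tab-separated with 1-based inclusive start/end
in columns 4 and 5, and every block is masked in every sequence.
"""
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]


def gff_to_bed_intervals(gff_lines: Iterable[str]) -> List[Interval]:
    """Return sorted, merged 0-based half-open (start, end) intervals."""
    intervals: List[Interval] = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a feature line
        start, end = int(fields[3]), int(fields[4])
        intervals.append((start - 1, end))  # 1-based inclusive -> 0-based half-open
    intervals.sort()
    merged: List[Interval] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # overlapping or touching block: extend the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_bed(intervals: Iterable[Interval], chrom: str) -> str:
    """Format intervals as BED3 text for the given reference sequence name."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for start, end in intervals)


def mask_alignment(
    alignment: Dict[str, str], intervals: Iterable[Interval], mask_char: str = "N"
) -> Dict[str, str]:
    """Replace every column inside the intervals with mask_char in all sequences."""
    intervals = list(intervals)
    masked = {}
    for name, seq in alignment.items():
        chars = list(seq)
        for start, end in intervals:
            for i in range(start, min(end, len(chars))):
                chars[i] = mask_char
        masked[name] = "".join(chars)
    return masked


if __name__ == "__main__":
    # Toy GFF lines with made-up coordinates, not real Gubbins output.
    gff = [
        "##gff-version 3",
        "ref\tsketch\tregion\t3\t5\t.\t.\t.\t.",
        "ref\tsketch\tregion\t4\t7\t.\t.\t.\t.",
    ]
    blocks = gff_to_bed_intervals(gff)
    print(to_bed(blocks, "ref"), end="")  # prints ref, 2, 7 separated by tabs
    print(mask_alignment({"s1": "ACGTACGTAC"}, blocks))  # {'s1': 'ACNNNNNTAC'}
```

With real data, `gff` would be an open handle on the Gubbins GFF, and `chrom` must match the reference sequence name expected by whatever tool reads the BED file. Masking each block in every sequence is the conservative choice; masking only the taxa a block was detected in would keep more signal but would need per-block taxon information from the GFF attributes.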
On your 125-sample ClonalFrameML run, can you extract the SNPs that are flagged as recombinant?
Nope, I tried but couldn't. Do we need it, though?
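If the recombinant SNPs turn out to be needed, one possible route is ClonalFrameML's `<prefix>.importation_status.txt` output (here `final_clinical_clonalframe.importation_status.txt`). The sketch below assumes that file is a tab-separated table with a `Node`, `Beg`, `End` header and 1-based, inclusive positions in the input alignment; treat that format as an assumption and check it against the real file. Because the input alignment was kSNP's `SNPs_all_matrix.fasta`, those positions would then index SNP columns directly. All function names are hypothetical helpers.

```python
"""Sketch: list SNP columns inside ClonalFrameML-inferred imports.

Assumed input: a tab-separated table with header 'Node Beg End' and 1-based,
inclusive Beg/End positions in the input alignment (verify on real output).
"""
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Minimal FASTA reader: first word of each header -> concatenated sequence."""
    seqs: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def recombinant_columns(status_lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each tree node to the 1-based alignment columns it imported."""
    by_node: Dict[str, Set[int]] = defaultdict(set)
    for row in csv.DictReader(status_lines, delimiter="\t"):
        beg, end = int(row["Beg"]), int(row["End"])
        by_node[row["Node"]].update(range(beg, end + 1))
    return dict(by_node)


def extract_columns(alignment: Dict[str, str], columns: Set[int]) -> Dict[str, str]:
    """Keep only the given 1-based columns of every sequence, in order."""
    idx = sorted(c - 1 for c in columns)  # convert to 0-based indices
    return {name: "".join(seq[i] for i in idx) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy inputs with made-up names and positions, not real output.
    status = ["Node\tBeg\tEnd", "NODE_1\t2\t3", "s2\t3\t4"]
    snps = parse_fasta([">s1", "ACGTAC", ">s2", "ACGAAC"])
    by_node = recombinant_columns(status)
    cols = set().union(*by_node.values())
    print(sorted(cols))  # [2, 3, 4]
    print(extract_columns(snps, cols))  # {'s1': 'CGT', 's2': 'CGA'}
```

With real files, pass open file handles to `parse_fasta` and `recombinant_columns`. A column collected this way was flagged on at least one branch, not necessarily in every sample, so the per-node dictionary is worth keeping if we want to know where each import sits on the tree.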
Create a phylogeny for each sample.