ChaissonLab / TT-Mars

Structural Variants Assessment Based on Haplotype-resolved Assemblies
BSD 3-Clause "New" or "Revised" License

Can I use TT-Mars for bacterial genomes? #4

Closed fugunokaraage closed 1 year ago

fugunokaraage commented 1 year ago

Hi, I study bacterial genomes. I want to use TT-Mars to assess mapping quality at structural variants when bacterial short reads are mapped to a reference genome. Can I use it for this purpose?

quentin0515 commented 1 year ago

Hi! TT-Mars is designed for validating any kind of structural variant. It can assess the accuracy of your variant calls as long as a genome assembly is available. Could you provide more details about your project? Thanks, --Quentin

fugunokaraage commented 1 year ago

Dear Quentin,

Thank you very much for your reply. I want to create input files for building trees with tools such as Gubbins and BratNextgen, which can detect and remove recombination regions. I plan to conduct a time-divergence analysis after removing the recombination regions.

I have short reads of test isolates (they are haploid) that belong to the same clone, and I plan to map the reads to a reference genome. Then, after calling both SNVs and indels, I want to obtain aligned complete genomes of the test isolates using the VCF files and vcf-consensus. I will mask the indel regions, including structural variant regions, with "N". As part of this process, I want to use TT-Mars to evaluate the quality of the called SNPs and indels.

As you know, mapping and variant calling for bacterial genomes are still challenging: sensitivity has to be balanced against false positives. In addition, detecting large indels (i.e. structural variants) is very difficult, and few tools can detect them with good sensitivity. In fact, I tried Freebayes, Deepvariant, GATK, Snippy, Pilon and Octopus, and only Pilon could detect large variants; the other tools often reported those regions as clusters of SNPs and/or short indels. Because I have thousands of isolates, I want to build an in-house pipeline that runs these steps automatically (and accurately). So I would like to include TT-Mars in the pipeline to judge the quality of the variant calling.
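The masking step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `mask_regions`, the 0-based half-open coordinates, and the toy sequence are hypothetical and are not part of TT-Mars or vcf-consensus.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return a copy of `sequence` with every region replaced by `mask_char`.

    `regions` is an iterable of (start, end) pairs, assumed to be 0-based
    and half-open. The sequence length is preserved, so the masked genome
    stays aligned to the reference coordinates.
    """
    masked = list(sequence)
    for start, end in regions:
        # Clamp to the sequence bounds so out-of-range regions are harmless.
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Toy example (hypothetical data): mask two indel regions.
consensus = "ACGTACGTACGT"
indel_regions = [(2, 5), (9, 11)]
print(mask_regions(consensus, indel_regions))  # ACNNNCGTANNT
```

Masking in place, rather than deleting bases, keeps every isolate the same length as the reference, which is what alignment-based tools generally expect.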

Satoshi Mapping.pdf

quentin0515 commented 1 year ago

Hi Satoshi,

Thanks for your clarification. Here are my suggestions and notes about using TT-Mars in your project:

  1. TT-Mars is designed for validation of structural variations. It should work for indels > 20bp, but not for SNPs.
  2. TT-Mars is designed to validate calls on the human genome using pre-processed files (which can be downloaded with https://github.com/ChaissonLab/TT-Mars/blob/main/download_files.sh and https://github.com/ChaissonLab/TT-Mars/blob/main/download_asm.sh). By default it requires a human reference genome and a sample genome assembly. To use TT-Mars in your case, a reference genome and a bacterial assembly are required.
  3. See this file for how to use TT-Mars with your own reference and sample genome: https://github.com/ChaissonLab/TT-Mars/blob/main/liftover.sh Basically, it follows these steps:
    1. Align your assemblies to a reference
    2. TT-Mars will trim overlapping contigs in the alignment
    3. Generate liftover files
    4. Compress
    5. Generate regions not covered by the assemblies

     So using TT-Mars for your data takes the additional effort described above.
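As a rough illustration of point 1 above, calls could be pre-sorted by size before validation so that only indels larger than 20 bp are passed on. This is a minimal sketch, not TT-Mars code: the function name `classify_record` and the example records are hypothetical, and it assumes tab-separated VCF data lines with explicit REF/ALT sequences. Symbolic or breakend alleles are reported as "unresolved" rather than guessed at.

```python
MIN_SV_LENGTH = 20  # threshold taken from the reply above ("indels > 20bp")


def classify_record(line, min_length=MIN_SV_LENGTH):
    """Classify one VCF data line as 'large', 'small', or 'unresolved'.

    'large'      : at least one ALT differs from REF by more than min_length bp
    'small'      : SNPs and short indels (out of scope per point 1)
    'unresolved' : symbolic (<DEL>), breakend, missing, or '*' alleles,
                   whose length cannot be read from REF/ALT alone
    """
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    for alt in alts:
        if alt.startswith("<") or "[" in alt or "]" in alt or alt in (".", "*"):
            return "unresolved"
    if any(abs(len(alt) - len(ref)) > min_length for alt in alts):
        return "large"
    return "small"


# Hypothetical records: a SNP, a 30 bp insertion, a 20 bp deletion, a symbolic DEL.
records = [
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t200\t.\tA\tA" + "C" * 30 + "\t.\tPASS\t.",
    "chr1\t300\t.\tA" + "T" * 20 + "\tA\t.\tPASS\t.",
    "chr1\t400\t.\tN\t<DEL>\t.\tPASS\tSVLEN=-500",
]
for rec in records:
    print(rec.split("\t")[1], classify_record(rec))
```

Note that the 20 bp deletion is classified as "small" because the comparison is strictly greater than 20, following the wording of point 1.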

Hope this helps. Please let me know if any of the above steps is unclear and I will be more than happy to help.

--Quentin