ChaissonLab / danbing-tk

Toolkit for VNTR genotyping and repeat-pan genome graph construction
BSD 3-Clause "New" or "Revised" License

Build pipeline uses only ref VNTRs or all assemblies? #17

Closed. ASLeonard closed this issue 2 years ago.

ASLeonard commented 2 years ago

Hi, I may be reading something wrong, but in the build pipeline the config file seems to take in only VNTRs discovered from the reference (here), while the README is ambiguous about whether the input is ref-only (like tr.good.bed) or covers all assemblies:

Running danbing-tk build

  • Required inputs:
    • haplotype-resolved assemblies (FASTA)
    • matched SRS data (BAM; optional)
    • reference genome (major chromosomes only without minor contigs)
    • tandem repeat regions (BED; available on release page or user-defined)

On the other hand, the manuscript methods make it sound like all assemblies are used: "TRF [37] v4.09 (option: 2 7 7 80 10 50 500 -f -d -h) was used to roughly annotate the SSR regions of five PacBio assemblies (AK1, HG00514, HG00733, NA19240, NA24385)."
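For readers unfamiliar with TRF, the numbers in that option string are positional parameters. The TRF usage line is `trf File Match Mismatch Delta PM PI Minscore MaxPeriod [options]`, where `-f` records flanking sequence, `-d` writes a data file, and `-h` suppresses HTML output. The sketch below only names those parameters and builds one command per assembly. The binary name `trf` and the `<sample>.fa` file names are assumptions for illustration (release binaries are often named differently), and this is not the script used in the manuscript.

```python
# Sketch: map the quoted TRF option string onto TRF's positional parameters.
# Assumed: binary is on PATH as "trf"; FASTA names like "HG00514.fa" are hypothetical.

TRF_PARAM_ORDER = ["match", "mismatch", "delta", "pm", "pi", "minscore", "maxperiod"]
TRF_PARAMS = {
    "match": 2,        # matching weight
    "mismatch": 7,     # mismatching penalty
    "delta": 7,        # indel penalty
    "pm": 80,          # match probability (whole number)
    "pi": 10,          # indel probability (whole number)
    "minscore": 50,    # minimum alignment score to report
    "maxperiod": 500,  # maximum period size to report
}
TRF_FLAGS = ["-f", "-d", "-h"]  # flanking sequence, data file, no HTML output


def trf_command(fasta_path: str) -> list:
    """Return an argv list for running TRF on one assembly FASTA."""
    positional = [str(TRF_PARAMS[name]) for name in TRF_PARAM_ORDER]
    return ["trf", fasta_path, *positional, *TRF_FLAGS]


# One command per assembly listed in the methods (file names are hypothetical).
for sample in ["AK1", "HG00514", "HG00733", "NA19240", "NA24385"]:
    print(" ".join(trf_command(f"{sample}.fa")))
```

Each printed line could be passed to `subprocess.run` as a list; keeping the parameters in a named dict makes it clear which of the seven numbers is being changed when tuning.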

Could you clarify whether the build pipeline uses VNTRs discovered from each assembly or only from the reference, and, if the former, how to include them?

Thanks, Alex

joyeuxnoel8 commented 2 years ago

Hi Alex,

Thanks for the question. The idea of the pipeline is to take a set of VNTR regions in reference coordinates, map each region from the reference to each assembly, and then summarize the orthology mapping. So to answer your question: yes, VNTRs from all assemblies are used, but only their reference coordinates are required as input.
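The workflow described above (reference-coordinate regions in, per-assembly mapping, then an orthology summary) can be sketched roughly as follows. This is a hypothetical illustration, not danbing-tk's code: the `mappings` structure stands in for whatever per-assembly projection the pipeline performs, and the rule "keep a locus only if it maps to every assembly" is an assumed summarization criterion.

```python
from typing import Dict, List, Optional, Tuple

# BED-style half-open interval: (contig, start, end)
Region = Tuple[str, int, int]


def summarize_orthology(
    ref_regions: List[Region],
    mappings: Dict[str, Dict[Region, Optional[Region]]],
) -> Dict[Region, Dict[str, Region]]:
    """Keep reference VNTR loci that have a mapped region in every assembly.

    ref_regions: VNTR loci in reference coordinates (e.g. rows of a BED file).
    mappings: assembly/haplotype name -> {ref region -> mapped region, or None}.
    A region missing from an assembly's dict is treated as unmapped.
    The all-assemblies rule is a hypothetical criterion for this sketch.
    """
    orthologs: Dict[Region, Dict[str, Region]] = {}
    for region in ref_regions:
        hits = {asm: per_asm.get(region) for asm, per_asm in mappings.items()}
        if all(hit is not None for hit in hits.values()):
            orthologs[region] = hits
    return orthologs


# Toy, made-up coordinates purely to show the data flow.
ref = [("chr1", 1000, 1200), ("chr2", 5000, 5300)]
maps = {
    "HG00514.h1": {("chr1", 1000, 1200): ("h1tig7", 40, 260), ("chr2", 5000, 5300): None},
    "HG00514.h2": {("chr1", 1000, 1200): ("h2tig3", 900, 1105), ("chr2", 5000, 5300): ("h2tig9", 10, 330)},
}
result = summarize_orthology(ref, maps)
print(sorted(result))
```

Here only the chr1 locus survives, because the chr2 locus fails to map to one haplotype. The point is that the user supplies only `ref`; the assembly-side coordinates come from the mapping step.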

-Tony

ASLeonard commented 2 years ago

Great, thanks for clarifying Tony!