ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Understanding "--clip" in cactus-graphmap-join #1057

Closed: jyj5558 closed this issue 1 year ago

jyj5558 commented 1 year ago

Hi,

I am now at the last step, "cactus-graphmap-join", and because there are too many gaps I want to apply the "--clip" option as suggested in #859. I am thinking of using "--clip 1000", which is much smaller than the default value of 10000, but I could not work out clearly from the documentation or the help output what this option actually does.

My question is: does this option remove contigs that contain an unaligned stretch longer than 1,000 bases (e.g., a whole 20,000-bp contig with 1,500 unaligned bases in a row)? Or does it remove regions of the reference genome where at least one contig has an unaligned stretch longer than 1,000 bases (e.g., a region Chr_1:700-2,300 where a contig could not align 1,002 of its bases onto the region because those bases are Ns)?

I hope this question is clear. Thanks always for your help!

glennhickey commented 1 year ago

It only cuts out the unaligned part; it does not drop the whole contig. You may want to read the paper https://doi.org/10.1038/s41587-023-01793-w and the latest manual https://github.com/ComparativeGenomicsToolkit/cactus/blob/master/doc/pangenome.md for more details. I would stick to the default value for the number of bases clipped.
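To make the answer concrete with the asker's own example, here is a toy model of "only cuts out the unaligned part". It is not Cactus's implementation: the function name clip_unaligned is hypothetical, it assumes the unaligned intervals of a contig are already known, and it assumes a run is clipped only when it is strictly longer than the threshold (matching the "> 1,000" wording in the question). It shows that the contig is split around a long unaligned run rather than discarded.

```python
from typing import List, Tuple

# Half-open [start, end) interval in contig coordinates.
Interval = Tuple[int, int]


def clip_unaligned(contig_length: int,
                   unaligned: List[Interval],
                   min_clip: int) -> List[Interval]:
    """Hypothetical toy model: return the pieces of a contig kept after clipping.

    Every unaligned run strictly longer than `min_clip` is cut out
    (assumed comparison); aligned sequence and shorter unaligned runs
    are kept. The contig is never dropped as a whole.
    """
    kept: List[Interval] = []
    cursor = 0  # start of the next piece that may be kept
    for start, end in sorted(unaligned):
        if end - start > min_clip:
            # Keep whatever lies between the previous cut and this run.
            if start > cursor:
                kept.append((cursor, start))
            cursor = max(cursor, end)
    # Keep the tail after the last clipped run.
    if cursor < contig_length:
        kept.append((cursor, contig_length))
    return kept


# Asker's example: a 20,000-bp contig with 1,500 unaligned bases in a row
# (placed at 5,000-6,500 purely for illustration).
print(clip_unaligned(20_000, [(5_000, 6_500)], 1_000))   # two pieces remain
print(clip_unaligned(20_000, [(5_000, 6_500)], 10_000))  # default: nothing clipped
```

With a threshold of 1,000 the contig survives as two pieces totalling 18,500 bp; with the default of 10,000 the 1,500-bp run is below the cutoff and the contig is kept intact.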