Shamir-Lab / syncmer_mapping

Implementations of syncmer-based long read mappers
MIT License

Minimizer options still there? #1

Closed: jianshu93 closed this 8 months ago

jianshu93 commented 8 months ago

Hello syncmer_mapping team,

I noticed that in the modified minimap2 (syncmer version), the minimizer window size now defaults to 0, but several minimizer-related options are still there (e.g., "minimal number of minimizers on a chain"; shouldn't this be the minimal number of syncmers on a chain?). Which options relate to syncmers?

Thanks,

Jianshu

AbhinavDutta commented 8 months ago

Hi, you can find the syncmer-related options here. Yes, your intuition is correct: that option corresponds to the minimum number of anchors on the chain, which automatically translates to syncmers, minimizers, or a mix of both, depending on how you build the index.
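To make the anchor point concrete, here is a minimal Python sketch of one parameterized syncmer scheme. It assumes, hypothetically, that --s-mer is the s-mer length and that --pos1/--pos2 are 1-based offsets at which the smallest s-mer must start for a k-mer to be selected; the repository may define these differently (for example, 0-based offsets or hash ordering). Lexicographic order on s-mers stands in for the hash order a real mapper would use. The helper chain_passes is also hypothetical: it mirrors the role of -n by counting anchors without caring which seeding scheme produced them.

```python
from __future__ import annotations


def syncmer_positions(seq: str, k: int, s: int, offsets: set[int]) -> list[int]:
    """Return start positions of k-mers in seq that are syncmers.

    Assumption: a k-mer is selected when the first occurrence of its smallest
    s-mer (lexicographic order, standing in for a hash) starts at one of the
    given 1-based offsets.
    """
    picked = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # all k - s + 1 s-mers contained in this k-mer
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # 1-based offset of the first smallest s-mer
        first_min = smers.index(min(smers)) + 1
        if first_min in offsets:
            picked.append(i)
    return picked


def chain_passes(anchors: list[tuple[int, str]], min_cnt: int = 3) -> bool:
    """Hypothetical stand-in for the -n filter: keep a chain only if it has
    at least min_cnt anchors, whether they came from syncmers or minimizers."""
    return len(anchors) >= min_cnt
```

For example, syncmer_positions("TTACGTA", 4, 2, {2}) returns [1]: among the four 4-mers, only TACG has its smallest 2-mer (AC) starting at offset 2. A chain mixing syncmer and minimizer anchors passes chain_passes as long as it has at least three anchors in total.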

jianshu93 commented 8 months ago

Thanks for the quick response. So the command-line options --downsample 1.23 --s-mer 5 --pos1 1 --pos2 10 are not listed in the help, but they are actually there, right? How are those values passed from the command line to the main program? This is the help output I get after compiling in the synmer_minimap directory:

Options:
  Indexing:
    -H           use homopolymer-compressed k-mer (preferrable for PacBio)
    -k INT       k-mer size (no larger than 28) [15]
    -w INT       minimizer window size [0]
    -I NUM       split index for every ~NUM input bases [4G]
    -d FILE      dump index to FILE []
  Mapping:
    -f FLOAT     filter out top FLOAT fraction of repetitive minimizers [0.0002]
    -g NUM       stop chain enlongation if there are no minimizers in INT-bp [5000]
    -G NUM       max intron length (effective with -xsplice; changing -r) [200k]
    -F NUM       max fragment length (effective with -xsr or in the fragment mode) [800]
    -r NUM[,NUM] chaining/alignment bandwidth and long-join bandwidth [500,20000]
    -n INT       minimal number of minimizers on a chain [3]
    -m INT       minimal chaining score (matching bases minus log gap penalty) [40]
    -X           skip self and dual mappings (for the all-vs-all mode)
    -p FLOAT     min secondary-to-primary score ratio [0.8]
    -N INT       retain at most INT secondary alignments [5]
  Alignment:
    -A INT       matching score [2]
    -B INT       mismatch penalty (larger value for lower divergence) [4]
    -O INT[,INT] gap open penalty [4,24]
    -E INT[,INT] gap extension penalty; a k-long gap costs min{O1+kE1,O2+kE2} [2,1]
    -z INT[,INT] Z-drop score and inversion Z-drop score [400,200]
    -s INT       minimal peak DP alignment score [80]
    -u CHAR      how to find GT-AG. f:transcript strand, b:both strands, n:don't match GT-AG [n]
  Input/Output:
    -a           output in the SAM format (PAF by default)
    -o FILE      output alignments to FILE [stdout]
    -L           write CIGAR with >65535 ops at the CG tag
    -R STR       SAM read group line in a format like '@RG\tID:foo\tSM:bar' []
    -c           output CIGAR in PAF
    --cs[=STR]   output the cs tag; STR is 'short' (if absent) or 'long' [none]
    --MD         output the MD tag
    --eqx        write =/X CIGAR operators
    -Y           use soft clipping for supplementary alignments
    -t INT       number of threads [3]
    -K NUM       minibatch size for mapping [500M]
    --version    show version number
  Preset:
    -x STR       preset (always applied before other options; see minimap2.1 for details) []

See `man ./minimap2.1' for detailed description of these and other advanced command-line options.

Thanks,

Jianshu

AbhinavDutta commented 8 months ago

Hi, yes, exactly. Even though they are not listed in the help section, these options do work as expected.
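Why an option can work without appearing in the help: in upstream minimap2 the usage message is printed from hard-coded strings in main.c, separately from the long-option table the parser actually reads, so a fork can add long options without updating the help. The Python sketch below mimics that split, using argparse as a stand-in for minimap2's own C option parser. The types and defaults given to --downsample, --s-mer, --pos1 and --pos2 are assumptions for illustration, not taken from the repository.

```python
import argparse

# Hand-written help text kept separate from the parser, in the spirit of
# minimap2's usage message. The syncmer flags are deliberately absent here.
USAGE = """Options:
  Indexing:
    -k INT       k-mer size (no larger than 28) [15]
    -w INT       minimizer window size [0]
"""


def build_parser() -> argparse.ArgumentParser:
    # add_help=False disables argparse's auto-generated help, so USAGE is the
    # only help text and it can drift out of sync with the real options.
    p = argparse.ArgumentParser(add_help=False)
    p.add_argument("-k", type=int, default=15)
    p.add_argument("-w", type=int, default=0)
    # Hypothetical types/defaults for the syncmer flags named in the thread.
    p.add_argument("--downsample", type=float, default=1.0)
    p.add_argument("--s-mer", dest="s_mer", type=int, default=None)
    p.add_argument("--pos1", type=int, default=None)
    p.add_argument("--pos2", type=int, default=None)
    p.add_argument("-h", action="store_true")
    p.add_argument("files", nargs="*")
    return p


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.h:
        print(USAGE)  # the syncmer flags never show up here
    return args
```

Running main(["--downsample", "1.23", "--s-mer", "5", "--pos1", "1", "--pos2", "10", "ref.fa", "reads.fq"]) parses all four syncmer flags even though none of them appears in USAGE.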