clarification of -e and -m flag function

EnriqueDoster commented 4 years ago

Hello catch developers,

Could you please clarify how the probe stride parameter (-ps) is affected by the coverage extension parameter (-e)? Also, does the mismatch (-m) flag make a distinction between contiguous and non-contiguous mismatches?

Thank you in advance for your time.

Best, Enrique

haydenm commented 4 years ago

-ps is not affected in any direct way by -e.

The probe stride (-ps) just determines how candidate probes are generated by tiling. In general, smaller values are better, but they lead to higher runtime and more memory use because more candidate probes are generated. The probe stride should almost always be less than the probe length (-pl); for example, we commonly use -pl 75 -ps 25 or -pl 100 -ps 50. In experiments the exact value seems to have little impact on the output, and I would recommend choosing a value that is ~1/2 or ~1/3 of the probe length.
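To make the effect of -ps concrete, here is a minimal Python sketch of stride-based tiling. It is not CATCH's code. The function name tile_probes is hypothetical, and anchoring one extra probe flush with the end of the sequence is an assumption made here so the tail is not left uncovered. The sketch shows how the number of candidate probes grows as the stride shrinks, which is where the extra runtime and memory come from.

```python
def tile_probes(seq, probe_length, probe_stride):
    """Return (start, probe) pairs tiling `seq` with a fixed stride.

    Hypothetical sketch, not CATCH's implementation. One extra probe is
    anchored at the end of `seq` when the stride would otherwise skip it.
    """
    if probe_length <= 0 or probe_stride <= 0:
        raise ValueError("probe length and stride must be positive")
    if probe_length > len(seq):
        return []  # sequence too short to hold even one probe
    last_start = len(seq) - probe_length
    starts = list(range(0, last_start + 1, probe_stride))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the end of the sequence is tiled
    return [(s, seq[s:s + probe_length]) for s in starts]


if __name__ == "__main__":
    seq = "ACGT" * 250  # 1,000 nt toy sequence
    for stride in (50, 25):
        print(stride, len(tile_probes(seq, 100, stride)))
```

For a 1,000 nt sequence with a 100 nt probe length, this sketch yields 19 candidates at a stride of 50 and 37 at a stride of 25, roughly doubling the work the later selection step has to do.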

The extension (-e) defines how many nucleotides on each side of a probe you assume to be captured along with the region complementary to the probe. You can usually determine this from the expected fragment length. For example, if your probe length is 100 nt and you expect fragments of length 300 nt, a fragment carries up to 200 nt of sequence beyond the probe (100 nt on each side if the probe binds in the middle), so -e could be as high as 100. We have never used a value greater than -e 50, which is a conservative choice. You can view -ps and -e independently, as long as the probe stride is less than the probe length.
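The arithmetic behind the fragment-length example, and what -e means for coverage, can be sketched in Python. Both helpers (max_extension and covered_range) are hypothetical illustrations, not CATCH functions. max_extension assumes the probe binds in the middle of the fragment, and covered_range assumes a probe hit at a known start position on a target of known length.

```python
def max_extension(fragment_length, probe_length):
    """Largest per-side extension a fragment can supply beyond the probe.

    Assumes the probe-binding site sits in the middle of the fragment, so the
    flanking sequence (fragment_length - probe_length) splits evenly.
    """
    return max(0, (fragment_length - probe_length) // 2)


def covered_range(hit_start, probe_length, extension, target_length):
    """Half-open [start, end) region of the target assumed captured by one
    probe hit: the probe's footprint plus `extension` nt on each side,
    clipped to the bounds of the target."""
    start = max(0, hit_start - extension)
    end = min(target_length, hit_start + probe_length + extension)
    return start, end


if __name__ == "__main__":
    print(max_extension(300, 100))              # 100, as in the example above
    print(covered_range(500, 100, 50, 10_000))  # (450, 650)
```

With an extension of 50, each 100 nt probe hit is credited with 200 nt of target, which is why, under this model, a larger -e tends to need fewer probes for the same coverage.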

-m does not make a distinction between contiguous and non-contiguous mismatches: it is simply the total number of mismatches tolerated across the probe (unless you use -l/--lcf-thres, in which case it is the number tolerated over that length). However, if you care about this distinction, the --island-of-exact-match argument might be relevant to you, and it is often useful in practice: it requires a certain number of contiguous nucleotides to match exactly.
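The difference between counting total mismatches and requiring an island of exact match can be illustrated with a small Python sketch. It is a simplification, not CATCH's hybridization model: it compares a probe against an equal-length, already-aligned stretch of target with no gaps, and it does not model -l/--lcf-thres. The function names and the `island` parameter are hypothetical stand-ins for -m and --island-of-exact-match.

```python
def count_mismatches(probe, target):
    """Total mismatches between two aligned, equal-length sequences.
    Where the mismatches fall (clustered or spread out) is ignored."""
    if len(probe) != len(target):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(1 for a, b in zip(probe, target) if a != b)


def longest_exact_run(probe, target):
    """Length of the longest stretch of contiguous matching positions."""
    best = run = 0
    for a, b in zip(probe, target):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def hybridizes(probe, target, mismatches, island=0):
    """Hypothetical check: tolerate up to `mismatches` total mismatches and,
    if `island` > 0, also require a run of at least `island` exact matches."""
    if count_mismatches(probe, target) > mismatches:
        return False
    return island == 0 or longest_exact_run(probe, target) >= island


if __name__ == "__main__":
    probe = "A" * 20
    clustered = "A" * 8 + "CCC" + "A" * 9  # 3 adjacent mismatches
    spread = "A" * 4 + "C" + "A" * 5 + "C" + "A" * 4 + "C" + "A" * 4  # 3 scattered
    for target in (clustered, spread):
        print(hybridizes(probe, target, 3), hybridizes(probe, target, 3, island=8))
```

Both targets pass when 3 mismatches are tolerated, but only the clustered one also passes an 8 nt island requirement, since its longest exact stretch is 9 nt versus 5 nt for the scattered case.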

EnriqueDoster commented 4 years ago

Great, thanks for the help!

broadinstitute / catch, issue #33