MatthewPace98 closed this issue 1 year ago
Closing this to clean up this issue section.
To clarify, while both variables refer to a maximum number of mismatches, the difference is that:
n_mismatches is used by the aligner as the threshold for a valid alignment between the reference genome and the spacers.
max_mm is used in the off-target score calculation to set the maximum number of mismatches allowed between each off-target and the wildtype sequence.
If I understood correctly (@Jfortin1 please correct me if I am wrong), if for example n_mismatches = 4 and max_mm = 3, off-target sequences with 4 mismatches will be filtered out at the scoring phase either way. Similarly, n_mismatches = 3 and max_mm = 4 would also be pointless, since off-targets with 4 mismatches would already have been filtered out at the alignment stage. In both cases, only off-targets with at most 3 mismatches end up contributing to the score.
The only scenario I can think of where these two variables should differ is setting n_mismatches to the higher value, which lets the alignment stage find and store more potential off-target sequences. You can then use the max_mm parameter to restrict the off-target score calculation to a smaller subset of those sequences. This could be useful if you want to assess off-target effects at different mismatch thresholds without having to rerun the entire alignment process for each value.
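To make the two-stage filtering concrete, here is a minimal toy sketch in Python. It is not crisprDesign code and does not call the package: the function names align and scorable, the 10-nt sequences, and the use of a plain Hamming distance are all assumptions made for illustration. It only mimics the idea that the alignment step keeps hits within n_mismatches, and the scoring step then considers only hits within max_mm.

```python
# Toy model of the two thresholds. All names here are hypothetical and
# only loosely mirror the n_mismatches / max_mm parameters discussed above.

def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def align(spacer, candidates, n_mismatches):
    """Stage 1 (alignment): keep candidate sites within n_mismatches."""
    hits = []
    for site in candidates:
        mm = hamming(spacer, site)
        if mm <= n_mismatches:
            hits.append((site, mm))
    return hits

def scorable(hits, max_mm):
    """Stage 2 (scoring): keep only stored hits within max_mm."""
    return [(site, mm) for site, mm in hits if mm <= max_mm]

spacer = "ACGTACGTAC"
candidates = [
    "ACGTACGTAC",  # 0 mismatches
    "ACGTACGTAA",  # 1 mismatch
    "ACGTACGTGA",  # 2 mismatches
    "ACGTACGCGA",  # 3 mismatches
    "ACGTACTCGA",  # 4 mismatches
    "ACGTAGTCGA",  # 5 mismatches
]

# Either way round, the smaller of the two thresholds decides what is scored.
a = scorable(align(spacer, candidates, n_mismatches=4), max_mm=3)
b = scorable(align(spacer, candidates, n_mismatches=3), max_mm=4)

# Align once with the larger threshold, then reuse the stored hits
# for several scoring thresholds without realigning.
stored = align(spacer, candidates, n_mismatches=4)
per_threshold = {m: len(scorable(stored, m)) for m in (2, 3, 4)}
```

In this toy model, a and b contain the same hits, which is the "either way" point above, and per_threshold shows the one case where a larger n_mismatches pays off: a single alignment pass reused across several max_mm values.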
@MatthewPace98 Yes, you got it right. We'll add more information to the documentation to help with this.
You can specify the number of mismatches for addSpacerAlignments using n_mismatches, and you can do the same for addOffTargetScores using max_mm. Should there ever be a difference between the value for max_mm and the value for n_mismatches?