At the moment, the default setting for the -m parameter is often too small to remove many sequencing artifacts (e.g., primer dimers). Deriving the default from the actual read length of the dataset would seem more robust (say, -m 100 for a dataset with 150 bp reads, possibly implemented as a percentage of the expected read length?). Many users aren't aware of how important this parameter's value is, and the resulting sequence output can suffer as a consequence.
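As a rough illustration of the percentage-based default suggested above, here is a minimal Python sketch. The 2/3 fraction, the helper name, and the placeholder tool name `trim-tool` are assumptions for illustration only, not part of any existing implementation; only the -m parameter itself comes from the suggestion above.

```python
def minimum_length_for(read_length: int, fraction: float = 2 / 3) -> int:
    """Derive a -m cutoff as a fraction of the expected read length.

    The 2/3 default is an illustrative choice: for 150 bp reads it yields
    -m 100, matching the example in the post above.
    """
    return max(1, int(read_length * fraction))


read_length = 150
m_value = minimum_length_for(read_length)  # 100 for 150 bp reads

# Assemble the command line, passing the derived value via the existing -m
# parameter ("trim-tool" and the input file name are placeholders).
cmd = ["trim-tool", "-m", str(m_value), "reads.fastq"]
print(" ".join(cmd))
```

Whether the fraction lives in the tool as a new default or stays a user-side wrapper like this, the point is the same: the cutoff scales with the read length instead of being a fixed small number.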