csiro-crop-informatics / biokanga_align_paper


Optimal and/or comparable settings for all aligners #5

Open rsuchecki opened 6 years ago

rsuchecki commented 6 years ago

We are currently using default settings for each of the aligners. This is a good starting point, as it illustrates how the tools perform out of the box. As the defaults vary widely, in the next step we may need to tailor the alignment settings to allow a fairer, more direct comparison between the aligners, for example by allowing up to 3 mismatches per 100bp, enabling or disabling indels, soft-clipping, etc.
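A shared constraint such as "up to 3 mismatches per 100bp" has to be turned into an absolute limit for each read length before it can be mapped onto any one aligner's options. The sketch below assumes a hypothetical `COMMON_SETTINGS` dictionary and a hypothetical helper `max_mismatches`; neither is part of any aligner's interface, and translating them into tool-specific flags is left out on purpose.

```python
import math

# Hypothetical tool-agnostic constraints, to be translated into each
# aligner's own options separately (not real aligner flags).
COMMON_SETTINGS = {
    "mismatches_per_100bp": 3,
    "allow_indels": True,
    "allow_soft_clipping": False,
}


def max_mismatches(read_length: int, per_100bp: float = 3.0) -> int:
    """Scale a per-100bp mismatch rate to an absolute cap for one read.

    Rounds down, so a read never gets more than the stated rate allows.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.floor(read_length * per_100bp / 100)


if __name__ == "__main__":
    rate = COMMON_SETTINGS["mismatches_per_100bp"]
    for length in (100, 150, 250):
        print(length, max_mismatches(length, rate))
```

Rounding down is one choice among several; rounding to nearest would allow 5 mismatches for a 150bp read instead of 4.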

An alternative would be to tailor each aligner's settings to be optimal by some standard, but this could be very time- and resource-intensive. On the other hand, if explored via a parameter sweep within reasonable limits, this may be a very good use of the implemented framework, answering the question of which parameters one should use with each of the aligners (provided one trusts that the simulation sufficiently reflects the properties of real input data).
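A constrained sweep can be expressed as the Cartesian product of candidate values, filtered by a predicate that encodes the "reasonable limits". This is a minimal sketch: the parameter names, ranges and the limit in `within_limits` are hypothetical placeholders, and each resulting combination would still need to be translated into a specific aligner's command-line options.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List

# Hypothetical candidate values per tool-agnostic parameter.
SWEEP_RANGES: Dict[str, List] = {
    "mismatches_per_100bp": [0, 1, 2, 3, 4, 5],
    "allow_indels": [True, False],
    "allow_soft_clipping": [True, False],
}


def within_limits(combo: Dict) -> bool:
    """Hypothetical 'reasonable limit': at most 3 mismatches per 100bp."""
    return combo["mismatches_per_100bp"] <= 3


def parameter_grid(
    ranges: Dict[str, List],
    constraint: Callable[[Dict], bool] = lambda combo: True,
) -> Iterator[Dict]:
    """Yield every combination of parameter values that passes the constraint."""
    keys = list(ranges)
    for values in product(*(ranges[k] for k in keys)):
        combo = dict(zip(keys, values))
        if constraint(combo):
            yield combo


if __name__ == "__main__":
    for combo in parameter_grid(SWEEP_RANGES, within_limits):
        print(combo)
```

Filtering with a predicate rather than shrinking the ranges keeps the full grid easy to widen later, and lets limits depend on combinations of parameters rather than on each one alone.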

Proposal: use a constrained parameter sweep and evaluate each setting by the proportions of reads aligned correctly, aligned wrongly, and left unaligned. From the explored ranges we should be able to select settings that allow for a fairer comparison of the tools.
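With simulated reads the true origin of each read is known, so every alignment can be put into one of the three proposed classes and summarised as proportions. The sketch below assumes a simplified `Alignment` record (one primary alignment per read, `chrom=None` meaning unaligned), a `truth` mapping from read ID to its simulated `(chrom, pos)`, and a hypothetical positional `tolerance` for counting an alignment as correct; none of these come from the existing framework.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Alignment:
    read_id: str
    chrom: Optional[str]  # None when the read is unaligned
    pos: Optional[int]    # leftmost position; None when unaligned


def classify(aln: Alignment, truth: Dict[str, Tuple[str, int]], tolerance: int = 10) -> str:
    """Label one alignment as 'correct', 'wrong' or 'unaligned' against the simulated origin."""
    if aln.chrom is None or aln.pos is None:
        return "unaligned"
    true_chrom, true_pos = truth[aln.read_id]
    if aln.chrom == true_chrom and abs(aln.pos - true_pos) <= tolerance:
        return "correct"
    return "wrong"


def summarise(alignments: Iterable[Alignment], truth: Dict[str, Tuple[str, int]],
              tolerance: int = 10) -> Dict[str, float]:
    """Proportions of reads aligned correctly, aligned wrongly and left unaligned."""
    counts = Counter(classify(a, truth, tolerance) for a in alignments)
    total = sum(counts.values())
    labels = ("correct", "wrong", "unaligned")
    if total == 0:
        return {k: 0.0 for k in labels}
    return {k: counts[k] / total for k in labels}
```

For example, with four simulated reads where two land within the tolerance of their origin, one maps elsewhere and one is unaligned, `summarise` gives `{"correct": 0.5, "wrong": 0.25, "unaligned": 0.25}`. Running this once per combination from the sweep gives directly comparable numbers across settings and aligners.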

alexwhan commented 6 years ago

I think a constrained parameter sweep makes a lot of sense.

alexwhan commented 6 years ago

It would also be interesting to use parameters from published papers, but maybe not in this context.

rsuchecki commented 6 years ago

> It would also be interesting to use parameters from published papers, but maybe not in this context

This may help establish some bounds on what is reasonable to explore.