bioinf-benchmarking / mapping-benchmarking

Snakemake pipeline for benchmarking read mappers

strobealign result on new experiment and version #3

Open ksahlin opened 1 year ago

ksahlin commented 1 year ago

Hi @ivargr ,

I saw that the results look recently updated and wondered which strobealign version you are using. In particular, I was surprised by the results for strobealign with the high error profile (here: https://github.com/ivargr/mapping-benchmarking/blob/benchmarks/reports/main.md#accuracy-for-different-error-profiles).

While short single-end reads are not strobealign's strength, the drop at read length 150 looks suspicious (i.e., why would shorter reads have higher accuracy?). The only explanations I can think of are:

  1. We use quite different alignment parameters for read length 150, but even so it should not be that bad. We haven't done much benchmarking on single-end reads during development, but if this result turns out to be true, I think we should. Pinging @marcelm for awareness.
  2. Perhaps the index for these reads was not changed/updated after something changed in a newer version of strobealign, or this simulation uses the wrong index?

Which strobealign version did you run? v0.9.0 should be slightly more accurate and give slightly higher SNP calling recall (although the difference is marginal). We also have a very exciting update regarding memory usage coming out soon (hopefully).

Also, I saw your effort on the open manuscript. I hope I can help out a bit. Do you have any deadline in mind?

ivargr commented 1 year ago

Hi!

Sorry for the late reply!

I hadn't specified the strobealign version in the conda yaml file, and it seems it used 0.8.0. I've changed it now to use 0.9.0, and if I'm not mistaken, the results look better now. I also found the dip at read length 150 a bit strange and I'm not sure what caused it (I don't think an index mismatch was the reason), but the dip seems to be gone after updating to 0.9.0.
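For reference, a minimal sketch of what pinning the version in the rule's conda environment yaml could look like (the file name, channel order, and exact pin here are assumptions, not necessarily what the pipeline uses):

```yaml
# envs/strobealign.yml (hypothetical path) -- pin the version explicitly so the
# solver does not silently pick up whatever strobealign release happens to resolve.
channels:
  - bioconda
  - conda-forge
dependencies:
  - strobealign=0.9.0
```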

Very cool that you want to help out with a manuscript :) I plan to tweet an invitation to contribute, with some more info about the project, within the next few days. Depending on who and how many want to be part of the project, I'll try to find the best way to organise things (maybe a meeting could be useful). No specific deadline in mind, but I think aiming for a preprint before the summer vacation could be nice.

ksahlin commented 1 year ago

No worries, not in a rush :)

Of course, I am already bugging you enough here with adjustments to the analysis... :) The problem with me helping may be that I am the author of one of the evaluated tools, which might not look good, but we can discuss.

Also, @marcelm figured out that the bcftools variant caller behaves differently with =/X vs M CIGAR strings (see here). Therefore the next release, which will come very soon, will emit M CIGARs by default. I'm not sure whether it affects the caller you use, but I guess M CIGARs are good for consistency with the other aligners. Also, the peak memory will drop significantly in the next version. I will bump this thread when it is out.
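To illustrate the difference (this is not strobealign's or bcftools' actual code): an extended CIGAR spells out sequence matches and mismatches separately as = and X, while the traditional form collapses both into M. A small Python sketch of that collapse, with a hypothetical helper name:

```python
import re

def collapse_eq_x_to_m(cigar: str) -> str:
    """Rewrite '=' (match) and 'X' (mismatch) CIGAR operations as 'M'
    and merge adjacent runs, e.g. '60=1X39=' -> '100M'.
    Hypothetical helper, only to illustrate the two notations."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    merged = []
    for length, op in ops:
        op = "M" if op in "=X" else op
        if merged and merged[-1][1] == op:
            merged[-1][0] += int(length)       # extend the previous run
        else:
            merged.append([int(length), op])   # start a new run
    return "".join(f"{n}{op}" for n, op in merged)

print(collapse_eq_x_to_m("60=1X39="))     # 100M
print(collapse_eq_x_to_m("50=2X48=10S"))  # 100M10S
```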