glasgowcompbio / vimms

A programmable and modular LC/MS simulator in Python
MIT License

Scan timing bug #265

Closed: joewandy closed this issue 1 year ago

joewandy commented 1 year ago

This is for the `proteomics` branch.

The simulated scan duration for (2, 2), i.e. where the current MS level is 2 and the next MS level is 2, doesn't match between the seed (reference) file and the simulated mzML file.

As you can see in the (2, 2) plots, all reference values are below 0.2 s (apart from outliers), whereas many simulated values are above 0.2 s.
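To make the (current, next) MS-level convention concrete, here is a minimal sketch of how per-pair durations could be extracted from a list of scans. It assumes each scan is a `(ms_level, retention_time_s)` tuple sorted by retention time, and that a scan's duration is the gap until the next scan starts; `durations_by_level_pair` is a hypothetical helper, not part of the vimms API.

```python
from collections import defaultdict


def durations_by_level_pair(scans):
    """Group scan durations by (current MS level, next MS level).

    `scans` is a list of (ms_level, retention_time_s) tuples sorted by
    retention time. A scan's duration is assumed to be the gap until the
    next scan starts, so the last scan has no duration and is skipped.
    """
    groups = defaultdict(list)
    for (level, rt), (next_level, next_rt) in zip(scans, scans[1:]):
        groups[(level, next_level)].append(next_rt - rt)
    return dict(groups)


# Hypothetical toy run: MS1, three MS2 scans, then MS1 again.
scans = [(1, 0.0), (2, 0.5), (2, 0.625), (2, 0.75), (1, 1.25)]
print(durations_by_level_pair(scans))
```

Running the same helper on the seed file and on the simulated mzML, then comparing the `(2, 2)` lists, is one way to reproduce the mismatch described here.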

[image: (2, 2) scan duration plots, reference vs simulated]

Here is the same comparison as a scatter plot. Also note the (1, 1) difference between the reference and simulated values; that needs to be checked later too.

[image: scatter plot of scan durations, reference vs simulated]

When we sample the (2, 2) scan durations directly from the timing object, the results look correct: most values are below 0.2 s, apart from outliers.

[image: (2, 2) scan durations sampled directly from the timing object]

So I suspect the bug is in the simulated mass spectrometer, which doesn't assign the scan durations to the scans correctly. It seems that the (2, 1) and (2, 2) durations are somehow being combined, when only the (2, 2) durations should be used.
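The suspected failure mode can be illustrated with a small, self-contained sketch. This is not the vimms code and not necessarily what the fix below changed; the duration pools and the functions `sample_duration_buggy` and `sample_duration` are hypothetical. It shows how keying the lookup on the current MS level alone, instead of the (current, next) pair, pools the (2, 1) and (2, 2) durations and pushes many simulated (2, 2) values above 0.2 s.

```python
import random

# Hypothetical per-pair duration pools in seconds. In vimms the equivalent
# values would come from the timing object built from the seed file.
POOLS = {
    (2, 1): [0.35, 0.40, 0.45],
    (2, 2): [0.05, 0.08, 0.10],
}


def sample_duration_buggy(current_level, next_level, rng):
    """Suspected bug pattern: only current_level is used for the lookup."""
    pooled = []
    for (cur, _nxt), durations in POOLS.items():
        if cur == current_level:  # next_level is ignored: (2, 1) and (2, 2) get mixed
            pooled.extend(durations)
    return rng.choice(pooled)


def sample_duration(current_level, next_level, rng):
    """Correct pattern: look up the exact (current, next) pair."""
    return rng.choice(POOLS[(current_level, next_level)])


rng = random.Random(0)
buggy = [sample_duration_buggy(2, 2, rng) for _ in range(1000)]
fixed = [sample_duration(2, 2, rng) for _ in range(1000)]
# About half of the buggy samples exceed 0.2 s; none of the fixed ones do.
print(sum(d > 0.2 for d in buggy) / len(buggy))
print(sum(d > 0.2 for d in fixed) / len(fixed))
```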

joewandy commented 1 year ago

Fixed in https://github.com/glasgowcompbio/vimms/commit/79f000645398f9d7e9969a2393d433b5b11d6610.

Below are some results using a threshold of 1E5 for the ROI extraction. The scan duration distributions seem to match better between simulated and real data.

[image: scan duration distributions, simulated vs real data]

Comparing the cumulative number of scans gives the following:

[image: cumulative number of scans, simulated vs real data]

There are still some differences beyond 5000 s: the simulation produces more MS2 scans and fewer MS1 scans than the real data. I don't think that is a timing issue, though; it will be checked separately.
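One way to reproduce the cumulative-scan comparison is sketched below. It assumes both the real and the simulated runs are available as lists of `(ms_level, retention_time_s)` tuples; `cumulative_scan_counts` is a hypothetical helper, not a vimms function.

```python
from bisect import bisect_right


def cumulative_scan_counts(scans, times):
    """For each time t in `times`, count the scans of each MS level that
    started at or before t.

    `scans` is a list of (ms_level, retention_time_s) tuples. Returns a dict
    mapping each MS level to a list of counts aligned with `times`.
    """
    by_level = {}
    for level, rt in scans:
        by_level.setdefault(level, []).append(rt)
    counts = {}
    for level, rts in by_level.items():
        rts.sort()
        # bisect_right gives the number of start times <= t.
        counts[level] = [bisect_right(rts, t) for t in times]
    return counts


scans = [(1, 0.0), (2, 0.5), (2, 0.6), (1, 1.0), (2, 1.4)]
print(cumulative_scan_counts(scans, [0.0, 0.55, 1.0, 2.0]))
```

Evaluating the real and simulated runs on the same time grid (for example every 100 s) makes it easy to see at which point the MS1 and MS2 counts start to diverge.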

RonanDaly commented 1 year ago

The (2, 2) simulated timings still don't seem to match the reference timings. The component distributions now seem to be in the right positions, but their weights don't match. This would account for the MS2 divergence later on.
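To see why matching component positions is not enough, the sketch below draws two samples from a two-component Gaussian mixture with identical component means but different weights. The Gaussian mixture model, its parameters and both helpers are assumptions for illustration, not the timing model vimms actually uses and not fitted to the data in this thread. A weight mismatch changes the share of long (2, 2) scans, which is the kind of mismatch suggested above as the cause of the later MS2 divergence.

```python
import random


def upper_mode_weight(durations, split_s):
    """Fraction of durations above split_s: a crude estimate of the upper
    mode's weight when the two modes are well separated."""
    if not durations:
        raise ValueError("durations must not be empty")
    return sum(d > split_s for d in durations) / len(durations)


def sample_mixture(n, weights, means, sd, rng):
    """Draw n values from a Gaussian mixture whose components share one sd."""
    components = rng.choices(range(len(means)), weights=weights, k=n)
    return [rng.gauss(means[c], sd) for c in components]


# Illustrative parameters only: both samples share the same component means
# (0.05 s and 0.4 s) but use different mixture weights.
rng = random.Random(42)
means = [0.05, 0.4]
ref = sample_mixture(5000, [0.9, 0.1], means, 0.02, rng)
sim = sample_mixture(5000, [0.6, 0.4], means, 0.02, rng)
print(upper_mode_weight(ref, 0.2), upper_mode_weight(sim, 0.2))
```

Splitting at a fixed threshold is deliberately simple; fitting a proper mixture model would give better weight estimates when the modes overlap.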

joewandy commented 1 year ago

Some notes on the figure above:

  1. (2, 2) is bimodal, and the lower mode (around 0.05 s) looks suspiciously similar to (1, 2). Can we check that this is not a bug? Can we actually see this in the mzML file? Is there any pattern to the two modes?

  2. (2, 1) is similarly bimodal, and its upper mode (around 0.4 s) is similar to (1, 1). Is this correct?
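For the question of whether the two (2, 2) modes follow a pattern, one possible check is to split the (2, 2) durations by where the scan sits in its run of consecutive MS2 scans, e.g. the first MS2 scan after an MS1 scan versus later ones. This diagnostic is a hypothetical suggestion, not something done in this thread; `ms2_run_durations` is an illustrative helper that assumes scans as `(ms_level, retention_time_s)` tuples sorted by retention time.

```python
from collections import defaultdict


def ms2_run_durations(scans):
    """Group (2, 2) durations by the scan's position within its MS2 run.

    Position 0 is the first MS2 scan after a non-MS2 scan (or at the start
    of the file), position 1 the second, and so on.
    """
    groups = defaultdict(list)
    position = None  # None means the previous scan was not MS2
    for (level, rt), (next_level, next_rt) in zip(scans, scans[1:]):
        if level != 2:
            position = None  # any non-MS2 scan ends the current MS2 run
            continue
        position = 0 if position is None else position + 1
        if next_level == 2:
            groups[position].append(next_rt - rt)
    return dict(groups)


scans = [(1, 0.0), (2, 0.5), (2, 0.625), (2, 0.75), (1, 1.25), (2, 1.75), (2, 2.0)]
print(ms2_run_durations(scans))
```

If one mode is concentrated at a particular position, the bimodality is likely systematic (for example tied to the scan that follows MS1); if both modes appear at every position, the cause probably lies elsewhere.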

joewandy commented 1 year ago

Fixed as part of the proteomics branch.