joewandy closed this 1 year ago
Fixed in https://github.com/glasgowcompbio/vimms/commit/79f000645398f9d7e9969a2393d433b5b11d6610.
Below are some results using a threshold of 1E5 for the ROI extraction. The scan duration distributions seem to match better between simulated and real data.
Comparing the cumulative number of scans, we get
There are still some differences beyond 5000 s: the simulated results produce more MS2 scans and fewer MS1 scans than the real data. But I think that is not a timing issue ... to be checked separately.
The (2, 2) simulated timings don't seem to match the reference timings. The component distributions seem to be in the right position now, but the weights aren't matching up. This would account for the MS2 divergence later on.
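For anyone wanting to reproduce the cumulative comparison outside the plots, here is a minimal sketch. It assumes the scans have already been read into plain `(start_time_seconds, ms_level)` tuples; the function name `cumulative_counts` and the toy data are made up for illustration and are not part of vimms.

```python
from bisect import bisect_right


def cumulative_counts(scans, ms_level, time_points):
    """Count scans of `ms_level` that started at or before each time point.

    `scans` is a list of (start_time_seconds, ms_level) tuples (assumed format).
    """
    starts = sorted(t for t, level in scans if level == ms_level)
    # bisect_right gives the number of start times <= t
    return [bisect_right(starts, t) for t in time_points]


# Toy data only, not taken from the real or simulated files
ref_scans = [(0.0, 1), (1.0, 2), (2.0, 2), (3.0, 1), (4.0, 2)]
ms2_counts = cumulative_counts(ref_scans, 2, [0.0, 2.0, 5.0])
ms1_counts = cumulative_counts(ref_scans, 1, [0.0, 2.0, 5.0])
```

Running the same function on the real and simulated scan lists with shared time points gives two curves per MS level that can be compared directly, e.g. to see where they start to diverge.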
Some notes on the top figure:
(2, 2) is bimodal, and the lower mode (around 0.05) is suspiciously similar to (1, 2). Can we check that this is not a bug? Can we actually see this in the mzML file? Is there any pattern to the two modes?
(2, 1) is similarly bimodal, and the upper mode (around 0.4) is also similar to (1, 1). Is this correct?
Fixed as part of the `proteomics` branch.
This is for the `proteomics` branch.
The simulated scan duration for (2, 2), i.e. current ms-level is 2, next ms-level is 2, doesn't match between the seed (reference) file and the simulated mzML file.
As you can see in the (2, 2) plots, all reference values are <0.2s (except outliers), while there are many simulated values >0.2s.
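To put a number on this rather than reading it off the plots, something like the following could be used. It is only a sketch: it assumes scans are available as `(start_time_seconds, ms_level)` tuples sorted by time, and that a scan's duration is the gap to the next scan's start. The helper names and the toy data are hypothetical, not vimms code.

```python
from collections import defaultdict


def durations_by_transition(scans):
    """Group scan durations by (current ms-level, next ms-level).

    The duration of a scan is taken as the gap to the next scan's start
    time, so the final scan has no duration and is skipped.
    """
    grouped = defaultdict(list)
    for (t0, level0), (t1, level1) in zip(scans, scans[1:]):
        grouped[(level0, level1)].append(t1 - t0)
    return dict(grouped)


def fraction_above(durations, threshold=0.2):
    """Fraction of durations strictly above `threshold` seconds."""
    if not durations:
        return 0.0
    return sum(d > threshold for d in durations) / len(durations)


# Toy data only: MS1, three MS2 scans, MS1, MS2
scans = [(0.0, 1), (0.5, 2), (0.6, 2), (0.7, 2), (1.0, 1), (1.4, 2)]
grouped = durations_by_transition(scans)
frac_22 = fraction_above(grouped[(2, 2)])
frac_21 = fraction_above(grouped[(2, 1)])
```

Computing `fraction_above` for the (2, 2) group in both the seed file and the simulated file would give a single figure to track while debugging.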
Here is the same comparison as a scatter plot. Also note the (1, 1) difference between the reference and simulated data. That needs to be checked later too.
When we directly sample the (2, 2) scan duration from the timing object, we can see it produces correct results -- most values are below 0.2s, except outliers.
So I suspect there must be a bug in the mass spec that doesn't assign the scan durations to the scans correctly. It seems that somehow the (2, 1) and (2, 2) results are being combined when only (2, 2) should be used?
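To illustrate what that kind of mix-up would look like, here is a toy sketch. It is not the vimms code: the `timings` dictionary, both sampler functions and all the numbers are invented for illustration. One sampler looks up durations by the full (current, next) pair, the other by current level only, which pools (2, 1) and (2, 2).

```python
import random

# Hypothetical duration table keyed by (current ms-level, next ms-level)
timings = {
    (2, 1): [0.40, 0.42, 0.38],
    (2, 2): [0.05, 0.06, 0.04],
}


def sample_exact(timings, current_level, next_level, rng):
    """Sample a duration using the full (current, next) key."""
    return rng.choice(timings[(current_level, next_level)])


def sample_pooled(timings, current_level, rng):
    """Sample using only the current level, pooling all next levels."""
    pooled = [
        d
        for (cur, _), durations in timings.items()
        if cur == current_level
        for d in durations
    ]
    return rng.choice(pooled)


rng = random.Random(0)
exact = [sample_exact(timings, 2, 2, rng) for _ in range(200)]
pooled = [sample_pooled(timings, 2, rng) for _ in range(200)]
```

With the exact lookup every sampled value stays below 0.2 s, while the pooled lookup also returns values around 0.4 s, which is the pattern seen in the simulated (2, 2) plot. Whether this is what actually happens in the mass spec code still needs to be confirmed.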