ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0

[Request] injection duration included in computation of area under chromatogram #2519

Closed. martinPasen closed this issue 1 year ago.

martinPasen commented 1 year ago

If I looked at the correct part of the code and understood it correctly, you are not using the "injection duration" in the computation of the area under the chromatogram. I do not know if this is the case for other instruments, but when I look at mzML files from a Fusion or an Orbitrap QE, there is information about how long the ions were collected. I think this is an important part of the equation and can change the final area by orders of magnitude.

For example, in my data I have an MS1 scan roughly every 2 seconds, while the ion collection (injection) time is roughly 0.02 seconds. So a straightforward "integration" would underestimate the area by a factor of 1/0.02 = 50.
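To make the arithmetic in the request concrete, here is a minimal sketch of the two calculations being compared: integrating the intensities as reported, versus first dividing each point by its injection time. The retention times, intensities and the constant 0.02 s injection time are made-up illustrative values, and the trapezoidal rule is just one simple choice of integrator, not necessarily what Skyline uses. Under the request's assumption that the reported intensities are raw accumulated counts, the second area comes out 1/0.02 = 50 times larger; whether that division is appropriate is exactly what the rest of the thread discusses.

```python
# Illustrative MS1 trace (hypothetical values): one scan every 2 s,
# each with a 0.02 s ion injection time.

def trapezoid_area(times, values):
    """Trapezoidal area under a sampled curve (times in seconds)."""
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * (values[i] + values[i - 1]) * dt
    return area

retention_times = [0.0, 2.0, 4.0, 6.0, 8.0]    # s
intensities = [0.0, 1.0e5, 4.0e5, 1.0e5, 0.0]  # as reported in the spectra
injection_times = [0.02] * 5                   # s, constant here for simplicity

# Integrate the reported values directly.
naive = trapezoid_area(retention_times, intensities)

# Integrate after dividing each point by its own injection time.
per_second = trapezoid_area(
    retention_times,
    [value / t for value, t in zip(intensities, injection_times)],
)

print(naive, per_second, per_second / naive)
```

With a constant injection time the ratio is just 1/0.02; with AGC the injection time changes from scan to scan, so the per-point division would also change the peak shape, not only its scale.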

I have written a lot of "ifs" in this message and I have worked with only a small number of instruments, so there is probably some problem in what I wrote, but I am posting it on the off chance that it might be useful.

Thank you for the great product and have a nice day :)

nickshulman commented 1 year ago

My understanding is that this is not necessary because of the "automatic gain control" (AGC) that the mass spectrometer performs. The intensity values that ProteoWizard sees and reports to you have already been divided by the fill time.

If you wanted to know the actual number of ions that were hitting the detector of the mass spectrometer you would need to multiply the intensity by the fill time. That number would be useful in terms of estimating noise caused by Poisson distributions (i.e. variation in the number of ions hitting the detector caused by the fact that they are a finite number of ions and will not be evenly distributed). However, if you want to know the number of ions that were coming off the chromatography column at any given time, then the intensity values that you see in the spectra have already been scaled to reflect the fact that the mass spectrometer is allowing only a fraction of those ions to enter the detector. -- Nick
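The two quantities Nick distinguishes can be sketched as follows, assuming (as he describes) that the reported intensity is a rate already divided by the fill time. Multiplying back by the fill time gives a rough ion count, and for a Poisson-distributed count N the relative standard deviation is sqrt(N)/N = 1/sqrt(N). The helper names and numbers are hypothetical, and since Thermo applies further internal scaling, the "ion count" here should be read as a relative figure rather than a true number of ions.

```python
import math

def estimated_ion_count(intensity, fill_time_s):
    """Hypothetical helper: undo the fill-time normalization.

    Assumes `intensity` is already a per-second rate. Vendor-specific
    calibration is ignored, so the result is only a relative figure.
    """
    return intensity * fill_time_s

def poisson_relative_noise(n_ions):
    """Relative standard deviation of a Poisson count: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(n_ions)

n = estimated_ion_count(intensity=5.0e6, fill_time_s=0.02)
print(n, poisson_relative_noise(n))
```

A short fill time therefore means fewer ions behind each reported value and more shot noise, even though the reported rate itself is already on a comparable scale across scans.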

(Sent by email in reply to martinPasen's post of Wed, Mar 1, 2023 at 7:38 AM.)

chambm commented 1 year ago

Without telling us which part of the code you looked at and which area under the chromatogram you're suspicious of, it's hard to address your concern.

martinPasen commented 1 year ago

Thanks for the fast replies.

So if I understand correctly, when I convert a RAW file to mzML (for example with ThermoRawFileParser), the intensities are already normalised to 1 second?

I got this concern because I tried to code my own "integrator" and was getting results that differed by orders of magnitude. Then I searched for keywords such as "area" and found this file: /pwiz_tools/Skyline/Model/Results/PeakShapeStatistics.cs

There I saw an almost identical "integration", just without the division by the injection duration.

So I wanted to know what was going on and why you would not divide by the injection duration. As you mentioned, "automatic gain control" selects the length of the injection window, so when comparing samples of different complexity the injection times would differ, and without normalisation that would cause problems. But now I understand that I was dividing by the injection duration a second time.
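Why dividing a second time hurts can be shown with a toy model. It assumes the same analyte reaches the trap at the same rate in two samples, that AGC picks a shorter injection time for the more complex sample, and that the reported intensity is already accumulated signal divided by fill time, as described earlier in the thread. The flux value and the two injection times are hypothetical; real AGC and vendor scaling are more involved.

```python
# Toy model (assumed): same analyte flux in both samples, different AGC fill times.
TRUE_FLUX = 1.0e6                                    # arbitrary units per second
injection_times = {"complex": 0.01, "simple": 0.05}  # seconds

# Signal actually accumulated during each injection.
accumulated = {k: TRUE_FLUX * t for k, t in injection_times.items()}

# Reported intensity: accumulated signal already divided by the fill time.
reported = {k: accumulated[k] / t for k, t in injection_times.items()}

# The mistake discussed in the thread: dividing by the fill time a second time.
double_divided = {k: reported[k] / t for k, t in injection_times.items()}

print(reported)        # both values equal TRUE_FLUX, up to float rounding
print(double_divided)  # now differ by the ratio of injection times (5x here)
```

The once-normalised values agree between the two samples, while the double-divided values are inflated by 1/t and no longer comparable, which also matches the roughly two-orders-of-magnitude discrepancy reported later in the thread.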

Once again, thanks for the replies. If I understood you correctly, I am happy to close this request :)

martinPasen commented 1 year ago

I have converted a RAW file to mzML using both your MSConvert and ThermoRawFileParser. In both outputs, the unit of the MS1 intensities is 'number of detector counts', which is defined as "The number of counted events observed in one or a group of elements of a detector." (see https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo). From that name and definition I would suspect that the values are not yet divided by the fill time, especially since a separate 'counts per second' unit exists.
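To see which intensity unit and injection time a converted file actually declares, the relevant cvParams can be read directly. This is a minimal sketch using Python's standard-library XML parser on a hand-trimmed spectrum fragment; the fragment is illustrative, not schema-complete, and not taken from the files in this issue. It looks up the PSI-MS terms MS:1000927 (ion injection time) and MS:1000515 (intensity array), whose unit in such files is MS:1000131 (number of detector counts). Note that the unit label alone does not say whether the vendor has already normalised the values.

```python
import xml.etree.ElementTree as ET

CV_PARAM = "{http://psi.hupo.org/ms/mzml}cvParam"

# Hand-trimmed, illustrative spectrum fragment (not a complete mzML document).
SNIPPET = """<spectrum xmlns="http://psi.hupo.org/ms/mzml" index="0" id="scan=1">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
  <scanList count="1">
    <scan>
      <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time"
               value="20.5" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
    </scan>
  </scanList>
  <binaryDataArrayList count="1">
    <binaryDataArray encodedLength="0">
      <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value=""
               unitCvRef="MS" unitAccession="MS:1000131" unitName="number of detector counts"/>
      <binary></binary>
    </binaryDataArray>
  </binaryDataArrayList>
</spectrum>"""

def describe_spectrum(xml_text):
    """Return (injection time in ms, intensity unit name) for one spectrum."""
    spectrum = ET.fromstring(xml_text)
    injection_ms = None
    intensity_unit = None
    for param in spectrum.iter(CV_PARAM):
        accession = param.get("accession")
        if accession == "MS:1000927":      # ion injection time
            injection_ms = float(param.get("value"))
        elif accession == "MS:1000515":    # intensity array
            intensity_unit = param.get("unitName")
    return injection_ms, intensity_unit

print(describe_spectrum(SNIPPET))
```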

These intensities are proportional to what I get from a Skyline report when I export "Results!*.Value.Chromatogram.RawData.Intensities".

When I integrate them in the most naive way, I get an area similar to the area in the Skyline report. On the other hand, if I first divide them by the fill time and then integrate, I get an area roughly 2 orders of magnitude higher.

That is why I assumed that you are using non-normalised values. Can you please shed some light on this?

Is the unit in the mzML wrong, or is my understanding of the unit incorrect?

chambm commented 1 year ago

ProteoWizard uses "number of detector counts" when it's not clear what other term we should be using. AFAIK, Thermo intensities have not been pure counts or voltages for a long time: they are always normalized somehow, just like their collision energy. So basically they are "arbitrary/proprietary units". The precursor intensity that comes directly from ProteoWizard (in the mzML) is a very rough estimate. It's not integrated over time; it is just the sum of the data points in the isolation window for the one scan that triggered the MSn. Don't use it for any serious quantitation.
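The rough precursor intensity estimate described above (one MS1 scan, sum of the points inside the isolation window, no integration over time) could be sketched like this. This is not ProteoWizard's actual code; the function name, the symmetric ±0.8 m/z window and the peak list are hypothetical.

```python
def precursor_intensity_estimate(mzs, intensities, target_mz, lower_offset, upper_offset):
    """Rough precursor intensity: sum of the peaks inside the isolation window
    of a single MS1 scan. No integration over retention time is done."""
    low, high = target_mz - lower_offset, target_mz + upper_offset
    return sum(i for mz, i in zip(mzs, intensities) if low <= mz <= high)

# Hypothetical MS1 peaks around a precursor at m/z 500.25.
mzs = [499.0, 499.6, 500.25, 500.75, 501.3]
intensities = [1.0e4, 2.0e4, 3.0e5, 1.5e5, 5.0e4]
print(precursor_intensity_estimate(mzs, intensities, 500.25, 0.8, 0.8))
```

Because it uses a single scan, the value depends on where in the elution profile the MSn happened to be triggered, and it also picks up any co-isolated peaks in the window, which is why it is unsuitable for quantitation compared with an area integrated over the chromatographic peak.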

martinPasen commented 1 year ago

Clearly I am out of my depth here. I believe I understand where I was making a mistake (dividing the intensities by the injection duration a second time). Thank you for the responses :) If that is OK, I will close this thread.