MetaSys-LISBP / IsoCor

IsoCor: Isotope Correction for mass spectrometry labeling experiments
https://isocor.readthedocs.io
GNU General Public License v3.0

_check_data_isotopes has bad assumption #42

Closed jmmitc06 closed 2 months ago

jmmitc06 commented 2 months ago

Attempting to add the two isotopes of chlorine to isotopes.dat for use via the library raises an exception. The exception arises because the function assumes that no isotopes are missing, based on the mass delta between consecutive isotopes. However, for Cl (and other elements), the isotopes do not follow one another with a mass delta < 1.2.
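To illustrate the kind of check described above, here is a minimal sketch. It is not IsoCor's actual _check_data_isotopes code; the function name find_gaps is hypothetical, and the 1.2 threshold is simply the value quoted in this issue.

```python
# Hypothetical sketch, NOT IsoCor's implementation: flag consecutive
# isotopes whose mass difference exceeds a threshold.
MAX_DELTA = 1.2  # value quoted in this issue; the real check may differ


def find_gaps(masses, max_delta=MAX_DELTA):
    """Return pairs of consecutive masses that differ by more than max_delta."""
    ordered = sorted(masses)
    return [(lo, hi) for lo, hi in zip(ordered, ordered[1:]) if hi - lo > max_delta]


# Chlorine with only its two real isotopes: one gap is reported.
gaps_two_isotopes = find_gaps([34.968, 36.965])

# Chlorine with a zero-abundance placeholder at mass 36: no gap.
gaps_with_placeholder = find_gaps([34.968, 36.000, 36.965])
```

With only the two real masses, the difference of about 2 exceeds the threshold, which is consistent with the exception reported here.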

I think a good solution would be, instead of having the user specify isotopes, to ship an exhaustive list of all isotopes based on https://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl. Then there would never be a need to specify custom isotopes, as all possible isotopes would already be considered.

pierremillard commented 2 months ago

Regarding chlorine, the two isotopes have masses of 34.968 and 36.965. Part of the calculation algorithm relies on convolution of the isotopic vectors of individual elements, so an abundance must be provided for every mass step of about 1. I.e., for chlorine, which has only M0 (mz ~35) and M2 (mz ~37) but no M1, the way to provide the abundance of each isotope would be:

mass      abundance
34.968    0.7576
36.000    0
36.965    0.2424

This indicates that the abundance of mass ~36 is 0.
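A minimal sketch of why the zero entry matters when abundance vectors are convolved. This is an illustration in plain Python, not IsoCor's code; the helper convolve and the variable names are assumptions. Each list index stands for a nominal mass shift (M0, M1, M2, ...).

```python
def convolve(a, b):
    """Discrete convolution of two abundance vectors indexed by mass shift."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Index = nominal mass shift from the lightest isotope.
cl_with_gap = [0.7576, 0.0, 0.2424]  # M1 declared with abundance 0
cl_no_gap = [0.7576, 0.2424]         # M1 omitted: the heavy isotope lands at M1

# Isotopic vector of two chlorine atoms (e.g. in a Cl2-containing fragment).
cl2_correct = convolve(cl_with_gap, cl_with_gap)  # peaks at M0, M2, M4
cl2_wrong = convolve(cl_no_gap, cl_no_gap)        # peaks shifted to M0, M1, M2
```

Without the placeholder, the vector is one entry too short and every heavy-isotope contribution ends up at the wrong mass shift, which is why the explicit zero is required in the input.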

We provide this information in the documentation (see https://isocor.readthedocs.io/en/latest/tutorials.html#isotopes-database-isotopes-dat):

For elements with gaps in the list of nominal mass of isotopes (e.g. for sulfur with isotopes 33S, 34S, 36S, but not 35S), declare the missing isotope(s), with the exact mass set at the missing integer(s), and an abundance of 0 (as done in the example file for sulfur).

I agree this is not very intuitive, but we decided on this explicit format to avoid some potential issues. We could have added the missing masses automatically (internally in IsoCor), but we wanted to make sure that the user has not forgotten to declare an isotope.

You are right, we could also get this information from https://physics.nist.gov/cgi-bin/Compositions/stand_alone.pl however we prefer to stick with an independent file for the following reasons:

Since this does not prevent you from defining chlorine and performing natural abundance correction for this element, we would prefer to stick with the current files. Do not hesitate to reply if you think this is not relevant; we are open to discussion.

jmmitc06 commented 2 months ago

Thanks for the clarification, my apologies for not finding that in the documentation.

I think that a future improvement could be to harden the convolution so that it handles the gaps more elegantly, since it is essentially a cross-product across a bunch of lists with masses, abundances, etc. That said, it only matters in the edge case of these more "exotic" elements. So tl;dr, I think the current implementation makes sense for 99% of use cases.

My motivation for including all the isotopes was that I'm wrapping IsoCor into another tool and I wanted comprehensive defaults. I went ahead and did the painful thing of filling in the gaps for all the isotopes in the NIST table. I've attached the table in case you do ever want to use it (or if someone finds the issue and they may want it).
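For anyone who would rather script the gap filling than do it by hand, here is a minimal sketch. It assumes each element is given as a list of (mass, abundance) pairs and that rounding the exact mass gives the nominal mass; fill_gaps is a hypothetical helper, not part of IsoCor, and it is not the procedure used to build the attached table.

```python
def fill_gaps(isotopes):
    """Insert (integer mass, 0.0) entries for missing nominal masses.

    isotopes: list of (exact_mass, abundance) pairs for one element.
    Assumes round(exact_mass) is the nominal mass of each isotope.
    """
    ordered = sorted(isotopes)
    by_nominal = {round(mass): (mass, abundance) for mass, abundance in ordered}
    first, last = round(ordered[0][0]), round(ordered[-1][0])
    # Keep real isotopes; use a zero-abundance placeholder where none exists.
    return [by_nominal.get(n, (float(n), 0.0)) for n in range(first, last + 1)]


chlorine = [(34.968, 0.7576), (36.965, 0.2424)]
chlorine_filled = fill_gaps(chlorine)
for mass, abundance in chlorine_filled:
    print(f"{mass:.3f}\t{abundance}")
```

For chlorine this reproduces the three-row layout given earlier in the thread, with the placeholder at mass 36.000 and abundance 0. The output would still need to be written in the column layout that isotopes.dat expects.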

all_isotopes_NIST.csv

pierremillard commented 2 months ago

No worries, thank you very much for providing the file corresponding to the complete NIST table!

We'll definitely think about a cleaner way of providing masses & abundances for all elements in the future (we have had a v3 in mind for a couple of years), ideally getting the defaults from NIST.

gmat commented 2 months ago

My motivation for including all the isotopes was that I'm wrapping IsoCor into another tool and I wanted comprehensive defaults. I went ahead and did the painful thing of filling in the gaps for all the isotopes in the NIST table.

Hi @jmmitc06, just in case it is useful: to wrap IsoCor, you could use the CLI, as the Galaxy wrapper does (https://toolshed.g2.bx.psu.edu/view/gmat/isocor/fa183805db31), or import the module in your program, either in Python or in another language through a Python wrapper. The latter is not the easiest option, but it is not too complicated. That way you can do whatever you want with your data before and after IsoCor correction, such as getting a list of all isotopes directly from the web (or anywhere else).