MSI-Metabolomics-Standards-Initiative / inchi-isotopologue-extension

Specification extension to InChI to better support isotopologue reporting

Questions about the Isotopically-Resolved Isotopologue Specification #4

Open pierremillard opened 4 years ago

pierremillard commented 4 years ago

Dear InChI Isotopologue and Isotopomer Development Team,

thanks for your initiative to develop enhanced specifications within the regular InChI standard for representing isotopic species.

We are currently implementing the proposed standard in IsoCor, our tool to correct MS data for naturally occurring isotopes (https://github.com/MetaSys-LISBP/IsoCor), and we would like to make sure that we correctly follow the proposed Isotopically-Resolved Isotopologue Specification.

Please, could you confirm that the following isotopic layers are correct:

We prefer to define the labeling explicitly for all atoms because, if I understood the specification correctly, atoms not defined in the isotopic layer are considered to be at natural abundance. Is that right?

We are aware of the development of isoenum (https://github.com/MoseleyBioinformaticsLab/isoenum), and we first thought of using this library to generate isotopic InChIs. However, we decided not to include it because it would add too many dependencies to IsoCor (in particular due to the openbabel requirement) and we could not use it in some of our pipelines (e.g. W4M, which is based on Galaxy). Since we need only a fraction of what isoenum is capable of (basically, generating isotopic layers for tracer isotopologues), a few lines of code are enough to generate them in the current implementation in IsoCor (dev branch). Still, we would like to make sure the proposed extension is handled correctly by IsoCor, which is why we are contacting your group.

Thanks, @pierremillard

hunter-moseley commented 4 years ago

Pierre,

To answer your question, the atoms are considered to be the most abundant stable isotope, unless specified otherwise. So for these compounds, the specification for 16O and 12C is redundant. The redundancy would be nice for detecting errors; however, given the IUPAC InChI Subcommittee's focus on brevity, I do not think they would agree to changing the accepted isotopologue specification. Also, it would make the SDF representation more complex. On top of this, the specification has been accepted, and strong justification would be needed for changing it.

Therefore the following /a layers would be correct:

16O2,18O2-fumarate: /a(O2+2)

16O1,17O3-fumarate: /a(O3+1)

13C1,12C3-fumarate: /a(C1+1)
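For readers implementing this rule, here is a minimal Python sketch of how such a brief /a layer could be generated. It is not taken from IsoCor or isoenum: the function name `isotopologue_layer` and the `REFERENCE_MASS` table are hypothetical, and the sketch assumes the shift is counted from the most abundant stable isotope, as described above.

```python
# Hypothetical helper, not part of IsoCor or isoenum.
# Mass number of the most abundant stable isotope, used as the
# reference for the isotopic shift (assumed from the reply above).
REFERENCE_MASS = {"H": 1, "C": 12, "N": 14, "O": 16, "S": 32}


def isotopologue_layer(element, tracer_mass, n_tracer):
    """Return a brief /a layer for n_tracer atoms of the tracer isotope.

    Atoms that are not listed are taken to be the most abundant stable
    isotope, so nothing is emitted for them. An empty string means
    "no /a layer at all".
    """
    if n_tracer < 0:
        raise ValueError("n_tracer must be non-negative")
    shift = tracer_mass - REFERENCE_MASS[element]
    if n_tracer == 0 or shift == 0:
        # Nothing differs from the reference isotope: no layer.
        return ""
    # "+d" always prints the sign, e.g. +2 or -1.
    return f"/a({element}{n_tracer}{shift:+d})"


print(isotopologue_layer("O", 18, 2))  # 16O2,18O2-fumarate
print(isotopologue_layer("O", 17, 3))  # 16O1,17O3-fumarate
print(isotopologue_layer("C", 13, 1))  # 13C1,12C3-fumarate
```

The three calls reproduce the layers listed above; the total number of atoms of the element is not needed because unlabeled atoms are left implicit.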

Cheers, Hunter

P.S. In your use-case, it makes no sense to use isoenum.

pierremillard commented 4 years ago

Thanks Hunter for this detailed reply. This clearly helps! Of course we will stick with the specifications - and brevity.

So how could we represent 12C4 fumarate? Just with "/a"? According to section 9.3 of the technical FAQ (https://www.inchi-trust.org/technical-faq-2/), the absence of an isotopic layer should correspond to naturally occurring isotopes.

Cheers, Pierre

hunter-moseley commented 4 years ago

Pierre,

12C4 fumarate would have no /a layer.

Cheers, Hunter

pierremillard commented 4 years ago

Thanks again for taking some time to address our questions, I think we now have all information we need to finish the implementation!

The most recent doc I have found on the InChI website is from 2017, which may explain the apparent discrepancy between the isotopic layer detailed in this documentation and the extension you have proposed. It would be very kind if you could provide us with some links to official IUPAC documents that include the most recent standards (provided such documents have already been released, of course). We could add this information to the documentation of IsoCor, which would help users to better understand isotope representation using InChIs.

Cheers, Pierre

hunter-moseley commented 4 years ago

Pierre,

I believe this FigShare repo has the up-to-date specification. https://figshare.com/articles/InchI_Isotopologue_and_Isotopomer_Proposal/7150964

Cheers, Hunter

hunter-moseley commented 4 years ago

Pierre,

One more thing, 12C4 fumarate could be (probably should be) specified as an isotopomer with a /i layer. While developing the specification, we recognized that the spectral data would sometimes identify a specific isotopomer without ambiguity. In these cases, the specific isotopomer would be used in the identification. This would allow generation of molecular representations that identify the location of the isotope.

I know that this will complicate your generation of isotope-specific InChI, but it would allow more automatic generation of isotope-specific molecular representations, without making assumptions.

I apologize for not bringing this up earlier, but I had to think about your questions a bit further.

Cheers, Hunter

pierremillard commented 4 years ago

Thanks again Hunter, and no worries we also need time to think about these questions.

We also thought of using the '/i' layer, but we want to be explicit about both the tracer element and the tracer isotope (it is essential for us to keep this information explicit). Moreover, we do not always have the atom numbers of the tracer element (e.g. if no InChI has been given for the metabolite, in which case we will just generate the isotopic layer). Also, we would prefer not to mix '/i' and '/a' layers.

I see different options to represent 18O-isotopologues of fumarate:

16O4-fumarate:

16O318O1-fumarate: '/a(O3+0),(O1+2)', but again you mentioned this one does not follow the standard, so we should rather use just '/a(O1+2)'

16O218O2-fumarate: '/a(O2+2)'

16O118O3-fumarate: '/a(O3+2)'

18O4-fumarate: '/a(O4+2)'

So, if I understand correctly, the best way to comply with the standard is to use the /a layer for all isotopologues except the unlabeled one, for which we should not use any layer.
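The enumeration above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in this thread: the function name `tracer_layers` is hypothetical (it is not IsoCor's API), the shift is given directly (+2 for 18O relative to 16O), and the unlabeled species gets an empty string, meaning no /a layer.

```python
def tracer_layers(element, shift, n_atoms):
    """Map each number of tracer atoms (0..n_atoms) to its brief /a layer.

    Hypothetical helper: the unlabeled isotopologue (0 tracer atoms)
    gets an empty string, i.e. no isotopic layer is written.
    """
    layers = {}
    for n_tracer in range(n_atoms + 1):
        if n_tracer == 0:
            layers[n_tracer] = ""
        else:
            layers[n_tracer] = f"/a({element}{n_tracer}{shift:+d})"
    return layers


# 18O isotopologues of fumarate (4 oxygen atoms, shift +2 from 16O).
for n_tracer, layer in tracer_layers("O", 2, 4).items():
    print(n_tracer, repr(layer))
```

Whether the fully labeled species should instead be written with a /i layer is a separate question that this sketch does not address.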

Sorry to bother you with our questions.

Cheers, Pierre

hunter-moseley commented 4 years ago

Pierre,

The /a layer is used when location of the isotope is ambiguous. The /i layer is used when the location of the isotope is specific and known. Some of the spectral data indicates ambiguous location of isotope, but other spectral data indicates specific location of isotope.

The 18O4-fumarate is not ambiguous and would use the /i layer to specify the specific location of the 18O.

You will have the specific atom numbering for molecules with an InChI. If you do not have an InChI specific atom numbering for a given molecule, then you cannot specify an isotopomer with the /i layer.

I hope this makes it clearer when /a and /i layers are used.

Cheers, Hunter

pierremillard commented 4 years ago

Dear Hunter,

Thanks, everything is clear. We will stick with the /a layer, which corresponds to the molecules we deal with.

So now we face an implementation choice.

As you mentioned, we agree that “(O2+0)” is redundant in “/a(O2+2),(O2+0)” when we represent 16O2,18O2-fumarate. Still, after careful consideration of the different documents, I have noticed the following passage:

The value of isotopic shift “+0” means that the atom is of a specific isotope whose mass number is the same as the rounded average atomic mass. […] A seeming ambiguity arises for elements that have a single stable isotope, namely, Be, F, Na, Al, P, Sc, Co, As, Nb, Rh, I, Cs, Pr, Tb, Ho, Tm, Au, Bi, Pa, or a single known isotope. For these elements, “+0” seems to be redundant and to create an ambiguity. However, “+0” reflects the intention of the user to distinguish the particular isotope from others, possibly artificial isotopes of the same element.

I think this may also apply to our situation: we are redundant but this reflects our intention to “distinguish the particular isotope from others”. This suggests that brevity is not mandatory if done on purpose to be (more) explicit.

Unfortunately, we have not yet found any metabolomics software that implements isotopic InChIs, so we cannot identify a community consensus on this question. The latest version of the InChI Software (v1.05) was released in January 2017 and does not seem to implement this recent isotopic extension.

As a starting point, we will likely implement the “redundant” version in the next release of IsoCor. Of course, we will update IsoCor to better comply with the standard and with other implementations when other tools are released. Hopefully, feedback from the community will also help us to improve our implementation of the standard in IsoCor.
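To make the “redundant” variant concrete, here is a minimal Python sketch. It is not the IsoCor implementation: the function name `redundant_layer` is hypothetical, and the ordering (labeled group first, then the “+0” group) simply follows the “/a(O2+2),(O2+0)” example above and is an assumption.

```python
def redundant_layer(element, shift, n_tracer, n_atoms):
    """Build an explicit /a layer listing labeled and unlabeled atoms.

    Hypothetical helper: unlabeled atoms are written with a "+0" shift
    instead of being left implicit.
    """
    if n_atoms < 1 or not 0 <= n_tracer <= n_atoms:
        raise ValueError("need 0 <= n_tracer <= n_atoms and n_atoms >= 1")
    groups = []
    if n_tracer > 0:
        groups.append(f"({element}{n_tracer}{shift:+d})")
    n_unlabeled = n_atoms - n_tracer
    if n_unlabeled > 0:
        groups.append(f"({element}{n_unlabeled}+0)")
    return "/a" + ",".join(groups)


print(redundant_layer("O", 2, 2, 4))  # 16O2,18O2-fumarate
print(redundant_layer("O", 2, 0, 4))  # unlabeled, written explicitly
```

Unlike the brief form, this variant needs the total number of atoms of the tracer element, and the unlabeled species still carries a layer.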

Thanks again for your time and for these detailed answers, we understand much better the standard now!

Cheers, Pierre