dan2097 / jna-inchi

Wrapper to access InChI from Java
GNU Lesser General Public License v2.1
7 stars 6 forks source link

Cleaner InChI mass interaction/API quality of life #19

Closed johnmay closed 1 year ago

johnmay commented 1 year ago

Since InChI stores mass as a delta you need to know what the base line it is a delta from. The InChI API (and by extension JNA InChI) is asymmetric in that you pass in an exact mass and get a "isotopic shift" out - the caller then needs to translate this to something sane. I think perhaps it accepts both an exact 'mass number' or a 'isotopic shift' on input so it's not quite asymmetric but I believe it always gives an 'isotopic shift' on output.

I've just fixed the issue in CDK and found the table in the INCHI-BASE directory: https://github.com/cdk/cdk/pull/936. It would also be possible to derive the table by round-tripping in enough values.

I wonder if doing the translation in JNA InChI would be reasonable and a more usable API?

dan2097 commented 1 year ago

Am I right in saying that this is in reference to the getInchiInputFromAuxInfo and getInchiInputFromInchi methods? The IXA API nominally works just on absolute atomic masses and hides the conversion. ...although it undocumentedly does allow you to enter masses differences directly if the number is between 9900 and 10100 e.g. 10010 is +10. (This also means that the current bug doesn't obviously flag up in testing as the isotope erroneously coming out as a number in this range can still be roundtripped)

I don't think the IXA API has an equivalent to getInchiInputFromAuxInfo hence why I presume I used the classic API, not sure about getInchiInputFromInchi but as I need to fix both methods there doesn't seem to be an advantage to switching just one to the IXA API.

Am I correct in saying that what you're proposing would be to implement the lookup here: https://github.com/dan2097/jna-inchi/blob/c49f2d779578c74519db3fa31a572ae6db1832d1/jna-inchi-api/src/main/java/io/github/dan2097/jnainchi/JnaInchi.java#L584

This logic would presumably look essentially the same as:

        int mass;
        if (( ISOTOPIC_SHIFT_FLAG - ISOTOPIC_SHIFT_MAX <= output.atom[atom_index].isotopic_mass ) &&
            ( output.atom[atom_index].isotopic_mass <= ISOTOPIC_SHIFT_FLAG + ISOTOPIC_SHIFT_MAX ))
        {
            int base_mass = get_atomic_mass( output.atom[atom_index].elname );
            mass = base_mass + output.atom[atom_index].isotopic_mass - ISOTOPIC_SHIFT_FLAG;
        }
        else
        {
            mass = output.atom[atom_index].isotopic_mass;
        }

Where get_atomic_mass is looking up in the table you mentioned, ISOTOPIC_SHIFT_FLAG is 10000 and ISOTOPIC_SHIFT_MAX is 100

johnmay commented 1 year ago

Correct, all good at the moment but think it would be nicer and less error prone to return the absolute value.

(as discussed on Skype)

johnmay commented 1 year ago

Not sure if label is “bug” :)