IUPAC-InChI / RInChI

Repository of the IUPAC - RInChI group

support for RXN V3000 #15

Open uli-f opened 1 year ago

uli-f commented 1 year ago

My understanding is that there is support in place for MDL MOL files V3000.

However, I looked through the repo and tried the RInChI 1.00 executable on Windows x86_64, and there does not seem to be support for RXN V3000 files.

With support for MDL MOL V3000 already in place, it seems that most of the work needed to support RXN V3000 is done. Given the restrictions of CTAB V2000, more and more applications support V3000.

Do you have any plans for supporting RXN V3000?

janholstjensen commented 1 year ago

Correct, RInChI does not yet support V3000 RXN files.

The V3000 RXN file format does not support additional features beyond what the V2000 RXN file format does as far as I can read from the 2020 CTFile specification.

RInChI supports an agent count in the RXN file so it can hold both reactants, products, and agents. This is an extension that ChemAxon added and RInChI has adopted. I don't yet see that extension described in the V3000 and V2000 RXN file format specification from Biovia.
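For illustration, here is a minimal sketch of reading the counts line of a V2000 RXN file with and without the agent count. It assumes the counts line uses fixed-width three-character fields (`rrrppp`, or `rrrpppaaa` with the ChemAxon-style extension); the function name `parse_rxn_counts` is hypothetical and is not part of RInChI.

```python
def parse_rxn_counts(counts_line: str) -> tuple:
    """Return (reactants, products, agents) from a V2000 RXN counts line.

    Assumption: three fixed-width fields of three characters each; the
    third (agent) field is optional and treated as 0 when absent.
    """
    fields = [counts_line[i:i + 3] for i in range(0, 9, 3)]
    values = [int(field) if field.strip() else 0 for field in fields]
    return values[0], values[1], values[2]
```

A file without the extension simply yields an agent count of zero, so one reader could handle both variants.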

Anyway, a V3000 RXN file can be transformed into a V2000 RXN file, since RInChI will happily read V3000 molfiles and these can be embedded in a V2000 RXN file. I have converted your example from the mentioned issue to show how. I know this doesn't help with your immediate issue, but it shows that V3000 molfiles can be embedded in a V2000 RXN file.
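The conversion described above could be sketched roughly as follows. This is an assumption-laden illustration, not the procedure used to produce the attached file: it assumes the V3000 RXN file wraps each component in `M  V30 BEGIN CTAB` / `M  V30 END CTAB` inside `REACTANT`, `PRODUCT`, and (non-standard) `AGENT` sections, and it writes each CTAB back out as a V3000 molfile behind a `$MOL` separator. The function name is hypothetical, and whether RInChI accepts exactly this output has not been verified.

```python
# Counts line that marks a molfile as V3000 (per the CTFile format).
V3000_MOL_COUNTS = "  0  0  0     0  0            999 V3000"


def v3000_rxn_to_hybrid_v2000(rxn_text: str) -> str:
    """Re-wrap a V3000 RXN file as a V2000 RXN file holding V3000 molfiles."""
    lines = rxn_text.splitlines()
    if not lines or lines[0].split() != ["$RXN", "V3000"]:
        raise ValueError("expected a '$RXN V3000' header line")
    header = (lines[1:4] + ["", "", ""])[:3]  # name, program/date, comment

    blocks = {"REACTANT": [], "PRODUCT": [], "AGENT": []}
    section = None  # section currently open, if any
    ctab = None     # lines of the CTAB currently being collected
    for line in lines[1:]:
        tokens = line.split()
        if tokens[:2] != ["M", "V30"]:
            continue  # header lines, blank lines, and the final "M  END"
        keyword = tokens[2:4]
        if ctab is not None:
            ctab.append(line)
            if keyword == ["END", "CTAB"]:
                blocks[section].append(ctab)
                ctab = None
        elif keyword == ["BEGIN", "CTAB"] and section is not None:
            ctab = [line]
        elif keyword[:1] == ["BEGIN"] and keyword[1:] and keyword[1] in blocks:
            section = keyword[1]
        elif keyword[:1] == ["END"]:
            section = None

    counts = f"{len(blocks['REACTANT']):3d}{len(blocks['PRODUCT']):3d}"
    if blocks["AGENT"]:
        counts += f"{len(blocks['AGENT']):3d}"  # ChemAxon-style agent count

    out = ["$RXN", *header, counts]
    for name in ("REACTANT", "PRODUCT", "AGENT"):
        for block in blocks[name]:
            # Three blank molfile header lines, then the V3000 counts line.
            out += ["$MOL", "", "", "", V3000_MOL_COUNTS, *block, "M  END"]
    return "\n".join(out) + "\n"
```

The CTAB lines are copied verbatim, so no chemistry is reinterpreted; only the container around them changes.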

Export_V3000RXN_converted_to_V2000RXN.zip

X:\Temp\RInChI>rinchi_cmdline.exe Export_V3000RXN_converted_to_V2000RXN.rxn
RInChI=0.03.1S/2Cu.O!C7H13NO2/c1-5-2-3-6(4-8-5)7(9)10/h5-6,8H,2-4H2,1H3,(H,9,10)/t5-,6+/m0/s1<>C7H15NO/c1-6-2-3-7(5-9)4-8-6/h6-9H,2-5H2,1H3/t6-,7+/m0/s1!Cu.O!Cu.O/d-
RAuxInfo=0.03.1/0/N:1;3;2/rA:3nCuOCu/rB:s1;s2;/rC:11.5583,-4.4625,0;12.2728,-4.05,0;12.9873,-4.4625,0;!1/N:10,2,1,5,3,6,7,4,8,9/E:(9,10)/it:im/rA:10nCCCNCCCOOC/rB:s1;s2;s3;s4;s1s5;P6;s7;d7;P3;/rC:7.9542,-4.325,0;7.9542,-5.15,0;8.6662,-5.5583,0;9.3782,-5.15,0;9.3782,-4.325,0;8.6662,-3.9083,0;8.6662,-3.0833,0;9.3807,-2.6708,0;7.9517,-2.6708,0;8.6662,-6.3833,0;<>0/N:9,2,1,5,7,3,6,4,8/it:im/rA:9nCCCNCCCOC/rB:s1;s2;s3;s4;s1s5;P6;s7;P3;/rC:.7917,-3.9167,0;.7917,-4.7417,0;1.5037,-5.15,0;2.2157,-4.7417,0;2.2157,-3.9167,0;1.5037,-3.5,0;1.5037,-2.675,0;2.2182,-2.2625,0;1.5037,-5.975,0;!0/N:1;2/rA:2nCuO/rB:d1;/rC:-4.0625,-4.275,0;-3.348,-3.8625,0;!0/N:1;2/rA:2nCuO/rB:d1;/rC:-1.8125,-4.3667,0;-1.098,-3.9542,0;
Long-RInChIKey=SA-BUHFF-BERDEBHAJNAUOM-UHFFFAOYSA-N-ITWDDDADSFZADI-NTSWFWBYSA-N--FMYGWFUUVQICDN-NKWVEPMBSA-N-QPLDLSVMHZLSFG-UHFFFAOYSA-N-QPLDLSVMHZLSFG-UHFFFAOYSA-N
Short-RInChIKey=SA-BUHFF-BDZAXYLNVN-POYSZOQTPF-UHFFFADPSC-NKLFV-NCOLW-NUHFF-ZZZ
Web-RInChIKey=OLAQHBHKGGKPQQHBN-NGZAZMLLJSPJISA

X:\Temp\RInChI>
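
For anyone scripting around the command-line tool, the output above can be split into its labelled fields. The sketch below assumes only what the transcript shows, namely one `Name=value` line per result; the function name `parse_rinchi_output` is hypothetical.

```python
# Field labels as they appear in the console output above.
KNOWN_KEYS = (
    "RInChI",
    "RAuxInfo",
    "Long-RInChIKey",
    "Short-RInChIKey",
    "Web-RInChIKey",
)


def parse_rinchi_output(stdout: str) -> dict:
    """Map each known label to the text after its first '=' sign."""
    fields = {}
    for line in stdout.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key in KNOWN_KEYS:
            fields[key] = value
    return fields
```

The captured text could come from something like `subprocess.run(["rinchi_cmdline.exe", path], capture_output=True, text=True).stdout`, mirroring the invocation shown in the transcript.
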
janholstjensen commented 1 year ago

As for a timeframe for supporting V3000 RXN files - I can't give any estimate at present.

uli-f commented 1 year ago

> The V3000 RXN file format does not support additional features beyond what the V2000 RXN file format does as far as I can read from the 2020 CTFile specification.

My understanding is that the RXN format is just a very thin wrapper around CTABs/molfiles of the individual reaction components. The difference between RXN V2000 and RXN V3000 is that the individual reaction components of RXN V2000 are in V2000 molfile format and the individual components of RXN V3000 are in the V3000 CTAB format. The 2020 CTFile spec by Biovia shows this on page 36 and page 66.
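Telling the two container variants apart is cheap, which is part of why the wrapper is considered thin. A minimal sketch, assuming the first line is `$RXN` for V2000 and `$RXN V3000` for V3000 (the function name `detect_rxn_version` is hypothetical):

```python
def detect_rxn_version(rxn_text: str) -> str:
    """Return 'V2000' or 'V3000' based on the first line of an RXN file."""
    lines = rxn_text.splitlines()
    if not lines or not lines[0].startswith("$RXN"):
        raise ValueError("not an RXN file: first line must start with $RXN")
    # "$RXN V3000" marks the V3000 container; a bare "$RXN" is V2000.
    return "V3000" if lines[0].split()[1:] == ["V3000"] else "V2000"
```

A reader could dispatch on this value before parsing the individual components.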

> RInChI supports an agent count in the RXN file so it can hold both reactants, products, and agents. This is an extension that ChemAxon added and RInChI has adopted. I don't yet see that extension described in the V3000 and V2000 RXN file format specification from Biovia.

Yes, that is an unofficial extension introduced by ChemAxon, though it is widely used.

> Anyway, a V3000 RXN file can be transformed into a V2000 RXN file, since RInChI will happily read V3000 molfiles and these can be embedded in a V2000 RXN file.

It is certainly interesting that the RInChI reader will happily digest this, but to the best of my knowledge it is not standard-compliant and cannot be written or read by most other programs I am aware of.

It demonstrates, however, how little work would be required to be able to process RXN V3000 files with RInChI 😃

> I have converted your example from the mentioned issue to show how. I know this doesn't help with your immediate issue, but it shows that V3000 molfiles can be embedded in a V2000 RXN file.

I don't think that the file you attached to your message is compliant with the standard, as it declares itself to be RXN V2000 and then contains the individual reaction components as V3000 molfiles. The programs I frequently use would be able neither to read nor to write a file like this.
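A strict reader could flag such mixed files explicitly. The sketch below assumes a V2000 RXN file separates components with `$MOL` lines and that each embedded molfile carries its version stamp at the end of its counts line (the fourth line after `$MOL`); `component_versions` is a hypothetical helper, not something any of the mentioned programs expose.

```python
def component_versions(rxn_text: str) -> list:
    """List the molfile version stamp of each $MOL block in a V2000 RXN file."""
    lines = rxn_text.splitlines()
    versions = []
    for index, line in enumerate(lines):
        if line.strip() != "$MOL":
            continue
        # Three molfile header lines follow "$MOL"; the counts line is next.
        counts = lines[index + 4] if index + 4 < len(lines) else ""
        tokens = counts.split()
        stamp = tokens[-1] if tokens else ""
        versions.append(stamp if stamp in ("V2000", "V3000") else "unknown")
    return versions
```

Any `V3000` entry in the result for a file whose first line is a bare `$RXN` would indicate the mixed layout discussed here.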

janholstjensen commented 1 year ago

> The difference between RXN V2000 and RXN V3000 is that the individual reaction components of RXN V2000 are in V2000 molfile format and the individual components of RXN V3000 are in the V3000 CTAB format. The 2020 CTFile spec by Biovia shows this on page 36 and page 66.

It shows it by example, but I can't see anywhere that it specifically states that all components should be in a particular molfile format. Therefore, I believe that putting V3000 molfiles into a V2000 RXN file adheres to spec (when I am in pedantic mode 😉). Anyway, if most software interprets the specification as "V2000 RXN file contains V2000 molfiles only", then I will accept that as how it should work.

And yes, I agree that writing an additional parser and writer for V3000 RXN files should not be too much work.

uli-f commented 1 year ago

> It shows it by example, but I can't see anywhere that it specifically states that all components should be in a particular molfile format. Therefore, I believe that putting V3000 molfiles into a V2000 RXN file adheres to spec (when I am in pedantic mode 😉). Anyway, if most software interprets the specification as "V2000 RXN file contains V2000 molfiles only", then I will accept that as how it should work.

I totally agree with you that the CTFile specification document by Biovia does not even come close to the precision and accuracy that would be desirable from a specification document.

However, I haven't come across any software that interprets the difference between RXN V2000 and RXN V3000 more leniently than I laid out above, so I would consider this the interpretation the community generally goes with.

V2000 CTAB has a number of restrictions, so it seems to me that the use of V3000 CTAB has increased over the last few years. It is therefore good to hear that writing a parser and writer for V3000 RXN files should not be too much work 😃