ReactionMechanismGenerator / RMG-Java

The Java version of RMG: Reaction Mechanism Generator
http://rmg.sourceforge.net/
MIT License

mol2AdjList (and hence InChI2AdjList) fails for molecules with > 99 bonds #93

Closed gmagoon closed 13 years ago

gmagoon commented 14 years ago

When a molecule has more than 99 bonds, the number of atoms and the number of bonds in the .mol file run together, with no whitespace between them. Consider the following C70 molecule with 105 bonds (InChI=1/C70/c1-2-22-5-6-24-13-14-26-11-9-23-4-3(21(1)51-52(22)54(24)55(26)53(23)51)33-31(1)61-35-7-8-27-15-16-29-19-20-30-18-17-28-12-10(25(7)56-57(27)59(29)60(30)58(28)56)37(35)63(33)65-36(4)40(9)67(44(17)42(12)65)69-46(11)47(14)70(50(20)49(18)69)68-43(13)39(6)66(45(16)48(19)68)64-34(5)32(2)62(61)38(8)41(15)64), whose counts line begins with " 70105":

Structure #1
  InChI v1 SDfile Output                       

 70105  0  0  0  0  0  0  0  0  1 V2000
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0     0  0  0  0  0  0
  1  2  2  0  0  0  0
  1 21  1  0  0  0  0
  1 31  1  0  0  0  0
  2 22  1  0  0  0  0
  2 32  1  0  0  0  0
  3  4  2  0  0  0  0
  3 21  1  0  0  0  0
  3 33  1  0  0  0  0
  4 23  1  0  0  0  0
  4 36  1  0  0  0  0
  5  6  2  0  0  0  0
  5 22  1  0  0  0  0
  5 34  1  0  0  0  0
  6 24  1  0  0  0  0
  6 39  1  0  0  0  0
  7  8  2  0  0  0  0
  7 25  1  0  0  0  0
  7 35  1  0  0  0  0
  8 27  1  0  0  0  0
  8 38  1  0  0  0  0
  9 11  2  0  0  0  0
  9 23  1  0  0  0  0
  9 40  1  0  0  0  0
 10 12  2  0  0  0  0
 10 25  1  0  0  0  0
 10 37  1  0  0  0  0
 11 26  1  0  0  0  0
 11 46  1  0  0  0  0
 12 28  1  0  0  0  0
 12 42  1  0  0  0  0
 13 14  2  0  0  0  0
 13 24  1  0  0  0  0
 13 43  1  0  0  0  0
 14 26  1  0  0  0  0
 14 47  1  0  0  0  0
 15 16  2  0  0  0  0
 15 27  1  0  0  0  0
 15 41  1  0  0  0  0
 16 29  1  0  0  0  0
 16 45  1  0  0  0  0
 17 18  2  0  0  0  0
 17 28  1  0  0  0  0
 17 44  1  0  0  0  0
 18 30  1  0  0  0  0
 18 49  1  0  0  0  0
 19 20  2  0  0  0  0
 19 29  1  0  0  0  0
 19 48  1  0  0  0  0
 20 30  1  0  0  0  0
 20 50  1  0  0  0  0
 21 51  2  0  0  0  0
 22 52  2  0  0  0  0
 23 53  2  0  0  0  0
 24 54  2  0  0  0  0
 25 56  2  0  0  0  0
 26 55  2  0  0  0  0
 27 57  2  0  0  0  0
 28 58  2  0  0  0  0
 29 59  2  0  0  0  0
 30 60  2  0  0  0  0
 31 33  2  0  0  0  0
 31 61  1  0  0  0  0
 32 34  2  0  0  0  0
 32 62  1  0  0  0  0
 33 63  1  0  0  0  0
 34 64  1  0  0  0  0
 35 37  2  0  0  0  0
 35 61  1  0  0  0  0
 36 40  2  0  0  0  0
 36 65  1  0  0  0  0
 37 63  1  0  0  0  0
 38 41  2  0  0  0  0
 38 62  1  0  0  0  0
 39 43  2  0  0  0  0
 39 66  1  0  0  0  0
 40 67  1  0  0  0  0
 41 64  1  0  0  0  0
 42 44  2  0  0  0  0
 42 65  1  0  0  0  0
 43 68  1  0  0  0  0
 44 67  1  0  0  0  0
 45 48  2  0  0  0  0
 45 66  1  0  0  0  0
 46 47  2  0  0  0  0
 46 69  1  0  0  0  0
 47 70  1  0  0  0  0
 48 68  1  0  0  0  0
 49 50  2  0  0  0  0
 49 69  1  0  0  0  0
 50 70  1  0  0  0  0
 51 52  1  0  0  0  0
 51 53  1  0  0  0  0
 52 54  1  0  0  0  0
 53 55  1  0  0  0  0
 54 55  1  0  0  0  0
 56 57  1  0  0  0  0
 56 58  1  0  0  0  0
 57 59  1  0  0  0  0
 58 60  1  0  0  0  0
 59 60  1  0  0  0  0
 61 62  2  0  0  0  0
 63 65  2  0  0  0  0
 64 66  2  0  0  0  0
 67 69  2  0  0  0  0
 68 70  2  0  0  0  0
M  END
$$$$

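This causes the following error, because the fourth line (the counts line) is parsed as whitespace-delimited tokens, so "70105" is read as a single token instead of as 70 atoms and 105 bonds: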

Exception in thread "main" java.lang.NumberFormatException: For input string: "V2000"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:447)
        at java.lang.Integer.parseInt(Integer.java:497)
        at jing.chem.Species.mol2AdjList(Species.java:1692)
        at jing.chem.Species.inchi2AdjList(Species.java:1477)
        at inchiDictionaryReader.main(inchiDictionaryReader.java:63)
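
To make the failure mode concrete, here is a minimal sketch of what whitespace splitting does to the counts line above. The class and method names are hypothetical and this is not the code in Species.mol2AdjList; it only assumes the parser splits on runs of whitespace, as described above.

```java
// Hypothetical reproduction; not the actual RMG-Java parsing code.
final class WhitespaceSplitDemo {

    /** Splits a counts line on runs of whitespace, as a whitespace-based parser would. */
    static String[] tokens(String countsLine) {
        return countsLine.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Counts line copied from the C70 example in this issue.
        String countsLine = " 70105  0  0  0  0  0  0  0  0  1 V2000";
        String[] t = tokens(countsLine);
        // The atom and bond counts arrive fused in t[0], so every later
        // token sits one position earlier than the parser expects.
        System.out.println(String.join(", ", t));
        System.out.println(t.length + " tokens");
    }
}
```

Because the first token is "70105", a parser that expects the bond count in the second position reads a different field there, and the remaining positions shift by one, which is consistent with "V2000" eventually reaching Integer.parseInt.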

rwest commented 14 years ago

Under the V2000 mol file specification, the line should be broken into a new token every 3 characters. See p. 45 of http://www.symyx.com/downloads/public/ctfile/ctfile.pdf

The Counts Line

aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv

Where:

aaa = number of atoms (current max 255)* [Generic]
bbb = number of bonds (current max 255)* [Generic]
lll = number of atom lists (max 30)* [Query]
fff = (obsolete)
ccc = chiral flag: 0=not chiral, 1=chiral [Generic]
sss = number of stext entries [ISIS/Desktop]
xxx = (obsolete)
rrr = (obsolete)
ppp = (obsolete)
iii = (obsolete)
mmm = number of lines of additional properties, including the M END line. No longer supported, the default is set to 999.
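
The fixed-width layout quoted above can be read by column position rather than by whitespace. The following is a minimal sketch under that assumption; CountsLineSketch and its methods are hypothetical names for illustration, not RMG-Java's API.

```java
// Hypothetical helper; illustrates fixed-width parsing of a V2000 counts line.
final class CountsLineSketch {

    /** Parses the 3-character field that starts at 0-based column {@code start}. */
    static int field(String line, int start) {
        // Fields are right-justified and space-padded, so trim before parsing.
        // An entirely blank field would throw NumberFormatException here.
        return Integer.parseInt(line.substring(start, start + 3).trim());
    }

    /** aaa: columns 0-2 of the counts line. */
    static int atomCount(String countsLine) {
        return field(countsLine, 0);
    }

    /** bbb: columns 3-5 of the counts line. */
    static int bondCount(String countsLine) {
        return field(countsLine, 3);
    }

    public static void main(String[] args) {
        // Counts line copied from the C70 example in this issue.
        String countsLine = " 70105  0  0  0  0  0  0  0  0  1 V2000";
        System.out.println(atomCount(countsLine) + " atoms, " + bondCount(countsLine) + " bonds");
    }
}
```

With the C70 counts line, columns 0-2 hold " 70" and columns 3-5 hold "105", so the two counts are recovered separately even though no whitespace sits between them.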

gmagoon commented 14 years ago

Interesting...thanks Richard. So maybe InChI has (or at least had in v. 1.02beta) a bug with writing MOL files.

gmagoon commented 14 years ago

Actually, the example MOL file still seems to be in accord with the specification. When you mentioned new token every 3 characters, I misinterpreted that as meaning separated by whitespace.

mrharper commented 13 years ago

Bug fix: Converting InChIs (or .mol files) to adjacency lists

If the number of bonds in a molecule exceeded 99, the InChI2AdjList (or Mol2AdjList) function would fail. This was due to the number of atoms and the number of bonds running together in the .mol file (as pointed out by GRM). As pointed out by RHW in the comments, this can be resolved by recognizing that every token takes 3 characters.

MRH has implemented and tested the fix. The case presented by GRM in the issue runs without error.
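
The same 3-character rule applies to the leading fields of each V2000 bond line (first atom, second atom, bond type), so atom indices of 100 or more would run together there as well. The sketch below shows that case; it is an illustration with hypothetical names, and this thread does not say whether the committed fix also changes how the bond block is read.

```java
// Hypothetical helper; not taken from the commit that closed this issue.
final class BondLineSketch {

    /**
     * Reads the first three 3-character fields of a V2000 bond line:
     * first atom index, second atom index, and bond type.
     */
    static int[] parse(String bondLine) {
        int[] fields = new int[3];
        for (int i = 0; i < fields.length; i++) {
            int start = 3 * i;
            // Each field is right-justified in its 3-character column.
            fields[i] = Integer.parseInt(bondLine.substring(start, start + 3).trim());
        }
        return fields;
    }
}
```

For the last bond line of the example, " 68 70  2  0  0  0  0", this yields atoms 68 and 70 with bond type 2. A made-up line such as "100101  1  0  0  0  0" yields atoms 100 and 101 with bond type 1, where whitespace splitting would see a single token "100101".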

Closed by d79f6cdeb3373e2b96e71f1e4dd884e391819df4