Closed gmagoon closed 13 years ago
Per the V2000 mol file specification, the counts line should be broken into a new token every 3 characters. See p. 45 of http://www.symyx.com/downloads/public/ctfile/ctfile.pdf
The Counts Line
aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv
Where:
aaa = number of atoms (current max 255)* [Generic]
bbb = number of bonds (current max 255)* [Generic]
lll = number of atom lists (max 30)* [Query]
fff = (obsolete)
ccc = chiral flag: 0=not chiral, 1=chiral [Generic]
sss = number of stext entries [ISIS/Desktop]
xxx = (obsolete)
rrr = (obsolete)
ppp = (obsolete)
iii = (obsolete)
mmm = number of lines of additional properties, including the M END line. No longer supported, the default is set to 999.
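A minimal Python sketch of reading that layout by column position, for illustration only. The helper name `parse_counts_line`, the choice to treat a blank field as 0, and the sample counts line are all assumptions, not taken from RMG or from the InChI output. The trailing six-character `vvvvvv` field is the version stamp (e.g. V2000).

```python
# Field names follow the layout string quoted above; each occupies 3 columns.
COUNT_FIELDS = ["aaa", "bbb", "lll", "fff", "ccc", "sss",
                "xxx", "rrr", "ppp", "iii", "mmm"]
FIELD_WIDTH = 3


def parse_counts_line(line):
    """Split a V2000 counts line into fixed-width fields (hypothetical helper)."""
    fields = {}
    for i, name in enumerate(COUNT_FIELDS):
        chunk = line[i * FIELD_WIDTH:(i + 1) * FIELD_WIDTH].strip()
        # Assumption: a blank field is read as 0.
        fields[name] = int(chunk) if chunk else 0
    # Whatever follows the eleven 3-column fields is the version field.
    fields["vvvvvv"] = line[len(COUNT_FIELDS) * FIELD_WIDTH:].strip()
    return fields


# Illustrative counts line: 70 atoms, 105 bonds, mmm = 999, version V2000.
sample = " 70105" + "  0" * 8 + "999 V2000"
counts = parse_counts_line(sample)
print(counts["aaa"], counts["bbb"], counts["mmm"], counts["vvvvvv"])
```

Slicing by column keeps `aaa` and `bbb` apart even when no space separates them.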
Interesting...thanks Richard. So maybe InChI has (or at least had in v. 1.02beta) a bug with writing MOL files.
Actually, the example MOL file still seems to be in accord with the specification. When you mentioned new token every 3 characters, I misinterpreted that as meaning separated by whitespace.
Bug fix: Converting InChIs (or .mol files) to adjacency lists
If the number of bonds in a molecule exceeded 99, the InChI2AdjList (or Mol2AdjList) function would fail. This was because the number of atoms and the number of bonds run together in the counts line of the .mol file, with no whitespace between them (as pointed out by GRM). As pointed out by RHW in the comments, this can be resolved by recognizing that every token in that line takes exactly 3 characters.
MRH has implemented and tested the fix. The case presented by GRM in the issue runs w/o error.
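The failure and the fix described in this commit message can be sketched as follows. This is a Python illustration under assumptions, not the code in the commit: both function names and both sample counts lines are hypothetical.

```python
def atoms_and_bonds_whitespace(line):
    """Old approach: split the counts line on whitespace."""
    tokens = line.split()
    return int(tokens[0]), int(tokens[1])


def atoms_and_bonds_fixed(line):
    """Fixed approach: read columns 1-3 and 4-6 directly."""
    return int(line[0:3]), int(line[3:6])


# Illustrative counts lines (assumed, not copied from a real .mol file).
small = "  6  6" + "  0" * 8 + "999 V2000"   # 6 atoms, 6 bonds
large = " 70105" + "  0" * 8 + "999 V2000"   # 70 atoms, 105 bonds

print(atoms_and_bonds_whitespace(small))  # (6, 6)
print(atoms_and_bonds_whitespace(large))  # (70105, 0): counts have merged
print(atoms_and_bonds_fixed(large))       # (70, 105)
```

Whitespace splitting only works while every count has at most two digits. Once the bond count needs all three columns, the atom and bond fields become a single token.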
Closed by d79f6cdeb3373e2b96e71f1e4dd884e391819df4
When a molecule has > 99 bonds, the number of atoms and the number of bonds in the .mol file can run together. Consider the following C70 molecule with 105 bonds (InChI=1/C70/c1-2-22-5-6-24-13-14-26-11-9-23-4-3(21(1)51-52(22)54(24)55(26)53(23)51)33-31(1)61-35-7-8-27-15-16-29-19-20-30-18-17-28-12-10(25(7)56-57(27)59(29)60(30)58(28)56)37(35)63(33)65-36(4)40(9)67(44(17)42(12)65)69-46(11)47(14)70(50(20)49(18)69)68-43(13)39(6)66(45(16)48(19)68)64-34(5)32(2)62(61)38(8)41(15)64):
This causes the following error, because the fourth line of the .mol file (the counts line) is parsed by splitting on whitespace: