Closed connie closed 10 years ago
Any chance the log file was made before the switch to RDKit, and is being checked after?
Yes, this was still on the main branch. I will try with the rdkit branch then.
I think your problem is fixed by 87a7ce9c720a9b28784c895332e43564535de784, but I will leave the issue open for now as a reminder to implement more rigorous checking (or explicitly decide not to).
Still running into the same problem. The log file omits an H in the InChI. But I think it is the correct molecule. https://github.com/GreenGroup/RMG-Py/commit/87a7ce9c720a9b28784c895332e43564535de784 will still fail on this.
Warning: InChI in log file (InChI=1S/C10H18O5/c1-10(11,7-4-8-13-12)15-14-9-5-2-3-6-9/h4,7,9,11-12H,2-3,5-6,82,1H3) didn't match that in geometry (InChI=1S/C10H18O5/c1-10(11,7-4-8-13-12)15-14-9-5-2-3-6-9/h4,7,9,11-12H,2-3,5-6,8H2,1H3).
Traceback (most recent call last):
File "/files/RMG-Py/rmg.py", line 144, in
Is the log file really just missing that one character? Why on earth would it? It isn't interpreted by MOPAC/GAUSSIAN, just read in and spat out!
Either MOPAC/Gaussian is deleting a character (string too long? Split across lines? Utter bizarreness)...
Or the log file dates from a previous job when we generated InChIs without a missing H (via OB) and are comparing it with a newly created one (via RDKit).
Could it be the latter? Can you figure out a test?
...or it's something else :-D
I believe the MOPAC output file is giving the incorrect InChI; it might be an internal MOPAC problem. The one from the geometry must be from RDKit(?) and it's the correct one; I checked using the website.
It's not an Openbabel problem because I just ran it fresh with the latest RMG.
In fact, the .mop file contains the correct InChI, but the .out file contains the one missing the H!
OK. What happens if you replace it with a string like 12345678901234567890... ? Does MOPAC delete the Nth character, or is there something special about that ,8H2, ?
There is nothing special about ,8H2, itself. MOPAC appears to remove exactly the 81st character in the output log file.
Tried a .mop file with this input: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890thisisatestformopacwillitwork
Got this in the output (missing the 'c' in mopac): abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890thisisatestformopawillitwork
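The input and output strings above are enough to pin down which position was dropped. Here is a minimal Python sketch, assuming exactly one character was deleted; `first_dropped_position` is a hypothetical helper for illustration, not part of RMG-Py or MOPAC:

```python
def first_dropped_position(original, echoed):
    """Return the 1-based position of the character missing from `echoed`.

    Assumes `echoed` is `original` with exactly one character deleted.
    If the deleted character sits in a run of identical characters, the
    last position of that run is reported, since the positions within
    the run cannot be told apart.
    """
    if len(echoed) != len(original) - 1:
        raise ValueError("expected exactly one character to be missing")
    for index, (a, b) in enumerate(zip(original, echoed)):
        if a != b:
            # Confirm that deleting this one character explains the whole echo.
            if original[:index] + original[index + 1:] != echoed:
                raise ValueError("strings differ by more than one deletion")
            return index + 1
    # Every shared position agrees, so the final character was dropped.
    return len(original)


# Strings copied from the .mop test in this thread.
mop_input = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890thisisatestformopacwillitwork"
out_echo = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890thisisatestformopawillitwork"
print(first_dropped_position(mop_input, out_echo))  # 81
```

Running the same helper on the two InChIs from the warning earlier in the thread also reports position 81, which is the H in ,8H2,.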
Perhaps we can accept a match on the first 80 characters when the log file InChI happens to be longer than that?
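One way to express that idea, as a sketch only: treat the two InChIs as matching when they are identical, or when the geometry InChI is longer than 80 characters and both strings agree on the first 80. The names `inchis_match_allowing_mopac_drop` and `MOPAC_SAFE_LENGTH` are hypothetical and do not exist in RMG-Py. The value 80 comes from the alphabet test in this thread, not from MOPAC documentation. The length check uses the geometry InChI rather than the log one, since the log copy is the one that may have lost a character.

```python
# Assumption: characters 1-80 are echoed intact by MOPAC (observed in this
# thread, not taken from MOPAC documentation).
MOPAC_SAFE_LENGTH = 80


def inchis_match_allowing_mopac_drop(log_inchi, geometry_inchi):
    """Compare the InChI echoed in a MOPAC log with the one from the geometry."""
    if log_inchi == geometry_inchi:
        return True
    if len(geometry_inchi) <= MOPAC_SAFE_LENGTH:
        # Short strings should have survived intact, so a mismatch is real.
        return False
    # Long strings: only the part before the dropped character is trusted.
    return log_inchi[:MOPAC_SAFE_LENGTH] == geometry_inchi[:MOPAC_SAFE_LENGTH]


# InChIs copied from the warning earlier in this thread.
log_inchi = "InChI=1S/C10H18O5/c1-10(11,7-4-8-13-12)15-14-9-5-2-3-6-9/h4,7,9,11-12H,2-3,5-6,82,1H3"
geometry_inchi = "InChI=1S/C10H18O5/c1-10(11,7-4-8-13-12)15-14-9-5-2-3-6-9/h4,7,9,11-12H,2-3,5-6,8H2,1H3"
print(inchis_match_allowing_mopac_drop(log_inchi, geometry_inchi))  # True
```

A prefix match is weaker than an exact comparison: two different molecules whose InChIs share the first 80 characters would also pass. A stricter variant would require `log_inchi == geometry_inchi[:80] + geometry_inchi[81:]`, which accepts only the specific single-character deletion seen here.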
In any case, the checkForInChiKeyCollision function is missing from both the Gaussian and Mopac classes. It appears in QMVerifier, but that class doesn't seem to be used anywhere. Was it intended to be a parent class?
Running into the following InChI mismatch error between log file and geometry for the second time now. Note that the only difference between the InChIs is the final 3 in the string. It seems they are actually the same molecule. The geometry InChI is the accurate one, however.
Also, we do not yet have methods for handling InChIKey collisions.