ReactionMechanismGenerator / RMG-Java

The Java version of RMG: Reaction Mechanism Generator
http://rmg.sourceforge.net/
MIT License

Apparent InChIKey collision. #201

Open rwest opened 12 years ago

rwest commented 12 years ago

From @rajeshdparmar

Created new species: C15H28O5(7621)
Created new forwards 1,3_Insertion_CO2 reaction: CO2(28) + C14H28O3(7186) --> C15H28O5(7621)
CRITICAL: Congratulations! You appear to have discovered the first recorded instance of an InChIKey collision:
InChIKey(augmented) = MBIZFQYMQAKCRY-WYUMXYHSCN
RMG Augmented InChI = InChI=1/C15H28O5/c1-3-4-5-6-7-8-9-12-10-13(20-18)14(15(16)17)11(2)19-12/h11-14,18H,3-10H2,1-2H3,(H,16,17)/f/h16H
MOPAC input file Augmented InChI = InChI=1/C15H28O5/c1-11(15(16)17)7-5-3-4-6-8-13-10-14(20-18)9-12(2)19-13/h11-14,18H,3-10H2,1-2H3,(H,16,17)/f/h16H
Log file Augmented InChI = InChI=1/C15H28O5/c1-11(15(16)17)7-5-3-4-6-8-13-10-14(20-18)9-12(2)19-13/h11-14,1H,3-10H2,1-2H3,(H,16,17)/f/h16H

Results in rajesh/Rajesh/New_jobs_after_Aug_10/MultiT_PM3_Prun_15000_single_pressure_1_3_1$ - InchI collision

This is similar to Issue #184

gmagoon commented 12 years ago

Is it possible that this is related to the "make clean" issues that have been discussed recently elsewhere? "make clean" will probably remove the c-InChI-1 executable file from the bin directory (it is copied into there at the start of the job by the rmgqm.profile script which is called by the job submission script).

rwest commented 12 years ago

You're right that wiping the c-InChI-1 and the SYMMETRY programs does kill the job in a related way (I did this myself) but I think you then end up with errors about missing files in your log file.

However:

1) @rajeshdparmar reported this before we did that make clean.
2) There is no evidence of errors running the inchi program in the log files.
3) (edit:) If the problem is running the inchi program, the error message should say so :-D
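To illustrate point 3, here is a minimal sketch of how an external program such as c-InChI-1 could be run so that a missing executable, a non-zero exit status, or a missing output file is reported explicitly. This is not RMG's actual code: the class name `ExternalRun`, the method `runChecked`, and its parameters are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of RMG: runs an external command and
// fails loudly instead of letting the caller read whatever is on disk.
final class ExternalRun {

    static void runChecked(List<String> command, Path workDir, Path expectedOutput)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir.toFile());
        pb.inheritIO(); // pass the program's output through so it cannot block on a full pipe

        Process process;
        try {
            process = pb.start(); // throws IOException if the executable is missing
        } catch (IOException e) {
            throw new IOException("Could not start " + command.get(0) + ": " + e.getMessage(), e);
        }

        int exitStatus = process.waitFor();
        if (exitStatus != 0) {
            throw new IOException(command.get(0) + " exited with status " + exitStatus);
        }
        if (!Files.exists(expectedOutput)) {
            throw new IOException(command.get(0) + " produced no output file " + expectedOutput);
        }
    }
}
```

With a wrapper along these lines, a wiped bin directory would surface as a "Could not start" error naming the program, rather than as a downstream symptom such as an apparent InChIKey collision.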

gmagoon commented 12 years ago

It seems that the InChI and InChIKey that RMG has in memory are in conflict with each other. According to ChemBioDraw, the InChI=1/C15H28O5/c1-3-4-5-6-7-8-9-12-10-13(20-18)14(15(16)17)11(2)19-12/h11-14,18H,3-10H2,1-2H3,(H,16,17)/f/h16H should have an InChIKey starting with LWEZLYNYEKOPBE. On the other hand, the InChI/InChIKey combination in the log file is likely OK, with InChI=1/C15H28O5/c1-11(15(16)17)7-5-3-4-6-8-13-10-14(20-18)9-12(2)19-13/h11-14,18H,3-10H2,1-2H3,(H,16,17)/f/h16H and InChIKey starting with MBIZFQYMQAKCRY.

gmagoon commented 12 years ago

Is it possible that @rajeshdparmar accidentally had multiple jobs simultaneously running in the same directory?

gmagoon commented 12 years ago

Also, perhaps we could take a look at the contents of the InChI folder in @rajeshdparmar 's working directory?

rwest commented 12 years ago

I think he was using my script, which copies everything to a temporary folder ($TMPDIR, set by the grid engine) on the scratch drive of the compute node and runs it there. It should therefore not be possible for two jobs to end up in the same folder. The $RMG folder (and $RMG/bin/ etc.) will be the same for all jobs, but the working directories should be separate. The results, when done, were copied back to ~/Rajesh/New_jobs_after_Aug_10/MultiT_PM3_Prun_15000_single_pressure_1_3_1

gmagoon commented 12 years ago

OK, the species .txt file seems to have everything correctly:

* Input_File: "species.mol"
Structure: 1
InChI=1/C15H28O5/c1-3-4-5-6-7-8-9-12-10-13(20-18)14(15(16)17)11(2)19-12/h11-14,18H,3-10H2,1-2H3,(H,16,17)/f/h16H
AuxInfo=1/1/N:5,4,7,9,11,13,15,17,16,12,6,14,10,8,1,2,3,48,46,47/E:(16,17)/F:m/rA:48nCOOCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHO$
InChIKey=LWEZLYNYEKOPBE-WYUMXYHSCN

gmagoon commented 12 years ago

PS...is this $TMPDIR the same for all jobs, or would two jobs running on the same compute node have different $TMPDIR?

rwest commented 12 years ago

$TMPDIR is unique to the job. For my job number 79895 currently running in the long queue, it is /tmp/79895.1.long1

gmagoon commented 12 years ago

I'm not sure exactly what went wrong here, but it seems that when generating the InChIKey, the InChI process somehow failed (though apparently not in the same way as in issue #184, as the failure does not seem to have been caught by the Java code) and the InChIKey associated with the InChI output from the previous molecule was read in. When the process was rerun for generating the InChI, however, everything apparently worked fine, and the correct InChI was read in. (Note that the InChIKey and InChI are generated in two separate InChI runs. Aside from performance considerations, however, this isn't the root issue here.) If someone can come up with a better explanation of what went wrong here, I'd like to hear your thoughts.

So what I will do is modify the InChI generation code to remove the old species.mol and species.txt files before rewriting them. I will also have RMG print out the ChemGraph when failing in this manner, which may make future debugging a little easier.
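As a rough sketch of the first change (not the actual RMG patch): if the old species.mol and species.txt are deleted before each run, a failed InChI run leaves no output file at all, so the reader fails instead of silently picking up the previous molecule's InChIKey. The class and method names below are hypothetical, and the directory layout is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of RMG: guards against reading stale InChI output.
final class StaleInchiFiles {

    /** Deletes leftover species.mol / species.txt; returns how many files were removed. */
    static int removeStaleFiles(Path inchiDir) throws IOException {
        int removed = 0;
        for (String name : new String[] {"species.mol", "species.txt"}) {
            if (Files.deleteIfExists(inchiDir.resolve(name))) {
                removed++;
            }
        }
        return removed;
    }

    /**
     * Reads the InChIKey line from species.txt. If the file is missing
     * (for example because the InChI run failed after the old file was
     * removed), this throws rather than returning an outdated key.
     */
    static String readInchiKey(Path inchiDir) throws IOException {
        Path output = inchiDir.resolve("species.txt");
        if (!Files.exists(output)) {
            throw new IOException("No InChI output found at " + output);
        }
        for (String line : Files.readAllLines(output)) {
            if (line.startsWith("InChIKey=")) {
                return line.substring("InChIKey=".length()).trim();
            }
        }
        throw new IOException("No InChIKey line in " + output);
    }
}
```

The intended call order would be `removeStaleFiles`, then write the new species.mol and run the InChI program, then `readInchiKey`. This does not explain why the InChI run failed in the first place; it only turns a silent stale read into a visible error.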

(UPDATE: The second debugging line is not as easy as I had expected, but as it is not critical, and I don't think it is relevant here, I'm just going to focus on the first.)

By the way, did this occur around the time that @rajeshdparmar submitted a new job, as was the case with issue #184?