Generating free radical species from stable species formatted in SMILES and estimating thermodynamic properties using GA method

leosfan commented 7 years ago

Hi, is it possible to use RMG-Py to automatically generate free radicals from stable species given in SMILES format? Since RMG-Py can automatically generate a reaction mechanism from an initial pool of core species and reactions, I wonder if this functionality has been implemented.

In addition, I would like to know how to use the thermodynamic estimation (group-additivity-based method) for any imported SMILES. It seems that group.py requires the SMILES to be converted to an adjacency list, but I don't know whether this step is automatic or how many species group.py can recognize without additional seeding.

Thanks a lot for the help!

mjohnson541 commented 7 years ago

I would point you to our website: rmg.mit.edu. Our populate reactions tool takes an RMG input file and generates a list of all reactions that can occur between the initial species, in HTML and chemkin format. This should include a list of all of the species, although it will include all species, not just free radicals. The HTML should have SMILES for every species that SMILES is specific enough to describe. We also have a script called generateReactions.py in RMG-Py/scripts that I believe mostly does the same thing.

You can also use our molecule search tool on the same website to convert your SMILES to an adjacency list, and if you click on search thermochemistry it will return the group additivity values along with other sources.
We appear to have a script called thermoEstimator.py in RMG-Py/scripts. It refers to a "thermo input file" that I've never heard of, but based on skimming the code I think it may be the same as an RMG input file, just without the simulator and model calls.

leosfan commented 7 years ago

Hi @mjohnson541, thank you very much for your help! I'm aware of the website function. However, I have a list of as many as 4000 species, and I wonder if I can automate the conversion using rmg.py together with group.py and thermoEstimator.py. Thanks again.

rwest commented 7 years ago

It is not built in, but the modular nature of RMG-Py means it would not be hard to write such a script. By generating radicals do you mean just breaking R-H bonds (removing H atoms) or breaking all bonds?

leosfan commented 7 years ago

Hi @rwest, thanks for the reply. H-abstraction is just one type of radical-generating reaction. However, I don't mind having other types of species besides free radicals, as long as the list of species is complete. I do worry, though, whether they can be converted back to SMILES once generated.

mjohnson541 commented 7 years ago

They can all be converted back to SMILES: molecule objects have a bound method toSMILES() that converts the molecule object into a SMILES string.

keceli commented 7 years ago

You can check this commit https://github.com/keceli/RMG-Py/commit/cd5b650a24ba4ea3b73e5843823b945db59c5e9e as an example for printing a smiles list as RMG output.

leosfan commented 7 years ago

@keceli Thanks a lot for the help. I tried to install RMG-Py on Ubuntu 14 via Anaconda2 (Anaconda3 was previously installed), and only 47% of the total functions passed the test (the database test is 100% ok). I wonder if this will affect the functions discussed above. If so, I can try installing it in VirtualBox under Ubuntu 12 using Anaconda2.

My final backup plan is to use javascript to automatically obtain GA values for each SMILES species from rmg.mit.edu.

mliu49 commented 7 years ago

There are a few scripts which might be helpful, located in our work-in-progress repository here. Specifically, smilesToDictionary.py and dictionaryToSMILES.py can be used to convert between a list of SMILES and RMG-style species dictionaries with adjacency lists. These could potentially be used with inputFromDictionary.py to add your species to a generate thermo or generate reactions input file.

For the 47% value, are you sure that's the passing percentage? I think our test coverage is currently around that value, which indicates what percentage of the code is being tested. Normally, the test suite should directly report the number of errors or failures rather than a percentage.

leosfan commented 7 years ago

@mliu49 Thanks for pointing this out!

I'm using PyCharm for scripting, and thought that I could use the Python 2.7 interpreter from the rmg_env created by Anaconda as my default interpreter. However, this does not work. I can run rmg.py in a terminal, but not within the IDE. Do you think I should also install the packages on Anaconda Cloud (https://anaconda.org/rmg/rmg)?

Here is the actual "make test" result:

Name Stmts Miss Branch BrPart Cover
TOTAL 18325 9110 7985 787 47%

Ran 1230 tests in 233.679s

FAILED (SKIP=35, errors=3, failures=1)
Makefile:81: recipe for target 'test' failed
make: *** [test] Error 1
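To separate the two numbers in this output: the 47% in the TOTAL row is a coverage figure, while the pass/fail counts are on the "Ran" and "FAILED" lines. A minimal sketch, assuming the summary lines have the shape shown in this comment (the function name `summarize` is made up for illustration):

```python
import re

def summarize(output):
    """Extract test counts and the coverage figure from the test summary."""
    ran = int(re.search(r"Ran (\d+) tests", output).group(1))
    # Counts such as SKIP=35, errors=3, failures=1; keys are lower-cased.
    counts = {k.lower(): int(v) for k, v in re.findall(r"(\w+)=(\d+)", output)}
    # The percentage at the end of the TOTAL row is code coverage.
    coverage = int(re.search(r"TOTAL.*?(\d+)%", output).group(1))
    not_passed = sum(counts.get(k, 0) for k in ("skip", "errors", "failures"))
    summary = {"ran": ran, "passed": ran - not_passed, "coverage_percent": coverage}
    summary.update(counts)
    return summary

report = (
    "TOTAL 18325 9110 7985 787 47%\n"
    "Ran 1230 tests in 233.679s\n"
    "FAILED (SKIP=35, errors=3, failures=1)\n"
)
print(summarize(report))
```

With these numbers, 1191 of the 1230 tests passed, 35 were skipped, and only 4 ended in an error or failure.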

rwest commented 7 years ago

Just a note that it is possible to run RMG-Py using an rmg_env Anaconda environment within PyCharm. This is how I work (also with Eclipse/PyDev)

It seemed easy enough on my Mac, but took a bit more massaging on Windows. This is because, to get RDKit to work properly, you need to actually run the activate script, not just use a Python interpreter from within an Anaconda environment. On Windows the easiest way to do this is to find the shortcut to PyCharm in your Windows Start Menu folder, create a copy of it called something like "PyCharm - rmg_env", and edit the target so that instead of C:\Users\yourname\AppData\Local\JetBrains\....\pycharm64.exe it points to cmd "/c activate rmg_env && C:\Users\yourname\AppData\Local\JetBrains\....\pycharm64.exe" (the precise path to pycharm64.exe will depend on your installation).

Regarding the errors running make test, can you scroll up a bit and find which of the 1230 tests actually gave the 3 errors and 1 failure and what the messages were?

leosfan commented 7 years ago

@rwest Thanks again for the help. I'm using Linux Ubuntu 14 (can switch to VirtualBox Ubuntu 12 if necessary) so I guess I can put "source activate rmg_env" then followed by "run bin/pycharm.sh" ?

The error messages are here:

======================================================================
ERROR: Test situation where both registration_table and results_table have no
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fans/RMG-Py/rmgpy/data/thermoTest.py", line 1003, in testRegisterInCentralThermoDB1
    db =  getattr(tcdi.client, 'thermoCentralDB')
AttributeError: 'NoneType' object has no attribute 'thermoCentralDB'

======================================================================
ERROR: Test situation where registration_table has species as the one going
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fans/RMG-Py/rmgpy/data/thermoTest.py", line 1043, in testRegisterInCentralThermoDB2
    db =  getattr(tcdi.client, 'thermoCentralDB')
AttributeError: 'NoneType' object has no attribute 'thermoCentralDB'

======================================================================
ERROR: Test situation where results_table has species as the one going
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fans/RMG-Py/rmgpy/data/thermoTest.py", line 1082, in testRegisterInCentralThermoDB3
    db =  getattr(tcdi.client, 'thermoCentralDB')
AttributeError: 'NoneType' object has no attribute 'thermoCentralDB'

======================================================================
FAIL: testConnectSuccess (rmgpy.data.thermoTest.TestThermoCentralDatabaseInterface)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fans/RMG-Py/rmgpy/data/thermoTest.py", line 900, in testConnectSuccess
    self.assertTrue(tcdi.client is not None)
AssertionError: False is not true

rwest commented 7 years ago

This is now straying far from the original title of this issue (so perhaps it should be discussed in a separate issue), but @Kehang, is this ThermoCentralDatabaseInterface error related to #1030 and #1035, and do you have any advice on getting "make test" to work out of the box?

leosfan commented 7 years ago

Just a follow-up on the issue of calculating GA values using just SMILES. Transforming SMILES to an adjacency list is easy enough using the Molecule() function. However, thermoEstimator.py in RMG-Py/scripts requires a "thermo input file", as @mjohnson541 mentioned. I would like to know more about the format of such an input file. Can I just use a list of SMILES?

By looking at the script, it seems the GA calculations should be easy enough by just using species.getThermoData(). Yet, can I just use "species = Species().fromSMILES(smiles_string)" for the same species?

mliu49 commented 7 years ago

There are sample thermo input files in the examples folder (one example). It is essentially a subset of the components needed for a standard RMG input file. Unfortunately, a list of SMILES would not be accepted by thermoEstimator.py.

While species.getThermoData() is used to generate thermo data for a species, it requires that the RMGDatabase and ThermoDatabase objects be properly set up. These steps are a large part of what thermoEstimator.py does.

If you have a very large list of species, it might be worthwhile to put together a custom script by taking parts from existing scripts. You can probably combine parts of smilesToDictionary.py and thermoEstimator.py to directly process a list of SMILES instead of using a thermo input file. Let us know if you would like more details for putting together a custom script.

leosfan commented 7 years ago

Thanks for the help! @mliu49 My immediate thought was to create numerical labels for all my SMILES species in order to put together an input file. However, I'm specifically interested in calculating GA values rather than quantum mechanics estimates. This example seems to use the PM3 method, which is very time consuming. In addition, will the thermo values be heavily influenced by choosing "primaryThermoLibrary"? Thanks again.

mliu49 commented 7 years ago

You can leave out the quantum mechanics block of the input file if you do not want QM estimates. The thermo libraries are also optional; you can leave the list empty if you do not want to use them. The primaryThermoLibrary is generally recommended since it contains some species which group additivity cannot estimate or estimates inaccurately. You can look at the library here to see if you want to keep it.
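For a list of several thousand SMILES, the input file can be generated rather than written by hand. The following is a minimal sketch, not an official RMG-Py tool: the function name `build_thermo_input` and the `S0001`-style labels are made up for illustration, and the `database(...)`, `species(...)` and `SMILES(...)` layout is an assumption modelled on the sample thermo input file in the examples folder, so it should be checked against that sample before use. There is no quantum mechanics block, and the thermo library list may be left empty.

```python
def build_thermo_input(smiles_list, thermo_libraries=()):
    """Return input-file text with one species() block per SMILES string."""
    lines = [
        "database(",
        "    thermoLibraries = {!r}".format(list(thermo_libraries)),
        ")",
        "",
    ]
    for index, smiles in enumerate(smiles_list, start=1):
        lines += [
            "species(",
            "    label='S{:04d}',".format(index),  # hypothetical numerical label
            "    structure=SMILES({!r}),".format(smiles),  # repr() quotes and escapes
            ")",
            "",
        ]
    return "\n".join(lines)

# Example with two species and the primaryThermoLibrary enabled.
print(build_thermo_input(["CCO", "C=C"], ["primaryThermoLibrary"]))
```

The returned text can be written to input.py; keeping a separate table of label to SMILES makes it easy to map the results back to the original list.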

KEHANG commented 7 years ago

@rwest the errors are caused by the lack of authentication info (currently only available to developers), which is needed to access the central thermo database. @leosfan you can ignore those 3 errors and 1 failure for now; all the other tests actually passed. If you really want to see all the tests pass, you could potentially push your own branch to GitHub and trigger a Travis build test, which is already configured with authentication.

leosfan commented 7 years ago

[In addition, I think creating a text file with the same format as input.py will do the job (I can copy the content into an IDE, and the format will be retained). Would this be a correct way to create input.py for 4000 SMILES species?] --> This part actually worked, and I managed to obtain GA values using species.getThermoData().toThermoData(). However, the values I get are fairly different from the ones in the RMG online thermo library. Is this expected (the primary library is GRI_3.0)?

Among the groups I obtained for certain SMILES, there are many "other group R". Are these groups arbitrary from one molecule to another?

Moreover, I wish to know how to convert NASA polynomial coefficients back to thermo values. Can I just use the same function mentioned above?

Thanks again.
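On the NASA polynomial question, the conversion itself is plain arithmetic. This is a minimal sketch, assuming the standard seven-coefficient NASA form (Cp/R, H/RT and S/R as polynomials in T) and a temperature inside the range for which the coefficients were fitted. The function name `nasa7_thermo` and the coefficient values are made up for illustration and do not describe any real species. I believe RMG-Py thermo objects also offer methods such as getEnthalpy(T), but please check that against your installed version.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def nasa7_thermo(coeffs, T):
    """Return (Cp, H, S) at temperature T in K from seven NASA coefficients."""
    a0, a1, a2, a3, a4, a5, a6 = coeffs
    # Cp/R = a0 + a1*T + a2*T^2 + a3*T^3 + a4*T^4
    cp = R * (a0 + a1 * T + a2 * T**2 + a3 * T**3 + a4 * T**4)
    # H/(R*T) = a0 + a1*T/2 + a2*T^2/3 + a3*T^3/4 + a4*T^4/5 + a5/T
    h = R * T * (a0 + a1 * T / 2 + a2 * T**2 / 3 + a3 * T**3 / 4
                 + a4 * T**4 / 5 + a5 / T)
    # S/R = a0*ln(T) + a1*T + a2*T^2/2 + a3*T^3/3 + a4*T^4/4 + a6
    s = R * (a0 * math.log(T) + a1 * T + a2 * T**2 / 2 + a3 * T**3 / 3
             + a4 * T**4 / 4 + a6)
    return cp, h, s  # J/(mol*K), J/mol, J/(mol*K)

# Made-up coefficients (constant Cp = 3.5 R), for illustration only.
cp, h, s = nasa7_thermo([3.5, 0.0, 0.0, 0.0, 0.0, -1000.0, 4.0], 300.0)
print(cp, h, s)
```

A species usually carries two or more polynomials, each valid over its own temperature range, so the coefficient set has to be chosen to match T before evaluating.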

JacksonBurns commented 1 year ago

Linux VM on Windows is no longer supported, closing as stale.

ReactionMechanismGenerator / RMG-Py

Generating free radical species from stable species formatted in SMILES and estimating thermodynamic properties using GA method #1045