Closed PeterKraus closed 2 years ago
Hi Peter, I always strongly recommend against searching by formula because formulas are NOT unique. If you really want to find a chemical by formula, you can do so as follows:
import chemicals
# Look up the database entry stored under the formula 'CO'
chemicals.identifiers.pubchem_db.search_formula('CO', autoload=True)
Be aware that a different chemical may be returned in the future, although I'm not aware of another chemical with that formula in this case.
For example, the formula C12H26 matches 347 compounds in thermo.
Sincerely, Caleb
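To make the non-uniqueness point concrete, here is a minimal sketch that groups a handful of compounds by formula. The table is hand-written for illustration and is not read from the chemicals or thermo databases; the names COMPOUNDS and group_by_formula are hypothetical.

```python
from collections import defaultdict

# Hypothetical mini-table of (name, Hill formula) pairs, written by hand;
# it is NOT taken from the chemicals database.
COMPOUNDS = [
    ("ethanol", "C2H6O"),
    ("dimethyl ether", "C2H6O"),
    ("butane", "C4H10"),
    ("isobutane", "C4H10"),
    ("carbon monoxide", "CO"),
]


def group_by_formula(compounds):
    """Map each formula to the list of compound names that share it."""
    groups = defaultdict(list)
    for name, formula in compounds:
        groups[formula].append(name)
    return dict(groups)


groups = group_by_formula(COMPOUNDS)
# Keep only the formulas that match more than one compound.
ambiguous = {f: names for f, names in groups.items() if len(names) > 1}
print(ambiguous)
```

In this toy table only CO maps to a single name; every other formula is shared by two isomers, which is the situation a formula search has to resolve somehow.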
I completely understand the issues for higher alkanes etc. which have many isomers, and indeed there's no reasonable way of defining a cut-off.
I think it's worth considering whether the current priority of "formula after smiles" is a good one in general, especially given that one can already search by SMILES explicitly using the smiles= prefix.
In my particular case (catalysis), nobody is going to call CO by its full name anywhere in their data tables. chemicals does a great job at disambiguating stuff like C3H6, propylene, and propene to ensure they resolve to the same molecule (smiles=C=CC); however, I'd argue that CO and methanol / MeOH should never both evaluate to smiles=CO, as that is super unexpected (although technically correct).
Cheers!
What is the search string?
Which chemical in the database do you believe should be found?
Perhaps a toggle should be added to prefer searching by formula before SMILES?
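As a rough illustration of what such a toggle could look like, here is a self-contained sketch. It does not use or describe the chemicals API: the lookup tables, the resolve function, and the prefer_formula parameter are all hypothetical, and the only behaviour borrowed from the thread is that an explicit smiles= prefix forces SMILES parsing.

```python
# Hypothetical lookup tables for illustration only; a real database is far larger.
BY_FORMULA = {"CO": "carbon monoxide", "CH4O": "methanol"}
BY_SMILES = {"CO": "methanol", "[C-]#[O+]": "carbon monoxide"}

SMILES_PREFIX = "smiles="


def resolve(query, prefer_formula=False):
    """Resolve a query, trying SMILES and formula in a configurable order.

    prefer_formula is a hypothetical toggle: when True, the formula table
    is consulted before the SMILES table.
    """
    # An explicit prefix always wins over the configured order.
    if query.lower().startswith(SMILES_PREFIX):
        return BY_SMILES.get(query[len(SMILES_PREFIX):])

    tables = [BY_SMILES, BY_FORMULA]
    if prefer_formula:
        tables.reverse()

    for table in tables:
        if query in table:
            return table[query]
    return None


print(resolve("CO"))                       # SMILES tried first
print(resolve("CO", prefer_formula=True))  # formula tried first
print(resolve("smiles=CO", prefer_formula=True))
```

With the default order the bare string CO resolves to methanol, while the toggle makes it resolve to carbon monoxide; the smiles= prefix still reaches methanol either way, so nothing that is possible today would be lost.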