Open DunklesArchipel opened 3 years ago
Based on the changes in echemdb/svgdigitizer#51, the content in the yaml file should be reconsidered.
The part on figure description now requires fewer keys since most values are directly extracted from the svg file.
We should discuss the yaml structure once more. For the electrolyte we now have {name, sum formula, ...}. Question: why don't we use the same name doubling for the electrode materials? And name doublings can be problematic: e.g., EtOH as written in the yaml is not a standardized sum formula. I think we should have only one classifier, e.g. name, and then the sum formula is created from a lookup table, or the other way round. Also, showing "sodium chloride" instead of "NaCl" on the website, for example, is not so cool. In short, I think we should homogenize the names to a single form and then have a function that creates the "common name". This "common name" should be what is displayed on the website. In the discussed case this would be "NaCl" and "Ethanol", rather than always the name or always the sum formula.
EtOH is indeed some sort of trivial name. Besides, I agree that a single identifier is sufficient.
In principle, the name can be a full name, a sum formula, or a trivial name. This could be checked against a list of dicts:
chemicals = [
    {'name': 'ethanol',
     'sum formula': 'C2H6O',
     'display name': 'Ethanol',
     'alternative names': ['ethanol', 'EtOH', 'C2H6O', 'CH3CH2OH', 'C2H5OH']},
    {'name': 'sulfuric acid',
     'sum formula': 'H2SO4',
     'display name': 'H$_2$SO$_4$',
     'alternative names': ['sulfuric acid', 'H2SO4']},
]
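The check against such a list could be sketched roughly as follows. This is only a minimal sketch: the helper names `build_index` and `display_name` are made up for illustration, and matching is done case-insensitively, which is an assumption and not something we have agreed on.

```python
# Hypothetical lookup table, following the structure of the list above.
CHEMICALS = [
    {
        "name": "ethanol",
        "sum formula": "C2H6O",
        "display name": "Ethanol",
        "alternative names": ["ethanol", "EtOH", "C2H6O", "CH3CH2OH", "C2H5OH"],
    },
    {
        "name": "sulfuric acid",
        "sum formula": "H2SO4",
        "display name": "H$_2$SO$_4$",
        "alternative names": ["sulfuric acid", "H2SO4"],
    },
]


def build_index(chemicals):
    """Map every lower-cased alias to its entry; fail on ambiguous aliases."""
    index = {}
    for entry in chemicals:
        for alias in [entry["name"], *entry["alternative names"]]:
            key = alias.strip().lower()
            # The same alias may repeat within one entry, but not across entries.
            if index.get(key, entry) is not entry:
                raise ValueError(f"alias {alias!r} is assigned to more than one chemical")
            index[key] = entry
    return index


def display_name(identifier, index):
    """Return the display name for a full name, sum formula, or trivial name."""
    try:
        return index[identifier.strip().lower()]["display name"]
    except KeyError:
        raise KeyError(f"unknown chemical {identifier!r}") from None


INDEX = build_index(CHEMICALS)
print(display_name("EtOH", INDEX))
print(display_name("sulfuric acid", INDEX))
```

Lower-casing is convenient for names, but it would conflate formulas such as Co and CO, so a real table would probably need case-sensitive matching for sum formulas.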
We could also try to implement existing packages for chemical formulas:
Yes, I think we should do this to be on the safe side. After some looking around I found this solution, which queries the synonyms of a compound and then assigns a unique cid (chemical compound id); see the code below!
We could then also query the InChI: https://iupac.org/who-we-are/divisions/division-details/inchi/
What is cool is that you can write EtOH or ethanol, and it removes the need to build the above dict by hand. Or we build the above dict from the code pasted below. Then we can easily translate into chemical formulas, names, display names, etc., and it would also be an automatic test that people have written actual chemical compounds into the yaml:
## Sanitize names
import pubchempy as pcp
from pubchempy import Compound
from pymatgen.core.composition import Composition


def lookup(name):
    # Resolve a free-form name to a PubChem compound via its first matching cid.
    cids = pcp.get_cids(name, 'name')
    return Compound.from_cid(cids[0])


# Different spellings of the same compound should resolve to the same entry.
for name in ['NaCl', 'sodium chloride', 'Sodium Chloride', 'ethanol', 'EtOH']:
    s = lookup(name)
    # pymatgen derives a reduced formula from PubChem's molecular formula.
    cc = Composition(s.molecular_formula)
    print(s.synonyms[0], s.molecular_formula, cc.reduced_formula)
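The automatic test mentioned above could look roughly like the sketch below. The function names `default_resolver` and `unknown_compounds` are hypothetical, and the sketch assumes that `pcp.get_cids` returns an empty list for names PubChem cannot resolve. The resolver is passed in as an argument so that the check can be tried offline with a stub.

```python
def default_resolver(name):
    """Return the PubChem cids for a name (needs pubchempy and network access)."""
    import pubchempy as pcp  # imported lazily so the stub demo below runs offline
    return pcp.get_cids(name, "name")


def unknown_compounds(names, resolver=default_resolver):
    """Return the names for which the resolver finds no compound."""
    return [name for name in names if not resolver(name)]


# Offline demonstration with a stub instead of PubChem; the ids are placeholders.
known = {"nacl": [1], "etoh": [2], "ethanol": [2]}


def stub(name):
    return known.get(name.lower(), [])


print(unknown_compounds(["NaCl", "EtOH", "not a chemical"], resolver=stub))
```

In a test for the yaml files, `names` would be the chemical names collected from the file, and the test would fail if the returned list is not empty.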
Great stuff. It finds random typos.
Add a section for the gases used to the "electrochemical system" section
Points to discuss:
electrochemical system
Original top level message: The yaml files for the test cases, e.g., xy.yaml, are still in the old format. These should be adapted to work properly with the future CV module.