CalebBell / thermo

Thermodynamics and Phase Equilibrium component of Chemical Engineering Design Library (ChEDL)
MIT License

Data issue: CAS# 16949-15-8 #28

Open ljn917 opened 4 years ago

ljn917 commented 4 years ago

Hi,

The data for CAS# 16949-15-8 looks incorrect. As this shows, CAS# 16949-15-8 is LiBH4, but I got the following output. The hydrogens appear to have been dropped.

>>> import thermo.chemical
>>> a = thermo.chemical.Chemical('16949-15-8')
>>> a.smiles
'[Li+].[B-]'
>>> a.rho
537.840616966581

Thanks

alexchandel commented 4 years ago

The problem is line 68158 of chemical identifier.tsv. PubChem gave a mismatched formula, weight, and SMILES. (Don't expect accuracy from the govt.) The SMILES should be [Li+].[BH4-].
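A mismatch like this could be caught mechanically. Below is a minimal sketch, assuming a row provides a simple formula string and a SMILES made only of bracket atoms. The helper names (`parse_formula`, `bracket_smiles_counts`, `is_consistent`) and the small atomic-weight table are hypothetical and not part of thermo; a real check would use a proper SMILES parser.

```python
import re

# Rounded standard atomic weights (g/mol); only the elements needed here.
ATOMIC_WEIGHTS = {"H": 1.008, "Li": 6.94, "B": 10.81}

def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'BH4Li' (no parentheses)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum atomic weights over the parsed formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())

def bracket_smiles_counts(smiles):
    """Count atoms in a SMILES consisting only of bracket atoms, e.g. '[Li+].[BH4-]'."""
    counts = {}
    pattern = r"\[([A-Z][a-z]?)(H\d*)?[+-]?\d*\]"
    for element, hydrogens in re.findall(pattern, smiles):
        counts[element] = counts.get(element, 0) + 1
        if hydrogens:
            # 'H' means one hydrogen, 'H4' means four.
            counts["H"] = counts.get("H", 0) + int(hydrogens[1:] or 1)
    return counts

def is_consistent(formula, smiles):
    """True when the formula and the SMILES describe the same atom counts."""
    return parse_formula(formula) == bracket_smiles_counts(smiles)

print(is_consistent("BH4Li", "[Li+].[BH4-]"))  # corrected SMILES
print(is_consistent("BH4Li", "[Li+].[B-]"))    # SMILES currently returned
```

The same atom counts can be fed to `molecular_weight` to compare against the stored weight, which would flag rows where only one of the three fields is wrong.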

alexchandel commented 4 years ago

It gets better. PubChem has 5 separate "compound" entries, all claiming to be lithium borohydride:

  1. https://pubchem.ncbi.nlm.nih.gov/compound/11996612 (non-existent)
  2. https://pubchem.ncbi.nlm.nih.gov/compound/20722760 (non-existent)
  3. https://pubchem.ncbi.nlm.nih.gov/compound/4148881
  4. https://pubchem.ncbi.nlm.nih.gov/compound/139038538 (high-P polymorph)
  5. https://pubchem.ncbi.nlm.nih.gov/compound/139046170 (high-P polymorph)

Wikipedia cites No.3, as does the ChemSpider entry with the same CAS number.

(There are also duplicated sodium aluminum hydride entries, one showing net charge and the other formal charge.)

@CalebBell I recommend deleting the CID# 20722760 row altogether, and adding a row for CID# 4148881 with the CAS# 16949-15-8.

Separately, given the number of errors and duplicates in PubChem, a chemical identifiers duplicate.tsv database should be created to alias the various duplicate CIDs.
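One way such an alias table might work, as a rough sketch only: the two-column layout (duplicate CID, then the CID to use instead) is an assumption, the sample rows are illustrative, and `load_aliases` and `canonical_cid` are hypothetical names rather than existing thermo functions.

```python
import csv
import io

# Hypothetical alias file contents: duplicate CID <TAB> CID to use instead.
ALIAS_TSV = "11996612\t4148881\n20722760\t4148881\n"

def load_aliases(handle):
    """Read duplicate -> canonical CID pairs from a tab-separated file object."""
    return {int(dup): int(canon) for dup, canon in csv.reader(handle, delimiter="\t")}

def canonical_cid(cid, aliases):
    """Follow aliases until reaching a CID that is not listed as a duplicate."""
    seen = set()
    while cid in aliases and cid not in seen:  # 'seen' guards against alias cycles
        seen.add(cid)
        cid = aliases[cid]
    return cid

aliases = load_aliases(io.StringIO(ALIAS_TSV))
print(canonical_cid(20722760, aliases))
print(canonical_cid(4148881, aliases))
```

Keeping the aliases in a separate file would leave the main identifier table with one row per compound, while lookups by a duplicate CID still resolve.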