ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

Atom missing metadata #4491

Open cmungall opened 3 months ago

cmungall commented 3 months ago

I am writing rules that classify CHEBI classes into chemrof metaclasses. I can classify most atom classes as either Isotopes or Elements, but the following are outliers:

┌─────────────┬──────────────────┬───────────────┬───────────────┬──────────────────────┬───────────────────────────┬─────────────────────────┬────────┬──────────┬───────────────────┐
│     id      │       name       │ chebi_formula │ smiles_string │     inchi_string     │     inchi_components      │ inchi_components_length │ charge │   mass   │ monoisotopic_mass │
│   varchar   │     varchar      │    varchar    │    varchar    │       varchar        │         varchar[]         │          int64          │ int32  │  float   │       float       │
├─────────────┼──────────────────┼───────────────┼───────────────┼──────────────────────┼───────────────────────────┼─────────────────────────┼────────┼──────────┼───────────────────┤
│ CHEBI:33396 │ nobelium         │ No            │ [No]          │ InChI=1S/No          │ [InChI=1S, No]            │                       2 │      0 │    259.0 │             259.0 │
│ CHEBI:33363 │ palladium        │ Pd            │ [Pd]          │ InChI=1S/Pd          │ [InChI=1S, Pd]            │                       2 │      0 │   106.42 │         105.90348 │
│ CHEBI:36938 │ nitrogen-14 atom │ N             │               │                      │                           │                         │      0 │   14.007 │          14.00307 │
│ CHEBI:36934 │ nitrogen-15 atom │ N             │               │                      │                           │                         │      0 │   14.007 │          14.00307 │
│ CHEBI:33379 │ erbium           │ Er            │ [Er]          │ InChI=1S/Er          │ [InChI=1S, Er]            │                       2 │      0 │   167.26 │          165.9303 │
│ CHEBI:33381 │ ytterbium        │ Yb            │ [Yb]          │ InChI=1S/Yb          │ [InChI=1S, Yb]            │                       2 │      0 │   173.04 │         173.93887 │
│ CHEBI:25555 │ nitrogen atom    │ N             │               │                      │                           │                         │      0 │   14.007 │          14.00307 │
│ CHEBI:33367 │ darmstadtium     │ Ds            │ [Ds]          │ InChI=1S/Ds          │ [InChI=1S, Ds]            │                       2 │      0 │      0.0 │               0.0 │
│ CHEBI:36936 │ nitrogen-16 atom │ N             │               │                      │                           │                         │      0 │   14.007 │          14.00307 │
│ CHEBI:33361 │ meitnerium atom  │ Mt            │ [Mt]          │                      │                           │                         │      0 │      0.0 │               0.0 │
│ CHEBI:27998 │ tungsten         │ W             │ [W]           │ InChI=1S/W           │ [InChI=1S, W]             │                       2 │      0 │   183.84 │         183.95093 │
│ CHEBI:33385 │ thorium          │ Th            │ [Th]          │ InChI=1S/Th          │ [InChI=1S, Th]            │                       2 │      0 │ 232.0381 │         232.03806 │
│ CHEBI:33351 │ seaborgium atom  │ Sg            │ [Sg]          │                      │                           │                         │      0 │    263.0 │             271.0 │
│ CHEBI:36937 │ nitrogen-17 atom │ N             │               │                      │                           │                         │      0 │   14.007 │          14.00307 │
│ CHEBI:33357 │ hassium atom     │ Hs            │ [Hs]          │                      │                           │                         │      0 │    265.0 │             277.0 │
│ CHEBI:30440 │ thallium         │ Tl            │ [Tl]          │ InChI=1S/Tl          │ [InChI=1S, Tl]            │                       2 │      0 │ 204.3833 │         204.97443 │
│ CHEBI:33355 │ bohrium atom     │ Bh            │ [Bh]          │                      │                           │                         │      0 │    264.0 │             270.0 │
│ CHEBI:36935 │ nitrogen-13 atom │ N             │               │                      │                           │                         │      0 │   14.007 │          14.00307 │
│ CHEBI:33364 │ platinum         │ Pt            │ [Pt]          │ InChI=1S/Pt          │ [InChI=1S, Pt]            │                       2 │      0 │  195.078 │         194.96478 │
│ CHEBI:33369 │ cerium           │ Ce            │ [Ce]          │ InChI=1S/Ce          │ [InChI=1S, Ce]            │                       2 │      0 │  140.116 │         139.90544 │
│ CHEBI:33394 │ fermium          │ Fm            │ [Fm]          │ InChI=1S/Fm          │ [InChI=1S, Fm]            │                       2 │      0 │    257.0 │             257.0 │
│ CHEBI:33349 │ dubnium atom     │ Db            │ [Db]          │                      │                           │                         │      0 │    262.0 │             270.0 │
└─────────────┴──────────────────┴───────────────┴───────────────┴──────────────────────┴───────────────────────────┴─────────────────────────┴────────┴──────────┴───────────────────┘
  1. Atom classes that are instances of Elements should be named "<element> atom", consistent with the rest of CHEBI's atom naming conventions (for example, "palladium atom" rather than "palladium").
  2. Atom classes that are instances of Isotopes should have an InChI. InChIs may not conventionally be specified for isotopes, but other isotopes in CHEBI do have them, and in the absence of any other mechanism to reliably mark out isotopes I recommend adding them here. For example, N-14 would be InChI=1S/N/i1+0.
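
To make the second recommendation concrete, here is a minimal Python sketch of how a rule could use the InChI to separate isotopes from elements, and what the missing InChIs might look like. The function names `isotope_inchi` and `classify_atom` are hypothetical and not part of ChEBI or chemrof. The sketch assumes the isotopic layer encodes the mass shift relative to the element's rounded average atomic mass, which is consistent with the N-14 example above but should be checked against the InChI documentation before relying on it.

```python
import re
from typing import Optional

# Matches a trailing isotopic layer on a single-atom InChI, e.g. "/i1+0".
ISOTOPE_LAYER = re.compile(r"/i1([+-]\d+)$")


def isotope_inchi(symbol: str, mass_number: int, rounded_average_mass: int) -> str:
    """Build a single-atom InChI with an isotopic layer (hypothetical helper).

    Assumption: the shift is the mass number minus the element's rounded
    average atomic mass, so N-14 (average mass ~14.007, rounded to 14)
    gets a shift of +0.
    """
    shift = mass_number - rounded_average_mass
    # The "+d" format always prints a sign, so a zero shift is written "+0".
    return f"InChI=1S/{symbol}/i1{shift:+d}"


def classify_atom(inchi: Optional[str]) -> str:
    """Classify a single-atom class from its InChI alone (hypothetical rule)."""
    if not inchi:
        # No InChI at all, as for the nitrogen isotope rows in the table.
        return "unclassified"
    if ISOTOPE_LAYER.search(inchi):
        return "Isotope"
    return "Element"


if __name__ == "__main__":
    for mass_number in (13, 14, 15, 16, 17):
        inchi = isotope_inchi("N", mass_number, 14)
        print(mass_number, inchi, classify_atom(inchi))
    print(classify_atom("InChI=1S/Pd"))
    print(classify_atom(None))
```

With a rule like this, the nitrogen isotope rows above stay unclassified only because their InChI is empty; once an isotopic layer is present they can be told apart from the plain element entries without relying on the name.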