MassBank / MassBank-data

Official repository of open data MassBank records

Missing CONFIDENCE value #69

Open Treutler opened 5 years ago

Treutler commented 5 years ago

There are 10 records that have the tag COMMENT: CONFIDENCE but no confidence value. I think this should not be valid, so please create a rule for the validator and correct the confidence values in these records.

This applies to: AAFC/AC000433 AAFC/AC000427 AAFC/AC000428 AAFC/AC000432 AAFC/AC000429 AAFC/AC000425 AAFC/AC000431 AAFC/AC000430 AAFC/AC000434 AAFC/AC000426
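A validator rule along these lines could be sketched as follows. This is a minimal sketch, not the MassBank validator's actual code. It assumes that each record is a UTF-8 plain-text file stored as `*.txt` in a contributor directory, and that every CONFIDENCE comment occupies its own line starting with `COMMENT: CONFIDENCE`, optionally followed by a colon. The names `find_empty_confidence` and `check_records` are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical rule: a COMMENT line whose subtag is CONFIDENCE (optionally
# followed by a colon). \b stops "CONFIDENCEX" from matching.
# Group 1 captures everything after the subtag, i.e. the confidence value.
CONFIDENCE_RE = re.compile(r"^COMMENT: CONFIDENCE\b:?(.*)$")


def find_empty_confidence(record_text: str) -> list[int]:
    """Return 1-based line numbers of CONFIDENCE comments that carry no value."""
    empty = []
    for lineno, line in enumerate(record_text.splitlines(), start=1):
        match = CONFIDENCE_RE.match(line)
        # A value made only of whitespace counts as missing.
        if match and not match.group(1).strip():
            empty.append(lineno)
    return empty


def check_records(root: str) -> dict[str, list[int]]:
    """Scan every *.txt record under root (assumed layout) and report offenders."""
    problems = {}
    for path in sorted(Path(root).rglob("*.txt")):
        lines = find_empty_confidence(path.read_text(encoding="utf-8"))
        if lines:
            problems[str(path)] = lines
    return problems


if __name__ == "__main__":
    sample = "COMMENT: CONFIDENCE\nCOMMENT: CONFIDENCE Reference Standard (Level 1)\n"
    print(find_empty_confidence(sample))  # [1]
```

Both the bare `COMMENT: CONFIDENCE` form and the `COMMENT: CONFIDENCE:` form from the list below would be flagged.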

meier-rene commented 5 years ago

These are the values we currently have in CONFIDENCE:

COMMENT: CONFIDENCE
COMMENT: CONFIDENCE:
COMMENT: CONFIDENCE Aspergillus sp.
COMMENT: CONFIDENCE Claviceps purpurea sclerotia
COMMENT: CONFIDENCE commercial standard
COMMENT: CONFIDENCE: condfident structure
COMMENT: CONFIDENCE: confident structure
COMMENT: CONFIDENCE Culture of Fusarium graminearum from DAOM
COMMENT: CONFIDENCE Culture of Penicillium eurotium strain
COMMENT: CONFIDENCE extrolite of Fusarium graminearum, NX-chemotype
COMMENT: CONFIDENCE Fusarium verticilloides
COMMENT: CONFIDENCE Identification confirmed with Reference Standard (Level 1)
COMMENT: CONFIDENCE: Identification confirmed with Reference Standard (Level 1)
COMMENT: CONFIDENCE Identification confirmed with Reference Standard synthesized at Eawag  (Level 1)
COMMENT: CONFIDENCE isolated standard
COMMENT: CONFIDENCE Parent Substance (Level 1)
COMMENT: CONFIDENCE Parent Substance with Reference Standard (Level 1)
COMMENT: CONFIDENCE Penicillium amphipolaria
COMMENT: CONFIDENCE Penicillium bissettii
COMMENT: CONFIDENCE Penicillium corvianum
COMMENT: CONFIDENCE Penicillium diabolicalicense
COMMENT: CONFIDENCE  Penicillium improvisum, Penicillium verrucosum
COMMENT: CONFIDENCE Penicillium nucicola
COMMENT: CONFIDENCE Penicillium sp.
COMMENT: CONFIDENCE Penicillium verrucosum
COMMENT: CONFIDENCE Probable structure via diagnostic evidence, tentative identification (Level 2b)
COMMENT: CONFIDENCE reference standard
COMMENT: CONFIDENCE Reference Standard (Level 1)
COMMENT: CONFIDENCE standard compound
COMMENT: CONFIDENCE Standard Compound
COMMENT: CONFIDENCE: structure hypothesis
COMMENT: CONFIDENCE synthesized standard
COMMENT: CONFIDENCE Tentative identification: best match only (Level 3)
COMMENT: CONFIDENCE Tentative identification: isomers possible (Level 3)
COMMENT: CONFIDENCE: Tentative identification: isomers possible (Level 3)
COMMENT: CONFIDENCE Tentative identification: molecular formula only (Level 4)
COMMENT: CONFIDENCE Tentative identification: most likely structure (Level 3)
COMMENT: CONFIDENCE Tentative identification only (Level 3)
COMMENT: CONFIDENCE Tentative identification: substance class known (Level 3)
COMMENT: CONFIDENCE Transformation product, tentative ID (Level 2b)
COMMENT: CONFIDENCE Transformation product, tentative ID (Level 3)
COMMENT: CONFIDENCE Transformation product, tentative ID (Level 3 structure)
COMMENT: CONFIDENCE Transformation product, tentative ID (Level 3 TP Class)
COMMENT: CONFIDENCE Transformation product with Reference Standard (Level 1)

Should we have a controlled vocabulary?
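A list like the one above can be collected by counting every distinct CONFIDENCE line across the records. This is one possible way to do it, not necessarily how the list above was produced. The sketch makes the same assumptions as before (records are UTF-8 `*.txt` files, one comment per line), and the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical pattern: any COMMENT line whose subtag is CONFIDENCE.
CONFIDENCE_RE = re.compile(r"^COMMENT: CONFIDENCE\b")


def confidence_lines(record_text: str) -> list[str]:
    """Return every CONFIDENCE comment line of one record, trailing spaces removed."""
    return [line.rstrip() for line in record_text.splitlines() if CONFIDENCE_RE.match(line)]


def tally(record_texts) -> Counter:
    """Count how often each distinct CONFIDENCE line occurs across records."""
    counts = Counter()
    for text in record_texts:
        counts.update(confidence_lines(text))
    return counts


def tally_directory(root: str) -> Counter:
    """Tally all *.txt records below root (assumed repository layout)."""
    return tally(p.read_text(encoding="utf-8") for p in Path(root).rglob("*.txt"))
```

For example, `sorted(tally_directory("MassBank-data"), key=str.casefold)` would list the distinct lines in roughly alphabetical order, and the counts show which variants are most common. That helps decide which spelling a controlled vocabulary should keep.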

schymane commented 5 years ago

I've just pinged the contributor of those records by email, since I can't find his GitHub handle. The records originate from Justin in Canada.

Re controlled vocabulary: yes, some of those are certainly NOT confidence statements but rather statements of origin. We have a standard set of options in RMassBank, which could be a starting point for a controlled vocabulary. Eventually we should discuss a proper ontology with @sneumann to make the various confidence statements compatible with the most commonly used confidence level schemes (scheme, year, level ...). For now, I'd say anything related to an organism should go into a different COMMENT field?
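As a first step toward mapping onto a level scheme, the "(Level N)" suffixes that many of the values already carry can be extracted automatically. Values without a level can then be set aside for manual review. This is a sketch under that assumption only. It does not detect organisms. It only separates values with a level from values without one, and the latter group also contains non-organism wording such as "commercial standard". The function names are hypothetical.

```python
import re
from typing import Optional

# Matches "(Level 1)", "(Level 2b)", "(Level 3 TP Class)", ... and captures
# the level code: one digit plus an optional sub-level letter a-e.
LEVEL_RE = re.compile(r"\(Level\s+(\d[a-e]?)\b", re.IGNORECASE)


def confidence_level(value: str) -> Optional[str]:
    """Return the level code embedded in a CONFIDENCE value, or None."""
    match = LEVEL_RE.search(value)
    return match.group(1).lower() if match else None


def split_by_level(values):
    """Separate values that state a level from those that need manual review."""
    with_level, needs_review = {}, []
    for value in values:
        level = confidence_level(value)
        if level is None:
            # No level stated: origin statements (e.g. organism names) or
            # unlevelled wording such as "commercial standard".
            needs_review.append(value)
        else:
            with_level[value] = level
    return with_level, needs_review
```

Qualifiers such as "structure" or "TP Class" after the level are dropped by this sketch. A real mapping would need to decide whether to keep them.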

schymane commented 5 years ago

... and I think we could unify some of those that differ only in spacing and capitalization, so that we have fewer variants of the same comment?
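Variants that differ only in spacing, capitalization, or a leading colon can be found by grouping values under a normalized key. This is a minimal sketch. It operates on the text after "COMMENT: CONFIDENCE", and the function names are hypothetical. Choosing which original spelling to keep for each group remains an editorial decision.

```python
import re
from collections import defaultdict


def normalize(value: str) -> str:
    """Key for grouping: drop a leading colon, collapse whitespace, ignore case."""
    value = value.strip().lstrip(":")
    value = re.sub(r"\s+", " ", value).strip()
    return value.casefold()


def group_variants(values):
    """Map each normalized key to the distinct spellings that collapse onto it (2+ only)."""
    groups = defaultdict(list)
    for value in dict.fromkeys(values):  # de-duplicate, keep first-seen order
        groups[normalize(value)].append(value)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}
```

Genuine typos such as "condfident structure" vs. "confident structure" are not merged by this normalization. Those would still need a manual pass or fuzzy matching.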