meier-rene opened 5 years ago
I stumbled over this spectrum and noticed that its fragment m/z values are far higher than the parent ion m/z (423.5989 and 782.2591 vs. 147.0441). If the validator could detect fragments that are heavier than the parent ion, then we would at least be aware of such cases.
That spectrum looks like it is only noise (those are the only two peaks). Note that spectra processed with RMassBank can sometimes contain peaks heavier than the precursor if they have certain adducts (up to +N2O is allowed), so this should be considered in any validation. There are, however, many spectra with bogus heavy peaks that are clearly just noise (which is evident from the mass defect etc., as in this case). Maybe the (sub)formula assignment routine in RMB could be integrated into the validator to help separate the possible goodies from the baddies?
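A minimal sketch of the "too heavy fragment" check discussed above, assuming singly charged ions (so a +N2O adduct shifts m/z by the monoisotopic mass of N2O) and a hypothetical tolerance of 0.01. The function name and parameters are illustrative only and are not part of the existing validator. It does not attempt the mass-defect or subformula reasoning; it only flags peaks above the allowed limit.

```python
# Monoisotopic mass of N2O in Da (2 x N + 1 x O), about 44.0011.
N2O_MASS = 2 * 14.0030740048 + 15.9949146196


def heavy_fragments(precursor_mz, peaks, tolerance=0.01, allow_n2o=True):
    """Return peak m/z values that lie above the precursor m/z.

    peaks: iterable of (mz, intensity) tuples, e.g. parsed from PK$PEAK.
    allow_n2o: widen the limit by the N2O mass, since RMassBank may
    legitimately keep adduct peaks up to +N2O above the precursor
    (assumes charge 1, so the mass shift equals the m/z shift).
    """
    limit = precursor_mz + tolerance
    if allow_n2o:
        limit += N2O_MASS
    return [mz for mz, _ in peaks if mz > limit]


# The spectrum from this issue: both peaks are flagged.
print(heavy_fragments(147.0441, [(423.5989, 100.0), (782.2591, 50.0)]))
```

Flagged peaks would still need a human (or a formula assignment step) to decide whether they are noise.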
What is going to be the procedure for spectra that the validator identifies as (likely) pure noise, like the example you just raised?
I have no idea what the proper procedure would be, especially in this case, because it's experimental data, not metadata. I wouldn't touch it. One could flag it or raise an issue with the original contributor, but sometimes this will be complicated.
Check CH$NAME for
In light of issues found/raised by Herbert Oberacher recently, I see a couple of new ideas we should consider implementing in the validator:
Please check whether all fragment m/z values in the PK$ANNOTATION section are present in the PK$PEAK section.
Good idea. I suggest building in a slight tolerance to avoid decimal-place issues. I wouldn't check the reverse, i.e. there may be fewer PK$ANNOTATION entries than PK$PEAK entries, but there should not be more PK$ANNOTATION entries than PK$PEAK entries (unless anyone puts out multiple annotations for a given PK$PEAK; I am not aware of this case ... RMassBank only puts out one formula per peak and tags it if more were possible).
> unless anyone puts out multiple annotations for a given PK$PEAK, I am not aware of this case ...
- I am not 100% certain that we don't have cases like this from RMassBank - the s4power branch is certainly able to produce such records if you tell it to.
- I personally don't think a record should be invalid if multiple annotations are present for a peak. Note that the annotation field is loosely defined in what it is allowed to contain, so this is certainly legal and possibly also welcome in some cases...
Agree with @meowcat - in principle no problem with having multiple annotations for one peak
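A minimal sketch of the PK$ANNOTATION vs. PK$PEAK check as agreed above: every annotated m/z must match some peak within a tolerance, several annotations may point to the same peak, and unannotated peaks are not reported. The function name and the default tolerance of 0.001 are hypothetical choices meant only to absorb differences in the number of decimal places.

```python
def unmatched_annotations(annotation_mzs, peak_mzs, tolerance=0.001):
    """Return annotation m/z values with no PK$PEAK m/z within tolerance.

    Multiple annotations for the same peak are allowed, and peaks
    without any annotation are deliberately not checked.
    """
    return [
        a for a in annotation_mzs
        if not any(abs(a - p) <= tolerance for p in peak_mzs)
    ]


peaks = [85.0284, 129.0182, 147.0441]
# 85.0284 is annotated twice (allowed); 129.0183 differs only by rounding.
print(unmatched_annotations([85.0284, 85.0284, 129.0183, 200.1], peaks))
```

The pairwise comparison is quadratic, which should be acceptable for the small peak lists of single records.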
We should add a validator check that screens for spectra with identical SPLASHes but conflicting compound information. I have just reported several cases of this in MassBank-data; it would be great to screen whether any more cases exist so we can amend them as required, and to add this as a general check to avoid this happening in the future. It is very hard for us to catch this on the RMassBank side if people do not do the manual checks (though we must also consider how to validate this on the data processing side).
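A minimal sketch of the SPLASH screen, assuming each record has already been reduced to an (accession, SPLASH, InChIKey) tuple and that "conflicting compound information" means more than one distinct InChIKey per SPLASH. Using the full InChIKey as the compound identity is an assumption; comparing only the first block would ignore stereo differences instead. The accessions and SPLASH strings in the example are placeholders.

```python
from collections import defaultdict


def splash_conflicts(records):
    """Report SPLASHes shared by records that name different compounds.

    records: iterable of (accession, splash, inchikey) tuples.
    Returns {splash: {inchikey: [accessions]}} for each conflicting SPLASH.
    """
    by_splash = defaultdict(lambda: defaultdict(list))
    for accession, splash, inchikey in records:
        by_splash[splash][inchikey].append(accession)
    return {
        splash: dict(compounds)
        for splash, compounds in by_splash.items()
        if len(compounds) > 1  # same spectrum, more than one compound
    }


records = [
    ("REC1", "splash-a", "KEY-1"),
    ("REC2", "splash-a", "KEY-1"),  # plain duplicate, not a conflict
    ("REC3", "splash-b", "KEY-2"),
    ("REC4", "splash-b", "KEY-3"),  # same spectrum, different compound
]
print(splash_conflicts(records))
```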
Please check that the three structure identifiers (InChIKey, SMILES, and InChI) are identical, i.e. that they describe the same structure.
...and once these (InChI, SMILES and InChIKey) are consistent with one another, we need to check that the related database identifiers match by InChIKey, and either update or remove incorrect ones.
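A sketch of both steps, under the assumption that the structure conversion is delegated to caller-supplied functions (for example wrappers around RDKit's `Chem.MolFromSmiles`, `Chem.MolFromInchi` and `Chem.MolToInchiKey`), so the logic itself needs no toolkit. For the second step it assumes the InChIKey reported by each linked database (e.g. the CH$LINK entries) has already been looked up. All function names here are hypothetical, and a mismatch is a candidate for manual review rather than proof of an error.

```python
from typing import Callable, Dict, List


def structure_mismatches(
    inchikey: str,
    smiles: str,
    inchi: str,
    smiles_to_inchikey: Callable[[str], str],
    inchi_to_inchikey: Callable[[str], str],
) -> List[str]:
    """Compare the stated InChIKey with keys derived from SMILES and InChI."""
    problems = []
    from_smiles = smiles_to_inchikey(smiles)
    from_inchi = inchi_to_inchikey(inchi)
    if from_smiles != inchikey:
        problems.append(f"SMILES gives {from_smiles}, record states {inchikey}")
    if from_inchi != inchikey:
        problems.append(f"InChI gives {from_inchi}, record states {inchikey}")
    return problems


def link_mismatches(inchikey: str, linked_inchikeys: Dict[str, str]) -> List[str]:
    """Return database identifiers whose reported InChIKey differs.

    linked_inchikeys: {database identifier: InChIKey from that database}.
    """
    return sorted(db_id for db_id, key in linked_inchikeys.items() if key != inchikey)


# Stand-in converters backed by lookup tables, for illustration only.
to_key_from_smiles = {"CCO": "KEY-A", "CCN": "KEY-B"}.get
to_key_from_inchi = {"InChI=1S/x": "KEY-A"}.get
print(structure_mismatches("KEY-A", "CCN", "InChI=1S/x",
                           to_key_from_smiles, to_key_from_inchi))
print(link_mismatches("KEY-A", {"CID 1": "KEY-A", "CHEBI:2": "KEY-B"}))
```

Only records passing the first check would go on to the database-link comparison, matching the order proposed above.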
I want to collect ideas for automatic validation in this issue:
- check for duplicate entries in CH$NAME (implemented and applied, 242 records fixed)
- check for an InChIKey-style pattern in CH$NAME (implemented and applied, 131 records fixed)
- perhaps flag super-short names that contain both letters and numbers, as these are often database codes (e.g. CID1233); a lot of ZINC and CHEBI codes sneak through
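The three CH$NAME checks listed above can be sketched roughly as follows. This is not the implementation that was applied to the records; the InChIKey pattern follows the standard 14-10-1 letter layout, and `max_code_length=10` is a hypothetical cut-off for "super-short" names.

```python
import re
from collections import Counter

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def check_names(names, max_code_length=10):
    """Return warnings for the CH$NAME values of a single record."""
    warnings = []
    counts = Counter(names)  # keeps first-seen order of the names
    for name, count in counts.items():
        if count > 1:
            warnings.append(f"duplicate name: {name}")
    for name in counts:  # each distinct name is checked once
        if INCHIKEY_RE.match(name):
            warnings.append(f"InChIKey used as name: {name}")
        elif (len(name) <= max_code_length
              and any(c.isalpha() for c in name)
              and any(c.isdigit() for c in name)):
            warnings.append(f"possible database code: {name}")
    return warnings


print(check_names(["Caffeine", "Caffeine",
                   "RYYVLZVUVIJVGH-UHFFFAOYSA-N", "CID1233"]))
```

The short-name rule is a heuristic and will also flag genuine short names such as "2,4-D", so it suits a warning rather than a hard validation error. The duplicate check is case-sensitive as written.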