MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

[RecordFormat] more clear statement on PK$ANNOTATION and cleanup of existing records #301

Open stanstrup opened 3 years ago

stanstrup commented 3 years ago

Hi,

I was looking into the proper way to annotate fragments and losses, e.g. "[M+H-NH3]+". The specification says "Contributors freely define the record format by using appropriate terms.", which led me to expect a list of allowed terms, but there does not seem to be one. So are the allowed terms just those used in the examples?

I looked through the current DB on GitHub, and the most common column name seems to be "type". A few records use "ion". The only records that use "annotation", as the specs suggest, put an m/z value in that column.
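
A survey like this can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes each record is available as a text string and that the column names are the whitespace-separated tokens after `PK$ANNOTATION:`. The function name `tally_annotation_columns` and the sample headers are made up for illustration and are not taken from real records.

```python
from collections import Counter


def tally_annotation_columns(records):
    """Count how often each column name appears in PK$ANNOTATION headers."""
    counts = Counter()
    for text in records:
        for line in text.splitlines():
            if line.startswith("PK$ANNOTATION:"):
                # Everything after the first colon is assumed to be the column list.
                counts.update(line.split(":", 1)[1].split())
    return counts


# Invented header lines, only to exercise the function.
sample_records = [
    "PK$ANNOTATION: m/z type\n",
    "PK$ANNOTATION: m/z type\n",
    "PK$ANNOTATION: m/z ion\n",
]

column_counts = tally_annotation_columns(sample_records)
for name, n in column_counts.most_common():
    print(name, n)
```

Run over the actual record files, the same tally would show how widely "type", "ion" and "annotation" are used.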




Isotopes

Then I looked for isotope annotation, and the specs seem to suggest using the same field for isotopes and for fragment/adduct annotation. The example:

PK$ANNOTATION: m/z formula annotation exact_mass error(ppm) 
  167.08947 C9H12O2N [M+1]+(13C) 167.08961 0.81
  168.08681 C9H12O2N [M+1]+(13C, 15N) 168.08664 1.04
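
One practical consequence of this layout is that an annotation such as "[M+1]+(13C, 15N)" contains a space, so a plain whitespace split yields more tokens than there are columns. The sketch below shows one possible way to read the block anyway. It assumes, as my own convention rather than anything the specification states, that only the "annotation" column may contain spaces and that all other columns are single tokens. The function name `parse_annotation` is hypothetical.

```python
EXAMPLE = """PK$ANNOTATION: m/z formula annotation exact_mass error(ppm)
  167.08947 C9H12O2N [M+1]+(13C) 167.08961 0.81
  168.08681 C9H12O2N [M+1]+(13C, 15N) 168.08664 1.04
"""


def parse_annotation(block: str) -> list[dict[str, str]]:
    """Parse a PK$ANNOTATION block into one dict per peak row."""
    lines = [ln for ln in block.splitlines() if ln.strip()]
    header = lines[0].split(":", 1)[1].split()
    free = header.index("annotation")   # assumed free-text column
    n_after = len(header) - free - 1    # single-token columns after it
    rows = []
    for ln in lines[1:]:
        tokens = ln.split()
        end = len(tokens) - n_after
        # Tokens between the fixed left and right columns belong to the annotation.
        values = tokens[:free] + [" ".join(tokens[free:end])] + tokens[end:]
        rows.append(dict(zip(header, values)))
    return rows


rows = parse_annotation(EXAMPLE)
for row in rows:
    print(row["m/z"], "|", row["annotation"], "|", row["error(ppm)"])
```

A stricter definition of the allowed annotation terms would make this kind of guessing unnecessary.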

Some things about this example confuse me:

  1. For the +2 peak, simulations suggest that the contribution is about 50/50 from (13C, 18O) and (13C, 13C), with very little from (13C, 15N). Does it make sense to specify the isotopes at all? Wouldn't it make more sense to simply have [M] and [M+1] for the isotope specification? That leads to the next question.
  2. [M+1] is confusing here, in my opinion. The peaks in the example refer to the [M+H]+ ions for the [M] and [M+1] isotopes. Would a format like [M+H]+([M]) and [M+H]+([M+1]) make sense? That is closer to what CAMERA does.
  3. Would it make more sense to have a separate annotation field for isotopes?
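
The estimate in point 1 can be checked with a back-of-the-envelope binomial calculation. The sketch below makes several assumptions: the ion is C9H12NO2 as in the example, the abundances are rounded textbook values, and only three nominal +2 isotopologues are considered (two 13C, one 18O, or one 13C plus one 15N). Combinations involving 2H or 17O are ignored. A 13C plus 18O combination would be nominally +3, so 18O alone is used here.

```python
from math import comb

# Rounded natural abundances (assumed textbook values).
ABUNDANCE = {
    "12C": 0.9893, "13C": 0.0107,
    "14N": 0.99636, "15N": 0.00364,
    "16O": 0.99757, "18O": 0.00205,
}

# Atom counts for C9H12NO2, taken from the example above.
N_C, N_N, N_O = 9, 1, 2

# Heavy-to-light abundance ratio per atom.
r_c = ABUNDANCE["13C"] / ABUNDANCE["12C"]
r_n = ABUNDANCE["15N"] / ABUNDANCE["14N"]
r_o = ABUNDANCE["18O"] / ABUNDANCE["16O"]

# Intensity of each +2 isotopologue relative to the monoisotopic peak.
relative = {
    "13C2": comb(N_C, 2) * r_c ** 2,
    "18O": N_O * r_o,
    "13C+15N": N_C * r_c * N_N * r_n,
}

total = sum(relative.values())
shares = {name: value / total for name, value in relative.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions the two-13C and 18O species each account for a little under half of the +2 intensity, and the 13C plus 15N species for only a few percent.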

meowcat commented 3 years ago

Note that the HUPO-PSI people have been discussing a peak annotation format for a while:
https://github.com/HUPO-PSI/mzSpecLib/issues/23
https://docs.google.com/document/d/1yEUNG4Ump6vnbMDs4iV4s3XISflmOkRAyqUuutcCG2w

Their current proposal is a NIST-like format, encoded in a regex:

^(?:(?<analyte_reference>[^/\s]+)@)?(?:(?:(?<series>[axbycz]\.?)(?<ordinal>\d+))|(?<series_internal>[m](?<internal_start>\d+):(?<internal_end>\d+))|(?<precursor>p)|(:?I(?<immonium>[ARNDCEQGHKMFPSTWYVIL])(?:\[(?<immonium_modification>(?:[^\]]+))\])?)|(?<reporter>r(?:(?:\[(?<reporter_label>[^\]]+)\])))|(?:f\{(?<formula>[A-Za-z0-9]+)\})|(?:_(?<external_ion>[^\s,/]+)))(?<neutral_losses>(?:[+-]\d*(?:(?:[A-Z][A-Za-z0-9]*)|(?:\[(?:(?:[A-Za-z0-9:\.]+))\])))+)?(?:(?<isotope>[+-]\d*)i)?(?:\^(?<charge>[+-]?\d+))?(?:\[M(?<adducts>(:?[+-]\d*[A-Z][A-Za-z0-9]*)+)\])?(?:/(?<mass_error>[+-]?\d+(?:\.\d+)?)(?<mass_error_unit>ppm)?)?(?:\*(?<confidence>\d*(?:\.\d+)?))?
https://docs.google.com/document/d/1yEUNG4Ump6vnbMDs4iV4s3XISflmOkRAyqUuutcCG2w
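
To get a feel for how such a pattern behaves, here is a small Python sketch. It does not use the full regex. It is a reduced subset that I cut down by hand, covering only the precursor and formula ion types plus the optional neutral loss, isotope, charge and mass error parts. Python's `re` module writes named groups as `(?P<name>...)` rather than `(?<name>...)`, so the group syntax is adapted. The names `SUBSET` and `parse_subset` and the test strings are my own illustrations.

```python
import re

# Hand-reduced subset of the quoted pattern, not the full proposal.
SUBSET = re.compile(
    r"^(?:(?P<precursor>p)|(?:f\{(?P<formula>[A-Za-z0-9]+)\}))"
    r"(?P<neutral_losses>(?:[+-]\d*[A-Z][A-Za-z0-9]*)+)?"
    r"(?:(?P<isotope>[+-]\d*)i)?"
    r"(?:\^(?P<charge>[+-]?\d+))?"
    r"(?:/(?P<mass_error>[+-]?\d+(?:\.\d+)?)(?P<mass_error_unit>ppm)?)?$"
)


def parse_subset(annotation):
    """Return the matched named groups as a dict, or None if there is no match."""
    match = SUBSET.match(annotation)
    if match is None:
        return None
    # Drop groups that did not take part in the match.
    return {key: value for key, value in match.groupdict().items() if value is not None}


loss = parse_subset("p-NH3/1.2ppm")
isotope_peak = parse_subset("f{C9H12NO2}+i^1/0.8ppm")
massbank_style = parse_subset("[M+H-NH3]+")

print(loss)
print(isotope_peak)
print(massbank_style)
```

Note that a MassBank-style string such as "[M+H-NH3]+" does not match this subset at all, which illustrates how different the two notations are.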

A (currently still open) pull request for an annotation parser: https://github.com/HUPO-PSI/mzSpecLib/pull/28

I had proposed a less "encoded" and more easily machine-readable alternative (see https://github.com/HUPO-PSI/mzSpecLib/issues/23#issuecomment-654237467 ). It was somewhat favorably received, but it does not seem to have gone any further.