MassBank / RMassBank

Playground for experiments on the official http://bioconductor.org/packages/devel/bioc/html/RMassBank.html

Improve marking formula annotation as "tentative" in MassBank records? #20

Closed sneumann closed 10 years ago

sneumann commented 10 years ago

Issue by uchem-massbank from Wednesday Jul 31, 2013 at 12:04 GMT Originally opened as https://github.com/sneumann/RMassBank/issues/12


Possibly change PK$ANNOTATION: m/z num {formula mass error(ppm)} to PK$ANNOTATION: m/z num {tentative formula, mass, error(ppm)}?

Why? External parties have questioned the validity of some annotations. To dig deeper: how valid is the auto-annotation rule of "taking the formula with the lowest ppm error"? Should we skip the annotation when in doubt (e.g. F compounds?), or mark annotations where more than one formula is possible? See email from Takaaki-san, 31/7/13.
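For illustration, the "lowest ppm error" rule questioned above can be sketched roughly as follows. This is not RMassBank's code (RMassBank is an R package); the function names are made up here, and the second candidate is a placeholder invented only to show the ambiguous case.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error of a measured m/z relative to a theoretical one, in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def pick_lowest_ppm(measured_mz, candidates):
    """Pick the candidate with the smallest absolute ppm error.

    candidates: list of (formula, theoretical_mz) pairs.
    Returns (formula, theoretical_mz, ppm, number_of_candidates).
    """
    scored = [(formula, mz, ppm_error(measured_mz, mz)) for formula, mz in candidates]
    formula, mz, ppm = min(scored, key=lambda row: abs(row[2]))
    return formula, mz, ppm, len(scored)


# C10H18N4S+ and its mass are taken from this thread; the second entry is a
# made-up placeholder so that an ambiguous peak can be demonstrated.
candidates = [("C10H18N4S+", 226.1247), ("hypothetical_alternative+", 226.1251)]
formula, mz, ppm, n_candidates = pick_lowest_ppm(226.1248, candidates)
print(formula, n_candidates)
```

Returning the number of candidates alongside the winner is what would let a record flag that the chosen formula was not the only possibility.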

sneumann commented 10 years ago

Comment by schymane from Wednesday Jul 31, 2013 at 12:40 GMT


Possibly look into an (optional?) upper limit on the double bond equivalents (DBE), e.g. parent DBE + 3. Ref: http://pubs.acs.org/doi/abs/10.1021/ci980171b Propachlor example at m/z 152: parent DBE = 5, C11H4O+ DBE = 10.
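A minimal sketch of such a DBE filter, assuming the usual formula DBE = C - (H + halogens)/2 + N/2 + 1 with O and S treated as divalent and the charge ignored. The function names are hypothetical, and the propachlor formula C11H14ClNO is added here for the example; it is not quoted in the thread.

```python
import re


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C11H14ClNO' (no brackets, no charge)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def dbe(formula):
    """Double bond equivalents; O and S do not contribute, the charge is ignored."""
    c = parse_formula(formula)
    halogens = sum(c.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    return c.get("C", 0) - (c.get("H", 0) + halogens) / 2 + c.get("N", 0) / 2 + 1


def within_dbe_limit(fragment, parent, margin=3):
    """True if the fragment DBE does not exceed parent DBE + margin."""
    return dbe(fragment) <= dbe(parent) + margin


print(dbe("C11H14ClNO"))                         # parent (propachlor)
print(dbe("C11H4O"))                             # fragment candidate at m/z 152
print(within_dbe_limit("C11H4O", "C11H14ClNO"))  # would this candidate be kept?
```

With a margin of 3, the C11H4O candidate (DBE 10) falls outside the limit of 8 and would be dropped.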

sneumann commented 10 years ago

Comment by schymane from Wednesday Aug 07, 2013 at 11:55 GMT


Additional feedback: unsuitable annotation of a product ion at m/z 226.1248 in two records, EA255503 and EA255502: 226.1248 1 C10H18N4S+ 226.1247 0.67. There is no corresponding C10H18N2S+. Should we "tag" the adduct formulas in the annotation?

sneumann commented 10 years ago

Comment by schymane from Thursday Aug 08, 2013 at 06:26 GMT


Alternatively, I propose showing an adduct ion as [C6H6N + N2]+, which looks much more like an adduct than C6H6N3+. An example PK$ANNOTATION line is: 120.0555 1 [C6H6N + N2]+ 120.0556 -0.95. The word "tentative" is then not necessary. Source: email from Takaaki-san, 8.8.13. Problem with adducts: the +O case (=> H2O adduct). Displaying it as +O is preferable (and more reflective of what we really do).
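To make the two notations concrete, here is a small sketch that renders the same ion both ways, once as an explicit adduct and once as a summed formula. All function names are hypothetical and the parser only handles flat formulas without brackets.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C6H6N' (no brackets, no charge)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def render_formula(counts):
    """Render counts with C and H first, then the other elements alphabetically."""
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def adduct_notation(base, adduct, charge="+"):
    """Explicit adduct form, e.g. [C6H6N + N2]+."""
    return f"[{base} + {adduct}]{charge}"


def summed_notation(base, adduct, charge="+"):
    """Summed form, e.g. C6H6N3+, which hides the adduct."""
    return render_formula(parse_formula(base) + parse_formula(adduct)) + charge


print(adduct_notation("C6H6N", "N2"))
print(summed_notation("C6H6N", "N2"))
```

One thing to keep in mind with the bracketed form: it contains spaces, so any reader that splits annotation rows on whitespace would need to handle it specially.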

sneumann commented 10 years ago

Comment by schymane from Friday Sep 13, 2013 at 09:21 GMT


Re: DBE count: this was already attempted much earlier in development and taken out again because it removed too many valid formulas. We will not reinstate it. Instead, we will add a "formula count" to the annotations, which makes it clear to the reader when more than one formula was possible.
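As a rough sketch of what an annotation block with a formula count could look like: the column order, the header wording and the two-space row indent below are assumptions for illustration, not the confirmed RMassBank output. The two rows reuse the values quoted earlier in this thread.

```python
def annotation_block(peaks):
    """Build a PK$ANNOTATION block with one row per annotated peak.

    Column order and header text are assumed for this sketch.
    """
    lines = ["PK$ANNOTATION: m/z tentative_formula formula_count mass error(ppm)"]
    for p in peaks:
        lines.append(
            f"  {p['mz']:.4f} {p['formula']} {p['count']} {p['calc']:.4f} {p['ppm']:.2f}"
        )
    return "\n".join(lines)


# Values taken from the annotation lines quoted in this thread.
peaks = [
    {"mz": 120.0555, "formula": "C6H6N3+", "count": 1, "calc": 120.0556, "ppm": -0.95},
    {"mz": 226.1248, "formula": "C10H18N4S+", "count": 1, "calc": 226.1247, "ppm": 0.67},
]
block = annotation_block(peaks)
print(block)
```

A count above 1 in the third column would tell the reader that the listed formula was chosen from several candidates.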

sneumann commented 10 years ago

Comment by schymane from Friday Sep 13, 2013 at 09:24 GMT


Re: tagging adduct formulas: this information is no longer available at the annotation stage. Users can opt in or out of auto-adduct annotation (reanalyze peaks). We will add an additional MS$DATA_PROCESSING message indicating whether the records were generated with this option. It will appear just above the annotation. Re: tentative formula: we consider "tentative_formula" the better label since the formulas are assigned automatically; new records will have this.

sneumann commented 10 years ago

Comment by meowcat from Friday Sep 13, 2013 at 14:35 GMT


Changes have been implemented in meowcat: a data-processing message concerning reanalyzed peaks, and changes in column names. Also added: a customizable (overridable) annotator, e.g. for building scripts that annotate SMILES codes with MetFrag info or similar.
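The idea of an overridable annotator can be sketched as a plain callback. This is only an illustration of the pattern; the names below are hypothetical and do not reflect the actual RMassBank (R) interface, and the SMILES value is a placeholder.

```python
def default_annotator(peak):
    """Default row: only the columns discussed in this thread."""
    return {
        "m/z": peak["mz"],
        "tentative_formula": peak["formula"],
        "formula_count": peak["count"],
    }


def smiles_annotator(peak):
    """Custom row: default columns plus a SMILES column, e.g. filled from MetFrag."""
    row = default_annotator(peak)
    row["SMILES"] = peak.get("smiles", "n/a")
    return row


def build_annotation(peaks, annotator=default_annotator):
    """Apply the chosen annotator to every peak; callers may pass their own."""
    return [annotator(p) for p in peaks]


# The SMILES string is a placeholder, not real MetFrag output.
peaks = [{"mz": 120.0555, "formula": "C6H6N3+", "count": 1, "smiles": "<placeholder>"}]
default_rows = build_annotation(peaks)
custom_rows = build_annotation(peaks, annotator=smiles_annotator)
print(default_rows)
print(custom_rows)
```

Passing the annotator as an argument keeps the default record layout unchanged while letting a build script add its own columns.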

sneumann commented 10 years ago

Comment by meowcat from Friday Sep 13, 2013 at 14:36 GMT


Also, I did not (for the time being) implement the max DBE limit: I had it once and it eliminated too many peaks.