MassBank / RMassBank

Playground for experiments on the official http://bioconductor.org/packages/devel/bioc/html/RMassBank.html

Duplicated PK$PEAKS in output record in case of ambiguous PK$ANNOTATION #273

Open sneumann opened 3 years ago

sneumann commented 3 years ago

Hi, this is mainly a reminder that we (e.g. @achimmiri) are seeing duplicated peaks in a record. When PK$ANNOTATION lists multiple candidate formulae for one m/z, the corresponding PK$PEAK row is duplicated as well. I think each PK$PEAK row should be unique. This is with a fairly old snapshot of RMB, so we'll report back on whether it still happens in the recently merged s4power branch. Yours, Steffen

PK$ANNOTATION: m/z tentative_formula formula_count mass error(ppm)
...
  150.0986 C6H14O4- 2 150.0898 59.05
  150.0986 C10H14O- 2 150.105 -42.59
...
PK$PEAK: m/z int. rel.int.
...
  150.0986 55 69
  150.0986 55 69

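
One way to test the suspicion above (that a peak row is repeated once per candidate formula) is to count the rows per m/z in both blocks and compare them. The following is a minimal Python sketch, not part of RMassBank; the helper name `rows_per_mz` and the shortened `RECORD` string are assumptions made for illustration, with the rows copied from the record in this issue.

```python
from collections import Counter

# Excerpt copied from the record in this issue; a real check would read the whole file.
RECORD = """\
PK$ANNOTATION: m/z tentative_formula formula_count mass error(ppm)
  115.0768 C6H11O2- 1 115.0765 3.1
  150.0986 C6H14O4- 2 150.0898 59.05
  150.0986 C10H14O- 2 150.105 -42.59
PK$NUM_PEAK: 3
PK$PEAK: m/z int. rel.int.
  115.0768 55 69
  150.0986 55 69
  150.0986 55 69
//
"""

def rows_per_mz(record: str, tag: str) -> Counter:
    """Count the indented data rows under `tag`, keyed by the m/z string."""
    counts = Counter()
    inside = False
    for line in record.splitlines():
        if line.startswith(tag):
            inside = True           # header line of the block we want
        elif inside and line.startswith("  "):
            counts[line.split()[0]] += 1
        else:
            inside = False          # any other tag (or "//") ends the block
    return counts

annotation_rows = rows_per_mz(RECORD, "PK$ANNOTATION:")
peak_rows = rows_per_mz(RECORD, "PK$PEAK:")

# m/z values whose peak row is repeated, as (peak rows, annotation rows)
suspects = {mz: (n, annotation_rows[mz]) for mz, n in peak_rows.items() if n > 1}
print(suspects)
```

If every entry in `suspects` shows the same number in both positions, the duplication tracks the number of annotation rows, which would point at the step that combines peaks with their annotations rather than at the peak picking.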

ACCESSION: XY000267
RECORD_TITLE: S3:20(4,4,12); LC-ESI-QTOF; MS2; CE: -10,-35,-60V; [M+HCOOH-H]-
DATE: 2020.11.22
AUTHORS: Micha Gracianna Devi and Gerd Balcke, Leibniz Institute of Plant Biochemistry (IPB) Halle, Germany
LICENSE: CC BY
COPYRIGHT: Copyright (C) Micha Gracianna Devi and Gerd Balcke, Leibniz Institute of Plant Biochemistry (IPB) Halle, Germany
PUBLICATION:  
COMMENT: CONFIDENCE Predicted
CH$NAME: S3:20(4,4,12)
CH$NAME: Acyl sucrose
CH$COMPOUND_CLASS: acyl sugar
CH$FORMULA: C32H56O14
CH$EXACT_MASS: 664.367
CH$SMILES: O[C@H]1[C@@H](O[C@@]2(O[C@H](CO)[C@@H](O)[C@@H]2O)COC(CCCCCCCCCCC)=O)O[C@H](CO)[C@@H](OC(C(C)C)=O)[C@@H]1OC(C(C)C)=O
CH$IUPAC: InChI=1S/C32H56O14/c1-6-7-8-9-10-11-12-13-14-15-23(35)41-18-32(28(38)24(36)21(16-33)45-32)46-31-25(37)27(44-30(40)20(4)5)26(22(17-34)42-31)43-29(39)19(2)3/h19-22,24-28,31,33-34,36-38H,6-18H2,1-5H3/t21-,22-,24-,25-,26-,27-,28+,31-,32+/m1/s1
CH$LINK: INCHIKEY VJBZSJVPJDYEQJ-YDBGBXJUSA-N
AC$INSTRUMENT: TripleToF5600, Sciex; Acquity, Waters
AC$INSTRUMENT_TYPE: LC-ESI-QTOF
AC$MASS_SPECTROMETRY: MS_TYPE MS2
AC$MASS_SPECTROMETRY: ION_MODE NEGATIVE
AC$MASS_SPECTROMETRY: IONIZATION ESI
AC$MASS_SPECTROMETRY: COLLISION_ENERGY -10,-35,-60V
AC$MASS_SPECTROMETRY: FRAGMENTATION_METHOD CID
AC$MASS_SPECTROMETRY: SPRAY_VOLTAGE -4.5 kV
AC$MASS_SPECTROMETRY: ENTRNCE_POTENTIAL -10 V
AC$MASS_SPECTROMETRY: SOURCE_TEMPERATURE 600 °C
AC$MASS_SPECTROMETRY: CURTAIN GAS 35 psi
AC$MASS_SPECTROMETRY: GAS_01 60 psi
AC$MASS_SPECTROMETRY: GAS_02 70 psi
AC$MASS_SPECTROMETRY: DECLUSTERING_POTENTIAL -35 V
AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE CID
AC$CHROMATOGRAPHY: COLUMN_NAME Nucleoshell RP18, Macherey & Nagel
AC$CHROMATOGRAPHY: RETENTION_TIME 15.531 min
AC$CHROMATOGRAPHY: FLOW_GRADIENT 0-2 min:5%, 19 min:95%, 21 min:95%, 21.01 min:5% 24 min:5% (acetonitrile)
AC$CHROMATOGRAPHY: FLOW_RATE 0.4 ml/min
AC$CHROMATOGRAPHY: SOLVENT CH3CN/ H2O(0.3 mM NH4COO + formic acid pH3)
AC$CHROMATOGRAPHY: COLUMN_TEMPERATURE 40 °C
MS$FOCUSED_ION: BASE_PEAK 709.3652
MS$FOCUSED_ION: PRECURSOR_M/Z 709.3652
MS$FOCUSED_ION: PRECURSOR_TYPE [M+HCOOH-H]-
MS$DATA_PROCESSING: REANALYZE Peaks with additional N2/O included
MS$DATA_PROCESSING: WHOLE RMassBank 2.11.4
PK$SPLASH: splash10-004i-0091000100-1b072dd57bb53138ccca
PK$ANNOTATION: m/z tentative_formula formula_count mass error(ppm)
  73.0273 C3H5O2- 1 73.0295 -30.44
  87.042 C4H7O2- 1 87.0452 -36.57
  97.0319 C5H5O2- 1 97.0295 25.12
  110.0232 C2H6O5- 1 110.0221 9.8
  115.0768 C6H11O2- 1 115.0765 3.1
  150.0986 C6H14O4- 2 150.0898 59.05
  150.0986 C10H14O- 2 150.105 -42.59
  171.139 C10H19O2- 2 171.1391 -0.08
  171.139 C3H23O7- 2 171.1449 -34.4
  197.1219 C15H17- 3 197.1336 -59.27
  197.1219 C4H21O8- 3 197.1242 -11.67
  197.1219 C11H17O3- 3 197.1183 18.12
  209.1286 C16H17- 4 209.1336 -23.93
  209.1286 C9H21O5- 4 209.1394 -52.01
  209.1286 C12H17O3- 4 209.1183 49.02
  209.1286 C5H21O8- 4 209.1242 20.94
  223.2224 C12H31O3- 4 223.2279 -24.36
  223.2224 C5H35O8- 4 223.2337 -50.67
  223.2224 C8H31O6- 4 223.2126 43.98
  223.2224 CH35O11- 4 223.2185 17.67
  225.224 C8H33O6- 4 225.2283 -18.79
  225.224 C15H29O- 4 225.2224 7.29
  225.224 CH37O11- 4 225.2341 -44.87
  225.224 C4H33O9- 4 225.213 48.95
  230.1461 C12H22O4- 4 230.1524 -27.23
  230.1461 CH26O12- 4 230.143 13.54
  230.1461 C8H22O7- 4 230.1371 39.06
  230.1461 C5H26O9- 4 230.1582 -52.75
  233.1559 C4H25O10- 4 233.1453 45.42
  233.1559 C15H21O2- 4 233.1547 5.17
  233.1559 C8H25O7- 4 233.1606 -20.02
  233.1559 CH29O12- 4 233.1664 -45.2
  253.2176 C5H33O10- 4 253.2079 38.42
  253.2176 C16H29O2- 4 253.2173 1.37
  253.2176 C9H33O7- 4 253.2232 -21.83
  253.2176 C2H37O12- 4 253.2291 -45.02
  255.2274 C12H31O5- 4 255.2177 38.09
  255.2274 C5H35O10- 4 255.2236 15.08
  255.2274 C16H31O2- 4 255.233 -21.68
  255.2274 C9H35O7- 4 255.2388 -44.69
  269.2124 C5H33O11- 6 269.2028 35.53
  269.2124 C20H29- 6 269.2275 -55.99
  269.2124 C9H33O8- 6 269.2181 -21.14
  269.2124 C2H37O13- 6 269.224 -42.96
  269.2124 C12H29O6- 6 269.197 57.35
  269.2124 C16H29O3- 6 269.2122 0.67
  277.2124 H37O15- 6 277.2138 -4.92
  277.2124 C11H33O7- 6 277.2232 -38.77
  277.2124 C7H33O10- 6 277.2079 16.27
  277.2124 C18H29O2- 6 277.2173 -17.58
  277.2124 C14H29O5- 6 277.202 37.45
  277.2124 C4H37O12- 6 277.2291 -59.95
  278.2216 C18H30O2- 6 278.2251 -12.83
  278.2216 H38O15- 6 278.2216 -0.21
  278.2216 C4H38O12- 6 278.2369 -55.04
  278.2216 C7H34O10- 6 278.2157 20.9
  278.2216 C11H34O7- 6 278.231 -33.94
  278.2216 C14H30O5- 6 278.2099 42.01
  279.2349 C4H39O12- 5 279.2447 -35.13
  279.2349 C18H31O2- 5 279.233 6.93
  279.2349 C11H35O7- 5 279.2388 -14.1
  279.2349 H39O15- 5 279.2294 19.5
  279.2349 C7H35O10- 5 279.2236 40.54
  281.2434 C7H37O10- 5 281.2392 14.79
  281.2434 H41O15- 5 281.2451 -6.1
  281.2434 C18H33O2- 5 281.2486 -18.57
  281.2434 C14H33O5- 5 281.2333 35.67
  281.2434 C11H37O7- 5 281.2545 -39.46
  311.3073 C13H43O7- 6 311.3014 18.8
  311.3073 C2H47O15- 6 311.292 48.94
  311.3073 C10H47O9- 6 311.3226 -49.07
  311.3073 C6H47O12- 6 311.3073 -0.07
  311.3073 C17H43O4- 6 311.3167 -30.21
  311.3073 C20H39O2- 6 311.2956 37.67
  394.1472 C23H22O6- 9 394.1422 12.77
  394.1472 C30H18O- 9 394.1363 27.67
  394.1472 C12H26O14- 9 394.1328 36.58
  394.1472 C20H26O8- 9 394.1633 -40.84
  394.1472 C27H22O3- 9 394.1574 -25.94
  394.1472 C13H30O13- 9 394.1692 -55.74
  394.1472 C19H22O9- 9 394.1269 51.48
  394.1472 C16H26O11- 9 394.1481 -2.13
  394.1472 C9H30O16- 9 394.1539 -17.03
  481.1903 C16H33O16- 8 481.1774 26.81
  481.1903 C20H33O13- 8 481.1927 -4.89
  481.1903 C27H29O8- 8 481.1868 7.31
  481.1903 C31H29O5- 8 481.202 -24.39
  481.1903 C30H25O6- 8 481.1657 51.23
  481.1903 C24H33O10- 8 481.2079 -36.6
  481.1903 C23H29O11- 8 481.1715 39.02
  481.1903 C17H37O15- 8 481.2138 -48.8
  539.226 C30H35O9- 7 539.2287 -5.02
  539.226 C27H39O11- 7 539.2498 -44.2
  539.226 C23H39O14- 7 539.2345 -15.91
  539.226 C26H35O12- 7 539.2134 23.27
  539.226 C20H43O16- 7 539.2557 -55.09
  539.226 C33H31O7- 7 539.2075 34.17
  539.226 C22H35O15- 7 539.1981 51.57
  583.3836 C24H55O15- 4 583.3546 49.65
  583.3836 C32H55O9- 4 583.3852 -2.65
  583.3836 C31H51O10- 4 583.3488 59.72
  583.3836 C28H55O12- 4 583.3699 23.5
  652.3571 C31H56O14- 2 652.3676 -16
  652.3571 C30H52O15- 2 652.3312 39.78
  663.3557 C32H55O14- 2 663.3597 -6.06
  663.3557 C31H51O15- 2 663.3233 48.79
  665.4076 C32H57O14- 1 665.3754 48.5
  691.3843 C33H55O15- 1 691.3546 42.87
  693.3937 C33H57O15- 1 693.3703 33.73
  702.3353 C33H50O16- 1 702.3104 35.45
  705.3461 C33H53O16- 1 705.3339 17.33
  709.3856 C33H57O16- 1 709.3652 28.69
PK$NUM_PEAK: 112
PK$PEAK: m/z int. rel.int.
  73.0273 55 69
  87.042 111 140
  97.0319 55 69
  110.0232 55 69
  115.0768 55 69
  150.0986 55 69
  150.0986 55 69
  171.139 277 351
  171.139 277 351
  197.1219 55 69
  197.1219 55 69
  197.1219 55 69
  209.1286 55 69
  209.1286 55 69
  209.1286 55 69
  209.1286 55 69
  223.2224 55 69
  223.2224 55 69
  223.2224 55 69
  223.2224 55 69
  225.224 166 210
  225.224 166 210
  225.224 166 210
  225.224 166 210
  230.1461 55 69
  230.1461 55 69
  230.1461 55 69
  230.1461 55 69
  233.1559 222 281
  233.1559 222 281
  233.1559 222 281
  233.1559 222 281
  253.2176 166 210
  253.2176 166 210
  253.2176 166 210
  253.2176 166 210
  255.2274 222 281
  255.2274 222 281
  255.2274 222 281
  255.2274 222 281
  269.2124 111 140
  269.2124 111 140
  269.2124 111 140
  269.2124 111 140
  269.2124 111 140
  269.2124 111 140
  277.2124 277 351
  277.2124 277 351
  277.2124 277 351
  277.2124 277 351
  277.2124 277 351
  277.2124 277 351
  278.2216 166 210
  278.2216 166 210
  278.2216 166 210
  278.2216 166 210
  278.2216 166 210
  278.2216 166 210
  279.2349 284 360
  279.2349 284 360
  279.2349 284 360
  279.2349 284 360
  279.2349 284 360
  281.2434 116 147
  281.2434 116 147
  281.2434 116 147
  281.2434 116 147
  281.2434 116 147
  311.3073 55 69
  311.3073 55 69
  311.3073 55 69
  311.3073 55 69
  311.3073 55 69
  311.3073 55 69
  394.1472 111 140
  394.1472 111 140
  394.1472 111 140
  394.1472 111 140
  394.1472 111 140
  394.1472 111 140
  394.1472 111 140
  394.1472 111 140
  394.1472 111 140
  481.1903 111 140
  481.1903 111 140
  481.1903 111 140
  481.1903 111 140
  481.1903 111 140
  481.1903 111 140
  481.1903 111 140
  481.1903 111 140
  539.226 55 69
  539.226 55 69
  539.226 55 69
  539.226 55 69
  539.226 55 69
  539.226 55 69
  539.226 55 69
  583.3836 55 69
  583.3836 55 69
  583.3836 55 69
  583.3836 55 69
  652.3571 55 69
  652.3571 55 69
  663.3557 111 140
  663.3557 111 140
  665.4076 111 140
  691.3843 111 140
  693.3937 111 140
  702.3353 111 140
  705.3461 114 144
  709.3856 787 999
//
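
In the record above, PK$NUM_PEAK is 112, but the PK$PEAK block contains only 33 distinct m/z values. Until the cause is found, an affected record could be cleaned up after export. The sketch below is a post-processing workaround in Python, not a fix inside RMassBank; `dedupe_peaks` is a hypothetical helper, and it assumes the layout shown above (two-space indented rows, PK$NUM_PEAK before PK$PEAK).

```python
def dedupe_peaks(record_lines):
    """Return the record with repeated PK$PEAK rows dropped and PK$NUM_PEAK updated."""
    out, seen = [], set()
    in_peaks = False
    for line in record_lines:
        if line.startswith("PK$PEAK:"):
            in_peaks = True
            out.append(line)
        elif in_peaks and line.startswith("  "):
            row = tuple(line.split())    # (m/z, int., rel.int.) as strings
            if row not in seen:          # keep only the first copy of each row
                seen.add(row)
                out.append(line)
        else:
            in_peaks = False
            out.append(line)
    # PK$NUM_PEAK precedes PK$PEAK, so patch it once the unique rows are counted
    return [
        f"PK$NUM_PEAK: {len(seen)}" if l.startswith("PK$NUM_PEAK:") else l
        for l in out
    ]

example = [
    "PK$NUM_PEAK: 5",
    "PK$PEAK: m/z int. rel.int.",
    "  115.0768 55 69",
    "  150.0986 55 69",
    "  150.0986 55 69",
    "  171.139 277 351",
    "  171.139 277 351",
    "//",
]
cleaned = dedupe_peaks(example)
print("\n".join(cleaned))
```

Rows are compared as whole (m/z, int., rel.int.) tuples rather than by m/z alone, so two rows that share an m/z but differ in intensity would both be kept and remain visible for inspection.
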
schymane commented 3 years ago

@sneumann @achimmiri those ppm values are huge ... can you try a lower ppm setting (5 or 10)? Otherwise, if this is TOF data with such large error margins, recalibration may not be the best idea (you could set recalibrate.identity). RMassBank can deal with multiple formulas (see e.g. any fluorinated EA or EQ spectrum!), but we did indeed encounter this strange duplication in the past, as a quite rare occurrence that we could not explain. To debug/investigate, @meowcat would need the msWorkflow file (from the new branch). Thanks!
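
To illustrate the effect of a tighter tolerance, the Python sketch below recomputes the ppm error for a few PK$ANNOTATION rows from the record and keeps only candidates within a limit. It only mimics such a filter and does not show how RMassBank applies its own ppm setting; `ppm_error` and `within_tolerance` are hypothetical names. Because the record lists masses rounded to four decimals, the recomputed errors differ slightly from the printed ones.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

# (measured m/z, tentative formula, theoretical m/z), copied from PK$ANNOTATION
candidates = [
    (171.139, "C10H19O2-", 171.1391),
    (171.139, "C3H23O7-", 171.1449),
    (269.2124, "C16H29O3-", 269.2122),
    (269.2124, "C20H29-", 269.2275),
]

def within_tolerance(rows, max_ppm):
    """Keep the (m/z, formula) pairs whose absolute ppm error is at most max_ppm."""
    return [
        (mz, formula)
        for mz, formula, theoretical in rows
        if abs(ppm_error(mz, theoretical)) <= max_ppm
    ]

print(within_tolerance(candidates, 10))   # tighter limit, as suggested above
print(within_tolerance(candidates, 60))   # wide enough to keep all four rows
```

With the 10 ppm limit each of the two peaks in this sample keeps a single candidate formula, which would remove the ambiguity for these rows; it would not by itself explain why ambiguous annotations lead to repeated peak rows.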

meowcat commented 3 years ago

Precisely - this is a worrying bug and it would be great if I could debug this with the original data; I have zero chance of reproducing this on my own.