Closed andrewrobertjones closed 5 years ago
Hmmmmmm, I'm apparently having some issues grokking that column.
How about this: use the charged form of the Progenesis-style identifier in those columns (for Progenesis, and for the purposes of this example). So SME18 would have evidence_input_id 21.37_147.1122m/z and SME19 would have evidence_input_id 21.37_169.0943m/z.
This has the properties that:
MetaScope scores fragmentation matches across the aggregate of fragmentation spectra (across both runs and adduct forms), so it naturally tends to produce a compressed evidence table. That's relatively atypical, so it would be worth having more example files, but in this case it's true to how the data was processed.
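To make the proposed identifiers concrete, here is a minimal sketch of how such a string could be assembled. It assumes the first number is the retention time in minutes and the second is the measured m/z of the charged form; the function name `charged_evidence_input_id` is hypothetical and not part of Progenesis or mzTab.

```python
def charged_evidence_input_id(retention_time_min: float, mz: float) -> str:
    """Build a Progenesis-style charged-form identifier (illustrative only).

    Assumption: two decimals for retention time and four for m/z,
    matching the examples quoted in this thread.
    """
    return f"{retention_time_min:.2f}_{mz:.4f}m/z"


# The two adduct forms discussed above get distinct identifiers.
sme18_id = charged_evidence_input_id(21.37, 147.1122)
sme19_id = charged_evidence_input_id(21.37, 169.0943)
```

With identifiers built this way, SME18 and SME19 no longer share an evidence_input_id, because each adduct form has its own m/z.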
Hi Joel,
Yes, this sounds good. Can you also put CV terms into the files instead of user params, now that they've been added:
[MS,MS:1002879,Progenesis QI,2.4.6505.48857]
[MS,MS:1002889,Progenesis MetaScope Score,] [MS,MS:1002889,Progenesis MetaScope Score,] [MS,MS:1002889,Progenesis MetaScope Score,] [MS,MS:1002889,Progenesis MetaScope Score,] [MS,MS:1002889,Progenesis MetaScope Score,] [MS,MS:1002889,Progenesis MetaScope Score,] [MS,MS:1002889,Progenesis MetaScope Score,]
etc
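For anyone checking the files by script, a rough sketch of splitting a bracketed parameter like the ones above into its four fields follows. It assumes the usual `[CV label, accession, name, value]` layout and that a user param is one whose CV label and accession are both empty; it does not handle quoted names containing commas. `CvParam` and `parse_param` are hypothetical names for illustration.

```python
from typing import NamedTuple


class CvParam(NamedTuple):
    cv_label: str
    accession: str
    name: str
    value: str

    @property
    def is_user_param(self) -> bool:
        # Assumption: user params leave both CV label and accession empty.
        return self.cv_label == "" and self.accession == ""


def parse_param(text: str) -> CvParam:
    """Split '[label,accession,name,value]' into its four fields."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed parameter: {text!r}")
    # Split on the first three commas only, so the value may contain commas.
    parts = [part.strip() for part in text[1:-1].split(",", 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {text!r}")
    return CvParam(*parts)


software = parse_param("[MS,MS:1002879,Progenesis QI,2.4.6505.48857]")
score = parse_param("[MS,MS:1002889,Progenesis MetaScope Score,]")
```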
Thanks!
I have made a pull request to fix this (including the CV terms bit): https://github.com/HUPO-PSI/mzTab/pull/167
Closing for now, changes were merged in #167
Hi all, I am using some of the examples in the MTBLS263 file for the paper, but I think some of the logic for the SME rows is not as intended by the specs.
These rows share the same evidence_input_id, which is supposed to indicate that the same evidence gave rise to different results (for example). Here, the same evidence_input_id is instead used to indicate that different inputs gave rise to IDs of the same compound (or at least of different adduct forms). This needs to be fixed, otherwise it will confuse readers.
A second possible issue relates to how much data to compress on each row. As I recall, we discussed this one previously: ideally one row of SME should be a single search event, unless software aggregates multiple spectra (in this case) for a search. In this file, one row contains the combined results from searching lots of MS2 spectra, and presumably only the best score is reported. I think a preferred encoding would be to enumerate, across multiple rows, all spectra that were searched, so that it is explicitly clear which fragmentation spectrum gave rise to the score reported.
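A quick way to spot this pattern in a file is sketched below. It assumes the SME rows have already been read into dictionaries keyed by column name, and it treats a different exp_mass_to_charge as a stand-in for "different input", which is a simplification. The function name and the placeholder identifier `shared_id` are hypothetical.

```python
from collections import defaultdict


def conflicting_evidence_inputs(sme_rows):
    """Return evidence_input_ids shared by rows with different measured m/z.

    Assumption: rows reporting different exp_mass_to_charge values come
    from different inputs and so should not share an evidence_input_id.
    """
    mz_by_input = defaultdict(set)
    for row in sme_rows:
        mz_by_input[row["evidence_input_id"]].add(row["exp_mass_to_charge"])
    return {
        input_id: sorted(mz_values)
        for input_id, mz_values in mz_by_input.items()
        if len(mz_values) > 1
    }


# Two adduct forms sharing one placeholder identifier are flagged.
rows = [
    {"SME_ID": "18", "evidence_input_id": "shared_id", "exp_mass_to_charge": "147.1122"},
    {"SME_ID": "19", "evidence_input_id": "shared_id", "exp_mass_to_charge": "169.0943"},
]
flagged = conflicting_evidence_inputs(rows)
```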
@jmrein any chance you could take a look?