MSGFPlus / msgfplus

MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database.

Add TDF nativeID format #41

Closed by chambm 6 years ago

chambm commented 6 years ago

It would be nice if MS-GF+ threw this error BEFORE an hour of searching, instead of when writing the results. :)

Search progress: 48 / 48 tasks, 100.00%         41.23 minutes elapsed
Computing q-values...
Computing q-values finished (elapsed time: 0.54 sec)
Writing results...
Unsupported mzML format: data.mzML does not contain a child term of MS:1000767 (native spectrum identifier format)

FarmGeek4Life commented 6 years ago

I wish there were a Java package that made it easier to work with the CV... ProteoWizard has the caveat of "an enum name might be broken by a future update to the CV", but at least it tracks relationships.

I am also going to add the following just for completeness:

MS:1001480 SCIEX TOF/TOF nativeID format
MS:1002303 Bruker Container nativeID format
MS:1002532 UIMF nativeID format
MS:1002898 Shimadzu Biotech QTOF nativeID format
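
For illustration, the early check requested at the top of this issue could be sketched roughly as below. This is a minimal sketch and not the actual MS-GF+ code: the class name NativeIdPreflight and the method findNativeIdFormat are hypothetical, the accession set contains only the four terms listed above (a real check would need every child of MS:1000767), and it assumes a line-oriented, pretty-printed mzML header rather than using a proper XML parser.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NativeIdPreflight {
    // Assumption: only the four accessions listed in this thread.
    // A real check would need every child term of MS:1000767.
    static final Set<String> KNOWN_NATIVE_ID_FORMATS =
            Set.of("MS:1001480", "MS:1002303", "MS:1002532", "MS:1002898");

    private static final Pattern ACCESSION =
            Pattern.compile("accession=\"(MS:\\d+)\"");

    /** Scans mzML header lines and returns the first recognised nativeID accession. */
    static Optional<String> findNativeIdFormat(Iterable<String> mzmlLines) {
        for (String line : mzmlLines) {
            // The nativeID format is declared in the file description, which
            // precedes the run element, so no spectra need to be read.
            if (line.contains("<run")) {
                break;
            }
            Matcher m = ACCESSION.matcher(line);
            while (m.find()) {
                if (KNOWN_NATIVE_ID_FORMATS.contains(m.group(1))) {
                    return Optional.of(m.group(1));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical header fragment, for demonstration only.
        List<String> header = List.of(
                "<sourceFile id=\"f1\" name=\"data.uimf\">",
                "  <cvParam cvRef=\"MS\" accession=\"MS:1002532\" name=\"UIMF nativeID format\"/>",
                "</sourceFile>",
                "<run id=\"r1\">");
        System.out.println(findNativeIdFormat(header)
                .map(acc -> "nativeID format: " + acc)
                .orElse("Unsupported mzML format: fail here, before the search starts"));
    }
}
```

Running a check like this before the search starts means an unsupported file is rejected in the time it takes to read the header, not after the full search has finished.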
chambm commented 6 years ago

Enum name breakage is easy to catch and fix (compile time error). It's supporting CV terms that are newer than pwiz's embedded CV that's the real problem. :) But that could only be fixed by downloading the new CV on the fly...or being more diligent about updating pwiz whenever the CV is updated.

FarmGeek4Life commented 6 years ago

Yeah, but unless I am wrong, most of the work of updating the ProteoWizard embedded CV is done by a small program that generates the needed code files. You don't need to dig through the CV to find all of the child terms of a given term, because the ProteoWizard embedded CV stores them and has utility methods for that. Also, producing readable CV params (i.e., not just a CVRef, Accession, and value) with ProteoWizard is easy because it stores all of that information internally. For MS-GF+ to output CV params with names, I have to set the name manually (same for the units).
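
As a rough illustration of what "finding all of the child terms" involves without such a library, the sketch below reads is_a lines from OBO text and walks them transitively. It is only a sketch under stated assumptions: the class OboChildren and its methods are hypothetical and are not part of ProteoWizard or MS-GF+, and the parser handles only the id and is_a tags of [Term] stanzas, ignoring obsolete flags, other relationship types, and the rest of the OBO format.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class OboChildren {
    /** Maps each parent accession to its direct children, based on is_a lines. */
    static Map<String, List<String>> parseIsA(String obo) {
        Map<String, List<String>> children = new HashMap<>();
        boolean inTerm = false;
        String currentId = null;
        for (String raw : obo.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("[")) {
                // A new stanza begins; only [Term] stanzas are of interest.
                inTerm = line.equals("[Term]");
                currentId = null;
            } else if (inTerm && line.startsWith("id: ")) {
                currentId = line.substring(4).trim();
            } else if (inTerm && currentId != null && line.startsWith("is_a: ")) {
                // Example line: "is_a: MS:1000767 ! native spectrum identifier format"
                String parent = line.substring(6).split("!")[0].trim();
                children.computeIfAbsent(parent, k -> new ArrayList<>()).add(currentId);
            }
        }
        return children;
    }

    /** Returns every direct and indirect child of root (breadth-first walk). */
    static Set<String> descendants(Map<String, List<String>> children, String root) {
        Set<String> seen = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            for (String child : children.getOrDefault(queue.poll(), List.of())) {
                if (seen.add(child)) {
                    queue.add(child);
                }
            }
        }
        return seen;
    }
}
```

Calling descendants(parseIsA(oboText), "MS:1000767") on the text of the PSI-MS OBO file would then yield the set of nativeID format accessions, so the list would not have to be maintained by hand.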

chambm commented 6 years ago

Yep. For a long time I've wanted to have some automated hook that would catch PSI CV updates, run the pwiz CV update procedure, and create a PR. Just haven't actually done it yet! (and yes, having proper OBO support in pwiz and enum-based CV terms is one of Darren Kessner's genius contributions to pwiz).