Closed kevinkovalchik closed 3 years ago
@kevinkovalchik Can you share the mzid file? @chambm any thoughts? Making MS-GF+ output the name attribute on "Enzyme" to fix this issue feels like an unnecessary hack.
Looking at the mzIdentML 1.1.1 spec, name is an optional attribute for the <Enzyme> element. In contrast, name is required for the <cvParam> element. I highly suspect that idconvert is treating name as a required attribute for <Enzyme>. We will update MS-GF+ to create .mzid files with a name attribute in the <Enzyme> element; it's a minor change that matches the spec.
Here is the example enzyme entry from https://github.com/HUPO-PSI/mzIdentML/blob/master/specification_document/specdoc1_1/mzIdentML1.1.1.doc
<Enzymes>
<Enzyme id="ENZ_0" cTermGain="OH" nTermGain="H" semiSpecific="0">
<SiteRegexp><![CDATA[(?<=[KR])(?!P)]]></SiteRegexp>
<EnzymeName>
<cvParam accession="MS:1001251" name="Trypsin" cvRef="PSI-MS"/>
</EnzymeName>
</Enzyme>
...
</Enzymes>
Notice that name is not set as an attribute on <Enzyme> in this example. But, like I said, it's optional:
Attribute Name: id
Data Type: xsd:string
Use: required
Definition: An identifier is an unambiguous string that is unique within the scope (i.e. a document, a set of related documents, or a repository) of its use.
Attribute Name: name
Data Type: xsd:string
Use: optional
Definition: The potentially ambiguous common identifier, such as a human-readable name for the instance.
Attribute Name: semiSpecific
Data Type: xsd:boolean
Use: optional
Definition: Set to true if the enzyme cleaves semi-specifically (i.e. one terminus MUST cleave according to the rules, the other can cleave at any residue), false if the enzyme cleavage is assumed to be specific to both termini (accepting for any missed cleavages).
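Since name is optional on <Enzyme>, a quick way to see whether a given .mzid file is affected is to list the <Enzyme> elements that lack it. The following is a minimal sketch, not part of MS-GF+ or idconvert; the function names are hypothetical, and it matches elements by local name so that it works whether or not the document declares the mzIdentML namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def enzymes_missing_name(xml_text):
    """Return the id of every <Enzyme> element that has no name attribute.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    return [
        el.get("id")
        for el in root.iter()
        if isinstance(el.tag, str)
        and local_name(el.tag) == "Enzyme"
        and el.get("name") is None
    ]
```

For a real file, read its contents and pass the text in; an empty list means every <Enzyme> already carries a name attribute.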
Looking into this, MS-GF+ uses jmzIdentML to create the .mzid file: https://github.com/PRIDE-Utilities/jmzIdentML
That means that updating things to include a name attribute for the <Enzyme> element will be harder than I thought. It's entirely possible that we'd have to clone their repo to make the change, which is less than ideal. The alternative is to create the .mzid file then post-process it to insert the name attribute. Better yet would be for @chambm to update idconvert to not require <Enzyme> to have a name attribute.
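The post-processing alternative could look roughly like the sketch below. It is an illustration under stated assumptions, not what MS-GF+ or jmzIdentML actually does: the function name and the "unknown" fallback are hypothetical, and copying the name from the first cvParam or userParam found inside the enzyme is an assumed convention.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def add_enzyme_names(root, fallback="unknown"):
    """Give every <Enzyme> without a name attribute one, in place.

    The name is copied from the first nested cvParam/userParam that has a
    name (an assumed convention); otherwise the hypothetical fallback is used.
    Returns the number of elements changed.
    """
    changed = 0
    for enzyme in root.iter():
        if not isinstance(enzyme.tag, str) or local_name(enzyme.tag) != "Enzyme":
            continue
        if enzyme.get("name") is not None:
            continue  # leave existing names untouched
        name = fallback
        for child in enzyme.iter():
            if (
                isinstance(child.tag, str)
                and local_name(child.tag) in ("cvParam", "userParam")
                and child.get("name")
            ):
                name = child.get("name")
                break
        enzyme.set("name", name)
        changed += 1
    return changed
```

To apply it to a file, parse with ET.parse, call add_enzyme_names(tree.getroot()), and write the tree back out. Two caveats when writing with ElementTree: call ET.register_namespace("", uri) with the document's namespace first to avoid generated ns0: prefixes, and note that CDATA sections are written back as escaped text, which is equivalent XML but not byte-identical.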
Thank you for your work, @chambm. It's always fun finding 9-year-old bugs, right?
:champagne: It's a testament to how few people use idconvert, or use unspecific searches, or some mix of those. ;)
Indeed, it is not the most common of combinations... Thanks for working on it!
Describe the question or problem
I'm unsure if this is a bug or if I just need help. I am unable to convert the mzid files generated by MS-GF+ into pepXML files using idconvert.
Details
Running a search with an unspecific digest, I get the mzid output fine. But when trying to convert it to pepXML using the idconvert that is distributed with the TPP, I get the following error:
When I try the same conversion on a file from a tryptic search it works fine.
Useful extras
MS-GF+ version: Release (v2021.01.08) (8 January 2021)
OS: Ubuntu 20.04
TPP version: 5.2
Information on idconvert release:
This same issue was mentioned on the ProteoWizard mailing list a while ago. While there was not much in the way of follow up, they seemed to think it was not an idconvert issue, but I'm not sure: https://sourceforge.net/p/proteowizard/mailman/message/32673464/
I looked into the mzid file and found this for cleavage info:
It looks okay to me, though I am not an expert on the mzid specification. However, if I make this change (adding a name attribute to the Enzyme element), then the conversion runs fine:
This looks weird to me because the name is duplicated. What do you think? Is it an issue with MS-GF+ or with idconvert? Or is it something else I am doing wrong?
Thanks!
Kevin