HUPO-PSI / mzIdentML

Repository for mzIdentML and the corresponding examples

combined_fdr_1.2.mzid validation issues #35

Open edeutsch opened 8 years ago

edeutsch commented 8 years ago

My CV term validator finds these issues with this file:
ERROR: cvParam distinct peptide-level q-value should have units, but it does not!
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'

The first error may be an error in the CV itself. I don't think we want units on the q-value term. Should we remove units from all q-value terms? This issue affects several of them.

[Term]
id: MS:1001868
name: distinct peptide-level q-value
def: "Estimation of the q-value for distinct peptides once redundant identifications of the same peptide have been removed (id est multiple PSMs, possibly with different mass modifications, mapping to the same sequence have been collapsed to one entry)." [PSI:PI]
xref: value-type:xsd:double "The allowed value-type for this CV term."
is_a: MS:1002484 ! peptide-level statistical threshold
relationship: has_units UO:0000166 ! parts per notation unit
relationship: has_units UO:0000187 ! percent
relationship: has_domain MS:1002305 ! value between 0 and 1 inclusive
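As a rough illustration of how one might spot terms that still carry has_units relationships, here is a minimal Python sketch. It uses a deliberately simplified stanza parser (not a full OBO parser), the stanza is abbreviated from the term quoted above, and the function names parse_obo_term and unit_accessions are hypothetical.

```python
# Abbreviated copy of the MS:1001868 stanza quoted in this thread.
STANZA = """\
[Term]
id: MS:1001868
name: distinct peptide-level q-value
is_a: MS:1002484 ! peptide-level statistical threshold
relationship: has_units UO:0000166 ! parts per notation unit
relationship: has_units UO:0000187 ! percent
relationship: has_domain MS:1002305 ! value between 0 and 1 inclusive
"""


def parse_obo_term(stanza: str) -> dict[str, list[str]]:
    """Collect tag -> list of values from one [Term] stanza (simplified)."""
    fields: dict[str, list[str]] = {}
    for raw in stanza.splitlines():
        line = raw.strip()
        if not line or line == "[Term]":
            continue
        # Split on the first colon only, so accessions like MS:1001868 survive.
        tag, _, value = line.partition(":")
        # Drop the trailing "! human-readable name" comment, if any.
        # Assumption: no quoted text containing " ! " (true for this stanza).
        value = value.split(" ! ")[0].strip()
        fields.setdefault(tag.strip(), []).append(value)
    return fields


def unit_accessions(fields: dict[str, list[str]]) -> list[str]:
    """Return the accessions named in has_units relationships."""
    return [
        value.split()[1]
        for value in fields.get("relationship", [])
        if value.startswith("has_units ")
    ]


if __name__ == "__main__":
    term = parse_obo_term(STANZA)
    print(term["id"][0], "has units:", unit_accessions(term))
```

Running the same check over every q-value, FDR, e-value, p-value and PEP term would give a list of candidates for unit removal.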

andrewrobertjones commented 8 years ago

Agreed, none of the FDR and q-value terms should have units. @germa, can you check whether other similar terms (e.g. e-values, p-values, PEP) have units? I don't think any of these should. Thanks.

fawazghali commented 8 years ago

I have updated the example file. @edeutsch can you please re-run the validator? Thanks. Fawaz

germa commented 8 years ago

Removed the units from the FDR and q-value terms in version 3.90.0 of psi-ms.obo

edeutsch commented 8 years ago

These issues remain today:
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'
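For readers unfamiliar with this kind of check, the name-mismatch warnings can be reproduced with a small Python sketch along these lines. This is not the validator used in this thread. The EXPECTED_NAMES table is hard-coded from the three warnings above rather than read from psi-ms.obo, and check_cv_names is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Accession -> current CV name, taken from the warnings quoted in this thread.
# A real validator would load these from psi-ms.obo instead.
EXPECTED_NAMES = {
    "MS:1001062": "Mascot MGF format",
    "MS:1001400": "OMSSA xml format",
    "MS:1002439": "final PSM list",
}


def check_cv_names(xml_text: str, expected: dict[str, str] = EXPECTED_NAMES) -> list[str]:
    """Return one warning per cvParam whose name differs from the CV name."""
    warnings = []
    for elem in ET.fromstring(xml_text).iter():
        # Strip the "{namespace}" prefix ElementTree puts on tag names.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name != "cvParam":
            continue
        accession = elem.get("accession")
        name = elem.get("name")
        wanted = expected.get(accession)
        if wanted is not None and name != wanted:
            warnings.append(
                f"WARNING: {accession} should be '{wanted}' instead of '{name}'"
            )
    return warnings
```

For full-size mzid files, `ET.iterparse` would be a more memory-friendly choice than parsing the whole document at once.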

fawazghali commented 8 years ago

I don't see these issues in the file.

germa commented 8 years ago

Message 1: Level: ERROR --> Non-fatal XML Parsing error detected on line 102973 Error message: cvc-pattern-valid: Value '*' is not facet-valid with respect to pattern '[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}' for type '#AnonType_postPeptideEvidenceType'.

Message 2: Level: ERROR --> Non-fatal XML Parsing error detected on line 102973 Error message: cvc-attribute.3: The value '*' of attribute 'post' on element 'PeptideEvidence' is not valid with respect to its type, 'null'.

This means that, according to the schema file mzIdentML1.2.0-candidate.xsd, '*' is not allowed in the post attribute of PeptideEvidence.
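The pattern facet from the error message can be tried out directly. The following Python sketch copies the pattern verbatim from the message and uses fullmatch, because an XSD pattern has to match the whole attribute value. The helper name is_valid_flanking_residue is hypothetical.

```python
import re

# Pattern copied verbatim from the validator message in this thread.
POST_PATTERN = re.compile(r"[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}")


def is_valid_flanking_residue(value: str) -> bool:
    """True if value satisfies the pattern facet of the post attribute."""
    # fullmatch mirrors XSD semantics: the entire value must match.
    return POST_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["K", "-", "?", "*"]:
        print(repr(candidate), is_valid_flanking_residue(candidate))
```

Only a single upper-case letter, "?" or "-" passes, which is why "*" is rejected.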

edeutsch commented 8 years ago

Regarding fghali's "I don't see these issues in the file", perhaps there is confusion about which file we are talking about. There are two similarly named "combined" files. Here are the issues I see:

peptide_level_stats_examples/combined_fdr_1.2.mzid.gz WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file' WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file' WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'

multi_search/combined_1.2.mzid.gz ERROR: cvParam unknown modification should have a value, but it does not!

andrewrobertjones commented 8 years ago

@fghali Hi Fawaz, can you please check these again? Thanks, Andy

fawazghali commented 8 years ago

I have updated the example file (peptide_level_stats_examples/combined_fdr_1.2.mzid.gz). @edeutsch can you please re-run the validator? Thanks. Fawaz

andrewrobertjones commented 8 years ago

@fghali There are also errors in the file multi_search/combined_1.2.mzid.gz, see Gerhard's and Eric's messages above.

The main parsing error relates to these types of error:

AND

The star should be replaced with “-”, assuming this is caused by the peptide being at the N- or C-terminus of the protein (rather than by stars in the sequence, which shouldn’t happen). For now, you can just do a Find and Replace, but it would be useful if you could track back to see which of the file format parsers is getting this wrong, so that we can fix it.
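A scripted version of that Find and Replace might look like the Python sketch below. It is only an illustration, not what was actually run on the example files. It assumes the attributes are written with double quotes, that no attribute value contains ">", and that pre should be treated the same way as post (the validator message above only mentions post). The name fix_star_flanks is hypothetical.

```python
import re

# Matches one PeptideEvidence start tag (or self-closing tag).
# Assumption: no '>' appears inside attribute values.
PEPTIDE_EVIDENCE_TAG = re.compile(r"<PeptideEvidence\b[^>]*>")

# Matches pre="*" or post="*" (double-quoted attributes only).
STAR_FLANK = re.compile(r'\b(pre|post)="\*"')


def fix_star_flanks(xml_text: str) -> str:
    """Replace '*' with '-' in pre/post, but only on PeptideEvidence elements."""

    def fix_tag(match: re.Match) -> str:
        return STAR_FLANK.sub(r'\1="-"', match.group(0))

    return PEPTIDE_EVIDENCE_TAG.sub(fix_tag, xml_text)


if __name__ == "__main__":
    before = '<PeptideEvidence id="PE1" pre="K" post="*" isDecoy="false"/>'
    print(fix_star_flanks(before))
```

Restricting the substitution to PeptideEvidence tags avoids touching any other "*" in the file, which a blanket text replace would not.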

fawazghali commented 8 years ago

I have updated both files, replacing "*" with "-". I'll check the parsers to see where it's happening.

edeutsch commented 8 years ago

This is fine according to my validators.

germa commented 8 years ago

Message 1: Rule ID: ProteinDetectionList_must_rule Level: ERROR Context(/cvParam/@accession ) in 2 locations --> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/ProteinDetectionList/cvParam/@accession' because no values were found:

Message 2: Level: WARN --> unanticipated terms for XPath '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/cvParam/@accession' : [MS:1002439]

We got rid of "final PSM list"; see GitHub issue #5.

fawazghali commented 8 years ago

Fixed.

edeutsch commented 8 years ago

This file seems valid to my validators. Not sure about the Java validator issue above.