Open edeutsch opened 8 years ago
Agreed, none of the FDR and q-value terms should have units. @germa, can you check whether other similar terms (e.g. e-values, p-values, PEP) have units? I don't think any of these should. Thanks.
I have updated the example file. @edeutsch can you please re-run the validator? Thanks. Fawaz
Removed the units from the FDR and q-value terms in version 3.90.0 of psi-ms.obo
These issues remain today:
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'
I don't see these issues in the file.
Message 1: Level: ERROR --> Non-fatal XML Parsing error detected on line 102973 Error message: cvc-pattern-valid: Value '*' is not facet-valid with respect to pattern '[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}' for type '#AnonType_postPeptideEvidenceType'.
Message 2: Level: ERROR --> Non-fatal XML Parsing error detected on line 102973 Error message: cvc-attribute.3: The value '*' of attribute 'post' on element 'PeptideEvidence' is not valid with respect to its type, 'null'.
This means that, according to the schema file mzIdentML1.2.0-candidate.xsd, '*' is not allowed in the post attribute of PeptideEvidence.
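To see why the validator rejects the star, the pattern from the error message can be tried directly. This is only a minimal sketch: the regular expression is copied from the message above, while the function name `is_valid_flanking_residue` is made up for illustration and is not part of any validator.

```python
import re

# Pattern copied verbatim from the cvc-pattern-valid message for the 'post' attribute.
POST_PATTERN = re.compile(r"[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}")


def is_valid_flanking_residue(value: str) -> bool:
    """Return True if the whole value matches the schema pattern (hypothetical helper)."""
    return POST_PATTERN.fullmatch(value) is not None


# An uppercase letter, '?' and '-' are accepted; '*' is not in the character class.
for candidate in ["K", "?", "-", "*"]:
    print(repr(candidate), is_valid_flanking_residue(candidate))
```

The trailing `-` inside the character class is a literal hyphen, which is why `-` is accepted while `*` is not.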
Regarding fghali's "I don't see these issues in the file", perhaps there is confusion about which file we are talking about. There are two similarly named "combined" files. Here are the issues I see:
peptide_level_stats_examples/combined_fdr_1.2.mzid.gz
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'
multi_search/combined_1.2.mzid.gz
ERROR: cvParam unknown modification should have a value, but it does not!
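For anyone wanting to reproduce the name-mismatch warnings locally, the check amounts to comparing each cvParam name against the name the CV holds for that accession. The sketch below is an assumption about how such a check could look, not the actual validator code; the expected names are taken from the warnings above, and `name_warnings` is a hypothetical helper.

```python
# Expected CV names, taken from the "should be" part of the warnings above.
EXPECTED_NAMES = {
    "MS:1001062": "Mascot MGF format",
    "MS:1001400": "OMSSA xml format",
    "MS:1002439": "final PSM list",
}


def name_warnings(cv_params, expected=EXPECTED_NAMES):
    """Return one warning string per (accession, name) pair whose name differs from the CV."""
    warnings = []
    for accession, name in cv_params:
        want = expected.get(accession)
        # Accessions missing from the lookup table are skipped in this sketch.
        if want is not None and name != want:
            warnings.append(
                f"WARNING: {accession} should be '{want}' instead of '{name}'"
            )
    return warnings


# (accession, name) pairs as they might appear in the cvParam elements of a file.
found = [
    ("MS:1001062", "Mascot MGF file"),
    ("MS:1001400", "OMSSA xml format"),
]
for line in name_warnings(found):
    print(line)
```

In a real check the lookup table would be built from psi-ms.obo rather than hard-coded.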
@fghali Hi Fawaz, can you please check these again? Thanks, Andy
I have updated the example file (peptide_level_stats_examples/combined_fdr_1.2.mzid.gz). @edeutsch can you please re-run the validator? Thanks. Fawaz
@fghali There are also errors in the file multi_search/combined_1.2.mzid.gz, see Gerhard's and Eric's messages above.
The main parsing error relates to these types of error:
AND
The star should be replaced with “-”, assuming it is caused by the peptide being at the N- or C-terminus of the protein (rather than by stars in the sequence, which shouldn’t happen). For now, you can just do a Find and Replace, but it would be useful if you could trace back which of the file format parsers is getting this wrong, so we can fix it.
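A scripted version of that Find and Replace could look like the sketch below. It assumes, as stated above, that every star in `pre` or `post` stands for a protein terminus; the function name `replace_star_flanks` and the sample XML (including its ids) are invented for illustration.

```python
import xml.etree.ElementTree as ET


def replace_star_flanks(root: ET.Element) -> int:
    """Set pre/post to "-" wherever they are "*" on PeptideEvidence elements.

    Returns the number of attributes changed (hypothetical helper).
    """
    changed = 0
    for elem in root.iter():
        # Compare on the local name so this works with or without an XML namespace.
        if elem.tag.rsplit("}", 1)[-1] != "PeptideEvidence":
            continue
        for attr in ("pre", "post"):
            if elem.get(attr) == "*":
                elem.set(attr, "-")
                changed += 1
    return changed


# Made-up fragment; a real mzIdentML file carries more attributes and a namespace.
sample = """<SequenceCollection>
  <PeptideEvidence id="PE1" pre="K" post="*"/>
  <PeptideEvidence id="PE2" pre="*" post="R"/>
  <PeptideEvidence id="PE3" pre="K" post="R"/>
</SequenceCollection>"""

root = ET.fromstring(sample)
print(replace_star_flanks(root))  # prints 2
```

Restricting the change to the `pre` and `post` attributes avoids touching any other star in the file, which a plain text replace would not guarantee.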
I have updated both files, replacing "*" with "-". I'll check the parsers to see where it's happening.
This is fine according to my validators.
Message 1: Rule ID: ProteinDetectionList_must_rule Level: ERROR Context(/cvParam/@accession ) in 2 locations --> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/ProteinDetectionList/cvParam/@accession' because no values were found:
Message 2: Level: WARN --> unanticipated terms for XPath '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/cvParam/@accession' : [MS:1002439]
We got rid of "final PSM list"; see GitHub issue #5.
Fixed.
This file seems valid to my validators. Not sure about the Java validator issue above.
My CV term validator finds these issues with this file:
ERROR: cvParam distinct peptide-level q-value should have units, but it does not!
WARNING: MS:1001062 should be 'Mascot MGF format' instead of 'Mascot MGF file'
WARNING: MS:1001400 should be 'OMSSA xml format' instead of 'OMSSA xml file'
WARNING: MS:1002439 should be 'final PSM list' instead of 'final PSM list UNDER DISCUSSION'
The first error may be an error in the CV. I don't think we want units on the q-value term. Should we remove units from all q-value terms? This issue affects several of them.
[Term]
id: MS:1001868
name: distinct peptide-level q-value
def: "Estimation of the q-value for distinct peptides once redundant identifications of the same peptide have been removed (id est multiple PSMs, possibly with different mass modifications, mapping to the same sequence have been collapsed to one entry)." [PSI:PI]
xref: value-type:xsd:double "The allowed value-type for this CV term."
is_a: MS:1002484 ! peptide-level statistical threshold
relationship: has_units UO:0000166 ! parts per notation unit
relationship: has_units UO:0000187 ! percent
relationship: has_domain MS:1002305 ! value between 0 and 1 inclusive
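To find which other terms are affected, one option is to scan the OBO file for terms whose name mentions a q-value or FDR and that carry a `has_units` relationship. The following is a rough sketch with a deliberately naive stanza parser; `terms_with_units` and the keyword list are assumptions for illustration, and the sample text is an abbreviated copy of the stanza above plus a second, stripped-down stanza.

```python
def terms_with_units(obo_text, keywords=("q-value", "FDR")):
    """Return (id, name, [unit accessions]) for matching terms that declare has_units."""
    hits = []
    for stanza in obo_text.split("[Term]")[1:]:
        term_id, name, units = None, "", []
        for line in stanza.splitlines():
            line = line.strip()
            if line.startswith("["):
                break  # a different stanza type (e.g. [Typedef]) starts here
            if line.startswith("id: "):
                term_id = line[len("id: "):]
            elif line.startswith("name: "):
                name = line[len("name: "):]
            elif line.startswith("relationship: has_units "):
                units.append(line.split()[2])  # third token is the unit accession
        if units and any(k.lower() in name.lower() for k in keywords):
            hits.append((term_id, name, units))
    return hits


# Abbreviated sample: the first stanza is shortened from the term quoted above.
sample_obo = """
[Term]
id: MS:1001868
name: distinct peptide-level q-value
is_a: MS:1002484 ! peptide-level statistical threshold
relationship: has_units UO:0000166 ! parts per notation unit
relationship: has_units UO:0000187 ! percent

[Term]
id: MS:1002484
name: peptide-level statistical threshold
"""

for hit in terms_with_units(sample_obo):
    print(hit)
```

Running this against the full psi-ms.obo with a longer keyword list (e-value, p-value, PEP) would give a candidate list to review by hand.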