Open edeutsch opened 8 years ago
Validation errors found in today's version:
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
WARNING: MS:1002675 should be 'cross-linking result details' instead of 'residue-pair-level global FDR'
WARNING: XL:00001 should be 'BS3' instead of 'Xlink:BS3'
WARNING: XL:00005 should be 'BS3:d4' instead of 'Xlink:BS3:d4'
WARNING: XL:01000 should be 'BS3!Hydrolyzed' instead of 'Xlink:BS3!Hydrolyzed'
WARNING: XL:01001 should be 'BS3!Amidated' instead of 'Xlink:BS3!Amidated'
WARNING: XL:01008 should be 'BS3:d4!Hydrolyzed' instead of 'Xlink:BS3:d4!Hydrolyzed'
WARNING: XL:01009 should be 'BS3:d4!Amidated' instead of 'Xlink:BS3:d4!Amidated'
After changes to the XLMOD CV, here is a revised list of CV issues with this file:
INFO: Validating file 'xiFDR-CrossLinkExample.mzid'
ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
WARNING: MS:1002675 should be 'cross-linking result details' instead of 'residue-pair-level global FDR'
WARNING: XL:00001 should be 'cross-linking entity' instead of 'Xlink:BS3'
WARNING: XL:00005 should be 'homofunctional cross-linker' instead of 'Xlink:BS3:d4'
WARNING: XL:01000 should be 'hydrolyzed BS3' instead of 'Xlink:BS3!Hydrolyzed'
WARNING: XL:01001 should be 'amidated BS3' instead of 'Xlink:BS3!Amidated'
WARNING: XL:01008 should be 'hydrolyzed BS3-d4' instead of 'Xlink:BS3:d4!Hydrolyzed'
WARNING: XL:01009 should be 'amidated BS3-d4' instead of 'Xlink:BS3:d4!Amidated'
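For anyone who wants to reproduce the name warnings locally, the check boils down to comparing each cvParam's `name` against the term name registered for its `accession`. Below is a minimal sketch, not the validator's actual code. The `CV_NAMES` table, the `SAMPLE` fragment, and the function `check_cv_names` are all hypothetical; the two CV entries are copied from the warnings in this thread, and a real check would load the full CV file instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of the CV: accession -> official term name.
# Both entries are taken from the validator warnings in this thread.
CV_NAMES = {
    "MS:1000563": "Thermo RAW format",
    "MS:1002404": "count of identified proteins",
}

# Made-up mzIdentML-like fragment (namespace omitted for brevity).
SAMPLE = """
<MzIdentML>
  <cvParam cvRef="PSI-MS" accession="MS:1000563" name="Thermo Raw file"/>
  <cvParam cvRef="PSI-MS" accession="MS:1002404" name="count of identified proteins"/>
</MzIdentML>
"""


def check_cv_names(xml_text, cv_names):
    """Return one warning per cvParam whose name differs from the CV."""
    warnings = []
    root = ET.fromstring(xml_text)
    for param in root.iter("cvParam"):
        acc = param.get("accession")
        name = param.get("name")
        expected = cv_names.get(acc)
        if expected is None:
            # Accession unknown to our (partial) CV table.
            warnings.append(f"WARNING: CV term {acc} ('{name}') is not in the cv")
        elif name != expected:
            warnings.append(
                f"WARNING: {acc} should be '{expected}' instead of '{name}'"
            )
    return warnings


for line in check_cv_names(SAMPLE, CV_NAMES):
    print(line)
```

The comparison is deliberately exact (case-sensitive, no trimming), which is why something like 'Thermo Raw file' is flagged even though it is close to the registered name.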
Should we also allow 'loop links', i.e. cross-links within the same peptide? Lutz has some of them in his example files, but Figure 4 in the spec doc states:
In mzIdentML, they will be represented by different ProteinDetectionHypothesis(PDH) elements within different ProteinAmbiguityGroup(PAG) elements, sharing the same ID and score.
@lutzfischer Can you update your xiFDR-CrossLinkExample.mzid so that the CV term IDs and term names are correct?
@germa - Loop links can be represented on peptides, and I think Lutz has some examples of these. At the protein level, these could be represented as associations between different protein chains, if that is what the evidence supports. Others, correct me if I'm wrong.
Latest validation run still shows all the above issues.
After updating the examples and the latest update of the validator (v1.4.23), the file seems to be OK now.
Only 3 Info messages are left:
Message 1:
Rule ID: SpectrumIdentificationList_may_rule
Level: INFO
Context(/cvParam/@accession ) in 2 locations
--> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/cvParam/@accession' because no values were found:
- Any children term of MS:1001184 (search statistics). The term can be repeated. The matching value has to be the identifier of the term, not its name.
Message 2:
Rule ID: SearchDatabase_rule
Level: INFO
Context(/searchDatabase/cvParam/@accession ) in 2 locations
--> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/cvParam/@accession' because no values were found:
- Any children term of MS:1001011 (search database details). The term can be repeated. The matching value has to be the identifier of the term, not its name.
- Any children term of MS:1000561 (data file checksum type). The term can be repeated. The matching value has to be the identifier of the term, not its name.
Message 3:
Rule ID: SearchDatabaseDatabaseName_rule
Level: INFO
Context(/searchDatabase/databaseName/cvParam/@accession ) in 2 locations
--> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/databaseName/cvParam/@accession' because no values were found:
- Any children term of MS:1001013 (database name). The term can be repeated. The matching value has to be the identifier of the term, not its name.
I assume that this is acceptable for now, so I am closing the issue.
The following CV problems persist in this file:
ERROR: cvParam anchor protein should have a value, but it does not!
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
should be fixed now
This one is still there..
Validating for conflicts with CV in file xiFDR-CrossLinkExample.mzid
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'cross-linked spectrum identification item'
ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam protein-pair-level global FDR has a value, but it should not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: CV term MS:1002675 ('residue-pair-level global FDR') is not in the cv
WARNING: CV term MS:1002676 ('protein-pair-level global FDR') is not in the cv
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
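The two kinds of ERROR lines above come from checking whether a cvParam carries a `value` attribute against what the CV says about that term. A rough sketch of that logic follows, assuming a hypothetical lookup table `EXPECTS_VALUE` keyed by term name. The table entries simply mirror the errors reported in this thread; the `SAMPLE` fragment (including the 0.05 value) and the function `check_values` are made up for illustration and are not the validator's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: term name -> whether the CV says the term carries a value.
# Entries mirror the ERROR lines in this thread; a real check would derive
# this from the CV file rather than hard-code it.
EXPECTS_VALUE = {
    "anchor protein": True,
    "residue-pair-level global FDR": False,
}

# Made-up fragment reproducing both error cases.
SAMPLE = """
<root>
  <cvParam name="anchor protein"/>
  <cvParam name="residue-pair-level global FDR" value="0.05"/>
</root>
"""


def check_values(xml_text, expects_value):
    """Report cvParams whose value presence contradicts the table."""
    errors = []
    for param in ET.fromstring(xml_text).iter("cvParam"):
        name = param.get("name")
        if name not in expects_value:
            continue  # term not covered by our partial table
        has_value = bool(param.get("value"))  # missing or empty counts as no value
        if expects_value[name] and not has_value:
            errors.append(f"ERROR: cvParam {name} should have a value, but it does not!")
        elif not expects_value[name] and has_value:
            errors.append(f"ERROR: cvParam {name} has a value, but it should not!")
    return errors


for line in check_values(SAMPLE, EXPECTS_VALUE):
    print(line)
```

Note that the outcome depends on the CV version in use: if a term's definition changes to allow a value, the same file stops triggering the second kind of error without any change to the file itself.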