HUPO-PSI / mzidentml-validator

mzidentml validator ui and command line tool
Apache License 2.0

Validation issues for xiFDR-CrossLinkExample.mzid #8

Open edeutsch opened 8 years ago

edeutsch commented 8 years ago

ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam protein-pair-level global FDR has a value, but it should not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: CV term MS:1002675 ('residue-pair-level global FDR') is not in the cv
WARNING: CV term MS:1002676 ('protein-pair-level global FDR') is not in the cv
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'

edeutsch commented 8 years ago

Validation errors found in today's version:

WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
WARNING: MS:1002675 should be 'cross-linking result details' instead of 'residue-pair-level global FDR'
WARNING: XL:00001 should be 'BS3' instead of 'Xlink:BS3'
WARNING: XL:00005 should be 'BS3:d4' instead of 'Xlink:BS3:d4'
WARNING: XL:01000 should be 'BS3!Hydrolyzed' instead of 'Xlink:BS3!Hydrolyzed'
WARNING: XL:01001 should be 'BS3!Amidated' instead of 'Xlink:BS3!Amidated'
WARNING: XL:01008 should be 'BS3:d4!Hydrolyzed' instead of 'Xlink:BS3:d4!Hydrolyzed'
WARNING: XL:01009 should be 'BS3:d4!Amidated' instead of 'Xlink:BS3:d4!Amidated'

edeutsch commented 8 years ago

After changes to the XLMOD CV, here is a revised list of CV issues with this file:

INFO: Validating file 'xiFDR-CrossLinkExample.mzid'
ERROR: cvParam anchor protein should have a value, but it does not!
ERROR: cvParam residue-pair-level global FDR has a value, but it should not!
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'
WARNING: MS:1002675 should be 'cross-linking result details' instead of 'residue-pair-level global FDR'
WARNING: XL:00001 should be 'cross-linking entity' instead of 'Xlink:BS3'
WARNING: XL:00005 should be 'homofunctional cross-linker' instead of 'Xlink:BS3:d4'
WARNING: XL:01000 should be 'hydrolyzed BS3' instead of 'Xlink:BS3!Hydrolyzed'
WARNING: XL:01001 should be 'amidated BS3' instead of 'Xlink:BS3!Amidated'
WARNING: XL:01008 should be 'hydrolyzed BS3-d4' instead of 'Xlink:BS3:d4!Hydrolyzed'
WARNING: XL:01009 should be 'amidated BS3-d4' instead of 'Xlink:BS3:d4!Amidated'

germa commented 8 years ago

Should we also allow 'loop links', i.e. a cross-link within the same peptide? Lutz has some of them in his example files, but Figure 4 in the spec doc states:

In mzIdentML, they will be represented by different ProteinDetectionHypothesis(PDH) elements within different ProteinAmbiguityGroup(PAG) elements, sharing the same ID and score.

andrewrobertjones commented 8 years ago

@lutzfischer Can you update your xiFDR-CrossLinkExample.mzid so the CV term IDs and term names are correct?

andrewrobertjones commented 8 years ago

@germa - Loop links can be represented on peptides, and I think Lutz has some examples of these. At the protein level, these could be represented as associations between different protein chains, if that is what the evidence supports. Others, correct me if I'm wrong.

edeutsch commented 8 years ago

Latest validation run still shows all the above issues.

lutzfischer commented 8 years ago

After the update of the examples and the latest update of the validator (v1.4.23), the file seems to be OK now.

Only 3 Info messages are left:

Message 1:
    Rule ID: SpectrumIdentificationList_may_rule
    Level: INFO
    Context(/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/AnalysisData/SpectrumIdentificationList/cvParam/@accession' because no values were found:
  - Any children term of MS:1001184 (search statistics). The term can be repeated. The matching value has to be the identifier of the term, not its name.

Message 2:
    Rule ID: SearchDatabase_rule
    Level: INFO
    Context(/searchDatabase/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/cvParam/@accession' because no values were found:
  - Any children term of MS:1001011 (search database details). The term can be repeated. The matching value has to be the identifier of the term, not its name.
  - Any children term of MS:1000561 (data file checksum type). The term can be repeated. The matching value has to be the identifier of the term, not its name.

Message 3:
    Rule ID: SearchDatabaseDatabaseName_rule
    Level: INFO
    Context(/searchDatabase/databaseName/cvParam/@accession ) in 2 locations
    --> None of the given CvTerms were found at '/MzIdentML/DataCollection/Inputs/searchDatabase/databaseName/cvParam/@accession' because no values were found:
  - Any children term of MS:1001013 (database name). The term can be repeated. The matching value has to be the identifier of the term, not its name.

I assume that this is acceptable for now, so I am closing the issue.

edeutsch commented 8 years ago

The following CV problems persist in this file:

ERROR: cvParam anchor protein should have a value, but it does not!
WARNING: MS:1000563 should be 'Thermo RAW format' instead of 'Thermo Raw file'
WARNING: MS:1002404 should be 'count of identified proteins' instead of 'count of identified protein'
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'Cross-linked spectrum identification item.'
WARNING: MS:1002544 should be 'xi' instead of 'xiFDR'
WARNING: MS:1002545 should be 'xi:score' instead of 'The xi result 'Score'.'

lutzfischer commented 8 years ago

Should be fixed now.

edeutsch commented 8 years ago

This one is still there:

Validating for conflicts with CV in file xiFDR-CrossLinkExample.mzid
WARNING: MS:1002511 should be 'cross-link spectrum identification item' instead of 'cross-linked spectrum identification item'
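
For anyone who wants to pre-check a file before running the validator, the name-versus-accession comparison behind these warnings can be roughly sketched as below. This is only an illustration, not the validator's actual code: the function name `check_cv_names`, the `EXPECTED_NAMES` table and the `SAMPLE` snippet are hypothetical, and the expected names are simply copied from the warnings in this thread rather than loaded from the CV files.

```python
import xml.etree.ElementTree as ET

# Expected term names, copied from the warnings in this thread (assumption:
# a real check would load these from the PSI-MS and XLMOD CV files instead).
EXPECTED_NAMES = {
    "MS:1000563": "Thermo RAW format",
    "MS:1002404": "count of identified proteins",
    "MS:1002511": "cross-link spectrum identification item",
    "MS:1002544": "xi",
    "MS:1002545": "xi:score",
}


def check_cv_names(xml_text, expected=EXPECTED_NAMES):
    """Return one warning per cvParam whose name differs from the expected name."""
    warnings = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Drop any XML namespace prefix so the check works with or without one.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag != "cvParam":
            continue
        accession = elem.get("accession")
        name = elem.get("name")
        want = expected.get(accession)
        # Exact string comparison: accessions missing from the table are skipped.
        if want is not None and name != want:
            warnings.append(
                f"WARNING: {accession} should be '{want}' instead of '{name}'"
            )
    return warnings


# Hypothetical fragment, not taken from xiFDR-CrossLinkExample.mzid.
SAMPLE = """<MzIdentML>
  <cvParam cvRef="PSI-MS" accession="MS:1002511" name="cross-linked spectrum identification item"/>
  <cvParam cvRef="PSI-MS" accession="MS:1002544" name="xi"/>
</MzIdentML>"""

for line in check_cv_names(SAMPLE):
    print(line)
```

Because the comparison is an exact string match, 'cross-linked' versus 'cross-link' is enough to trigger the warning, which is the one remaining above.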