PRIDE-Archive / pride-curation-scripts

Useful PRIDE Pipelines curation scripts

Proteome Discoverer Issues #21

Open deeptijk opened 5 years ago

deeptijk commented 5 years ago

Issue 1.

FILE: /nfs/pride/drop/pride-drop-006/jaiswal/ProteomeDiscovere/issue1.mzid

VALIDATION MESSAGE: cvc-identity-constraint.4.1: Duplicate unique value [SII_27_1] declared for identity constraint "PK_DATAADSILSIRSII" of element "MzIdentML".
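For curation, it can help to list every duplicated SpectrumIdentificationItem id in a file before re-running validation. The following is a minimal Python sketch under stated assumptions, not part of the PRIDE tooling: `duplicate_sii_ids` is a hypothetical helper, it ignores the XML namespace (so it should handle mzIdentML 1.1 and 1.2 files alike), and it only counts SII `id` attributes. Reading "PK_DATAADSILSIRSII" as an abbreviation of the DataCollection/AnalysisData/SpectrumIdentificationList/SpectrumIdentificationResult/SpectrumIdentificationItem path is an inference from the name, not something taken from the schema.

```python
import io
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed, schema-invalid fragment for illustration only (SII elements lack
# required attributes such as rank and chargeState); IDs echo Issue 4 below.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="example" version="1.1.0">
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1" spectrumID="scan=19893" spectraData_ref="SD_1">
          <SpectrumIdentificationItem id="SII__2162_19893_1" peptide_ref="-13_1608347_256970"/>
          <SpectrumIdentificationItem id="SII__2162_19893_2" peptide_ref="-13_1608348_256970"/>
          <SpectrumIdentificationItem id="SII__2162_19893_1" peptide_ref="-13_4132086_256970"/>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def duplicate_sii_ids(source) -> dict[str, int]:
    """Return {id: count} for SpectrumIdentificationItem ids seen more than once.

    `source` may be a path or a binary file-like object. iterparse streams the
    document instead of building the whole tree up front.
    """
    counts: Counter = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == "SpectrumIdentificationItem":
            counts[elem.get("id")] += 1  # read the id before clearing
            elem.clear()  # drop this item's children and attributes
    return {sii_id: n for sii_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    paths = sys.argv[1:]
    if not paths:
        print(duplicate_sii_ids(io.BytesIO(SAMPLE)))
    for path in paths:
        dups = duplicate_sii_ids(path)
        print(f"{path}: {len(dups)} duplicated SII id(s)")
        for sii_id, n in sorted(dups.items()):
            print(f"  {sii_id} x{n}")
```

Run without arguments, it checks the embedded sample and reports `SII__2162_19893_1` twice; pass one or more `.mzid` paths to check real submissions.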

Ticket - 1-20190402-153477

Description - A single SpectrumIdentificationItem ID is used for 2 peptide_refs

Proteome Discoverer version 2.3

Issue 2.

FILE: /nfs/pride/drop/pride-drop-006/jaiswal/ProteomeDiscovere/issue2.mzid

VALIDATION MESSAGE: cvc-identity-constraint.4.1: Duplicate unique value [SII__698_8906_1] declared for identity constraint "PK_DATAADSILSIRSII" of element "MzIdentML".

Ticket - 1-20190307-51546-RESUB1

Description - A single SpectrumIdentificationItem ID is used for 2 peptide_refs

Proteome Discoverer 2.2

ypriverol commented 5 years ago

@deeptijk Please put the files related to this issue somewhere we can share them with the Proteome Discoverer developers. Please let me know when all the issues have been written up, so that we can contact the Proteome Discoverer developers.

davco commented 5 years ago

Issue 3.

FILE: /nfs/pride/drop/pride-drop-006/jaiswal/ProteomeDiscovere/issue3.mzid

VALIDATION MESSAGE: Error message: cvc-identity-constraint.4.1: Duplicate unique value [SII__612_45874_1] declared for identity constraint "PK_DATAADSILSIR" of element "MzIdentML".

Ticket - 1-20190410-66337

Description - A single SpectrumIdentificationItem ID is used for 2 peptide_refs

Proteome Discoverer 2.2

davco6 commented 5 years ago

Issue 4 and user feedback about the error

FILE: /nfs/pride/drop/pride-drop-006/jaiswal/ProteomeDiscovere/issue4.mzid

VALIDATION MESSAGE: (1) Error message: cvc-identity-constraint.4.1: Duplicate unique value [SII__2168_10454_1] declared for identity constraint "PK_DATAADSILSIRSII" of element "MzIdentML".

Ticket - 1-20190505-137711

Description - A single SpectrumIdentificationItem ID is used for 2 peptide_refs

Proteome Discoverer 2.2 and 2.3

User feedback: In the analysis workflow within Proteome Discoverer, we used two search algorithms to identify peptides from the PSMs: Sequest HT and MS Amanda 2.0. Proteome Discoverer seems to handle these internally, but the error appears when it exports the results to the mzID file.

Here is an example from a single spectrum. It has 5 SpectrumIdentificationItem entries with 5 unique peptide references but only 3 unique IDs. From within Proteome Discoverer, I can tell that Amanda identified 3 possible peptides for this spectrum and Sequest identified 2.

Amanda:

id: SII__2162_19893_1
peptide_ref: -13_1608347_256970
experimentalMassToCharge: 464.332214355469
calculatedMassToCharge: 464.331265175
chargeState: 2

id: SII__2162_19893_2
peptide_ref: -13_1608348_256970
experimentalMassToCharge: 464.332214355469
calculatedMassToCharge: 464.331265175
chargeState: 2

id: SII__2162_19893_3
peptide_ref: -13_1608349_256970
experimentalMassToCharge: 464.332214355469
calculatedMassToCharge: 464.331265175
chargeState: 2

Sequest:

id: SII__2162_19893_1
peptide_ref: -13_4132086_256970
experimentalMassToCharge: 464.332214355469
calculatedMassToCharge: 464.331265175
chargeState: 2

id: SII__2162_19893_2
peptide_ref: -13_4132087_256970
experimentalMassToCharge: 464.332214355469
calculatedMassToCharge: 464.331265175
chargeState: 2

It seems to create IDs that are unique within the results of a single algorithm by appending a counter to a base ID. From what I can tell, the base ID consists of the raw file number (2162) plus the scan number (19893). However, it does not check whether an ID is already in use when it adds the results of a subsequent algorithm, so the Sequest items reuse SII__2162_19893_1 and SII__2162_19893_2, which were already assigned to Amanda hits.
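The ID scheme described above can be modelled in a few lines to check the counts in the example. This is a hypothetical Python sketch, not Proteome Discoverer's code: the `SII__<raw file>_<scan>_<counter>` format is inferred only from the IDs shown in this comment, and `assign_sii_ids` and `summarize` are made-up names. With `shared_counter=False` the counter restarts for each engine, which reproduces the reported collision; `shared_counter=True` shows one possible fix, where the counter keeps running across engines for the same spectrum.

```python
from collections import Counter

# peptide_ref values copied from the example above, grouped by search engine.
RESULTS = [
    ("MS Amanda 2.0", ["-13_1608347_256970", "-13_1608348_256970", "-13_1608349_256970"]),
    ("Sequest HT", ["-13_4132086_256970", "-13_4132087_256970"]),
]


def assign_sii_ids(raw_file, scan, results, shared_counter):
    """Build (id, peptide_ref) pairs using the inferred SII__<file>_<scan>_<n> format.

    shared_counter=False restarts the counter per engine (the observed behaviour);
    shared_counter=True continues it across engines (a hypothetical fix).
    """
    items = []
    k = 0
    for _engine, peptide_refs in results:
        if not shared_counter:
            k = 0  # restarting here is what lets a later engine reuse ids
        for ref in peptide_refs:
            k += 1
            items.append((f"SII__{raw_file}_{scan}_{k}", ref))
    return items


def summarize(items):
    """Count items, distinct peptide_refs, distinct ids, and list repeated ids."""
    ids = Counter(sii_id for sii_id, _ in items)
    return {
        "items": len(items),
        "unique_peptide_refs": len({ref for _, ref in items}),
        "unique_ids": len(ids),
        "duplicated_ids": sorted(sii_id for sii_id, n in ids.items() if n > 1),
    }


if __name__ == "__main__":
    print(summarize(assign_sii_ids(2162, 19893, RESULTS, shared_counter=False)))
    print(summarize(assign_sii_ids(2162, 19893, RESULTS, shared_counter=True)))
```

The first call reports 5 items, 5 unique peptide_refs and 3 unique IDs, matching the example; the second gives 5 unique IDs. Prefixing each ID with the search engine would avoid the clash equally well; either way, uniqueness has to be tracked across all engines for a spectrum, not per engine.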

I don't know enough about this format to determine whether this is a bug in Proteome Discoverer's data converter or a limitation of the mzID specification.