PRIDE-Archive / pride-curation-scripts

Useful PRIDE Pipelines curation scripts

An issue with the very bad file name - needs to be tested locally #13

Closed deeptijk closed 5 years ago

deeptijk commented 5 years ago

Ticket number - 1-20190116-152756-RESUB2

LSF report says -

File name                                  #Proteins  #Peptides  #Spectra  #Unique PTMs  #Delta m/z %  #Identified spectra  #Missing identified spectra  Missing identified spectrum ID  Match fragment ions
PR461_PL_IEF_ PR461_PL_IEF10_SPS_TMT.mzid  443        598        493       4             0%            487                  0                                                            true
PR461_PL_IEF_ PR461_PL_IEF11_SPS_TMT.mzid  0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF12_SPS_TMT.mzid  0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF1_SPS_TMT.mzid   0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF2_SPS_TMT.mzid   0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF3_SPS_TMT.mzid   0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF4_SPS_TMT.mzid   0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF5_SPS_TMT.mzid   327        398        347       4             0%            347                  0                                                            true
PR461_PL_IEF_ PR461_PL_IEF6_SPS_TMT.mzid   319        434        348       3             0%            345                  0                                                            true
PR461_PL_IEF_ PR461_PL_IEF7_SPS_TMT.mzid   0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF8_SPS_TMT.mzid   0          0          0         0             0%            0                    0                                                            true
PR461_PL_IEF_ PR461_PL_IEF9_SPS_TMT.mzid   0          0          0         0             0%            0                    0

So some files have protein and peptide information and others don't.
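For triage, this split can be scripted. The following is a minimal sketch, assuming the report rows are whitespace-separated in the column order shown in the report header (file name, #Proteins, #Peptides, #Spectra, ...) and that every file name ends in .mzid. The function name split_by_identifications is hypothetical and not part of the PRIDE pipelines.

```python
import re

# Assumed row layout: "<file name ending in .mzid> <#Proteins> <#Peptides> <#Spectra> ..."
# The file name itself may contain spaces, so it is matched lazily up to ".mzid".
ROW = re.compile(
    r"^(?P<name>.+?\.mzid)\s+(?P<proteins>\d+)\s+(?P<peptides>\d+)\s+(?P<spectra>\d+)"
)


def split_by_identifications(report_lines):
    """Return (files_with_ids, files_without_ids) based on protein and peptide counts."""
    with_ids, without_ids = [], []
    for line in report_lines:
        match = ROW.match(line.strip())
        if match is None:
            continue  # header line or a row that does not fit the assumed layout
        empty = int(match["proteins"]) == 0 and int(match["peptides"]) == 0
        (without_ids if empty else with_ids).append(match["name"])
    return with_ids, without_ids
```

Files that land in the second list are the ones worth checking individually for parsing errors.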

sureshhewabi commented 5 years ago

The issue here has nothing to do with the file name. Rather, some of the mzIdentML files contain invalid XML tags. For example, look at the validation report of an individual mzIdentML file, e.g. /nfs/pride/prod/archive/1-20190116-152756-RESUB2/submitted/PR461_PL_IEF_ PR461_PL_IEF2_SPS_TMT.mzid:

java.lang.IllegalStateException: Invalid ID in xml: <DBSequence length="0" searchDatabase_ref="unreferenced database" accession="sp|P24815|3BHS1_MOUSE3 beta-hydroxysteroid dehydrogenase/Delta 5-->

pst_prd@ebi-cli-003:~$ grep "accession=\"sp|P24815|3BHS1_MOUSE3" /nfs/pride/prod/archive/1-20190116-152756-RESUB2/submitted/PR461_PL_IEF_\ PR461_PL_IEF2_SPS_TMT.mzid
        <DBSequence length="0" searchDatabase_ref="unreferenced database" accession="sp|P24815|3BHS1_MOUSE3 beta-hydroxysteroid dehydrogenase/Delta 5-->4-isomerase type 1 OS=Mus musculus OX=10090 GN=Hsd3b1 PE=1 SV=3-DECOY" id="DBSeq_sp|P24815|3BHS1_MOUSE3 beta-hydroxysteroid dehydrogenase/Delta 5-->4-isomerase type 1 OS=Mus musculus OX=10090 GN=Hsd3b1 PE=1 SV=3-DECOY">

The ">" in this part of the protein description is read as the end of the opened <DBSequence tag, so the accession and id are cut off there: Delta 5-->4-isomerase
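To find every affected line rather than grepping for one accession, a scan along the following lines could be used. This is only a sketch: the function names are hypothetical, it only handles double-quoted attribute values on a single line, and whether the PRIDE validator accepts the escaped form (&gt;) is an assumption that would need to be tested locally. Rewriting ids also changes the submitted file, so it would need the submitter's agreement.

```python
import re

# Matches name="value" where the double-quoted value contains a raw '>' character.
ATTR_WITH_GT = re.compile(r'(\w+)="([^"]*>[^"]*)"')


def find_raw_gt_in_attributes(lines):
    """Yield (line_number, attribute_name, value) for each attribute value containing '>'."""
    for number, line in enumerate(lines, start=1):
        for match in ATTR_WITH_GT.finditer(line):
            yield number, match.group(1), match.group(2)


def escape_gt_in_attributes(line):
    """Replace raw '>' with '&gt;' inside double-quoted attribute values only."""
    return ATTR_WITH_GT.sub(
        lambda m: '{}="{}"'.format(m.group(1), m.group(2).replace(">", "&gt;")),
        line,
    )
```

Passing an open file handle to find_raw_gt_in_attributes lists the offending attributes; the ">" that legitimately closes each tag is left alone because it sits outside the quotes.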

deeptijk commented 5 years ago

Converting it into a Partial Submission, as requested by the submitter.