Fail to parse various mzid files

JB-MS commented 7 years ago

Dear pymzml_integrator team,

I have just found your repo and I am very excited about the mzid parser, since I am looking for a lightweight Python mzid parser.

I have a very simple script that uses the Mzid class, but unfortunately it crashes on my three test files (an MSGF+ result file, a MyriMatch result file and an example file from the PRIDE toolsuite), each with a different error.

For example:

```
Reading mzID file as document object model... Done.
Processing time: 1.6 seconds.
Processing 0 of 191 records (PSM).
Traceback (most recent call last):
  File "./simple_mzid_parser.py", line 25, in <module>
    main()
  File "./simple_mzid_parser.py", line 15, in main
    reader = Mzid(sys.argv[1])
  File "/Users/joe/Dev/pymzml_integrator/parse_mzid.py", line 46, in __init__
    self.psm_df = self.read_psm()
  File "/Users/joe/Dev/pymzml_integrator/parse_mzid.py", line 137, in read_psm
    value = cvParam.attributes['value'].value
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/xml/dom/minidom.py", line 555, in __getitem__
    return self._attrs[attname_or_tuple]
KeyError: 'value'
```

or

```
Traceback (most recent call last):
  File "./simple_mzid_parser.py", line 25, in <module>
    main()
  File "./simple_mzid_parser.py", line 15, in main
    reader = Mzid(sys.argv[1])
  File "/Users/joe/Dev/pymzml_integrator/parse_mzid.py", line 47, in __init__
    self.peptide_df = self.read_peptide()
  File "/Users/joe/Dev/pymzml_integrator/parse_mzid.py", line 246, in read_peptide
    newcol[i] = re.sub(' |:', '', peptide_df[i][0][1])
TypeError: 'NoneType' object is not subscriptable
```

Any help is highly appreciated.

Cheers, Johannes

PS: Thanks for using pymzML :)
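The first traceback above is raised because `cvParam.attributes['value']` assumes every `cvParam` has a `value` attribute. In mzIdentML that attribute is optional (many cvParams carry only an accession and a name), and minidom's attribute map raises `KeyError` when it is absent. A minimal sketch of a defensive read with `xml.dom.minidom`, the module shown in the traceback; the helper `read_cv_values` is hypothetical and not part of `parse_mzid.py`:

```python
from xml.dom import minidom


def read_cv_values(xml_text):
    """Hypothetical helper: collect (name, value) for every cvParam.

    A cvParam without a `value` attribute yields None instead of
    raising KeyError.
    """
    doc = minidom.parseString(xml_text)
    rows = []
    for cv in doc.getElementsByTagName("cvParam"):
        # attributes['value'] raises KeyError when the attribute is missing,
        # so test for it explicitly before reading it
        value = cv.getAttribute("value") if cv.hasAttribute("value") else None
        rows.append((cv.getAttribute("name"), value))
    return rows
```

`getAttribute()` on its own would return an empty string for a missing attribute; checking `hasAttribute()` first keeps "absent" distinguishable from "present but empty".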

ed-lau commented 7 years ago

Hi Johannes,

Thanks for your email. The mzid parser works for my own purposes (Thermo raw files, searched against UniProt with Crux). I would be interested in picking it up again and developing it into something that might be useful to others. I can look into the error messages, and it would be very helpful if you could send me any mzid files you have. As far as I can tell, there are a lot of many-to-many relationships among terms (from different search engines and experimental designs) that I should be aware of.
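Which relationships are meant is not spelled out here. One well-known many-to-many case in mzIdentML is between peptides and protein sequences: each `PeptideEvidence` element links one peptide (via `peptide_ref`) to one `DBSequence` (via `dBSequence_ref`), so a shared peptide maps to several proteins and a protein maps to many peptides. The sketch below is an illustration under that assumption, not how `parse_mzid.py` works; `peptide_protein_maps` is a hypothetical name, and it uses `xml.etree.ElementTree` rather than minidom:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    # ElementTree stores namespaced tags as "{uri}Name"; keep only "Name"
    return tag.rsplit("}", 1)[-1]


def peptide_protein_maps(xml_text):
    """Hypothetical helper: map peptides to proteins and back via PeptideEvidence."""
    root = ET.fromstring(xml_text)

    # DBSequence id -> protein accession
    accession = {
        el.get("id"): el.get("accession")
        for el in root.iter()
        if local_name(el.tag) == "DBSequence"
    }

    pep_to_prot = defaultdict(set)
    prot_to_pep = defaultdict(set)
    for el in root.iter():
        if local_name(el.tag) != "PeptideEvidence":
            continue
        peptide = el.get("peptide_ref")
        protein = accession.get(el.get("dBSequence_ref"))
        if peptide is None or protein is None:
            continue  # skip incomplete evidence instead of failing
        pep_to_prot[peptide].add(protein)
        prot_to_pep[protein].add(peptide)

    return dict(pep_to_prot), dict(prot_to_pep)
```

Keeping both directions as sets avoids the duplicated rows a flat peptide-by-protein table would need.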

Best wishes, Edward

— Edward Lau, PhD Postdoctoral Fellow Stanford Cardiovascular Institute lau1@stanford.edu

(Sent by email in reply to Johannes Leufken's post of Mar 27, 2017, 7:49 AM.)

ed-lau commented 7 years ago

The new commit should catch most of these exceptions:

- The MSGF+ test file appears to have no ProteinDetectionList element of the kind described in the mzid format documentation.
- The Mascot test file had some nested cvParam nodes that were not handled. They are now ignored.

To run the parser directly:

```
python ./parse_mzid.py external_mzid_test/mzidentml-example.mzid --out=mascot.txt
```
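Both fixes follow a common pattern: treat optional sections as optional, and read only the `cvParam` elements that belong directly to the element being processed. To my understanding, `ProteinDetectionList` is optional in the mzIdentML schema (protein inference results need not be reported), which would explain its absence from the MSGF+ file. A minimal sketch of both ideas with `xml.etree.ElementTree`; it is not the code from the commit, and `protein_hypotheses`, `direct_cv_params` and `find_first` are hypothetical names:

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop the "{namespace}" prefix ElementTree adds to qualified tags
    return tag.rsplit("}", 1)[-1]


def find_first(root, name):
    # Depth-first search for the first element with the given local name
    for el in root.iter():
        if local_name(el.tag) == name:
            return el
    return None


def direct_cv_params(element):
    # Iterating an Element yields only its direct children, so cvParams
    # nested deeper (inside child elements) are ignored
    return [
        (child.get("name"), child.get("value"))  # get() returns None if absent
        for child in element
        if local_name(child.tag) == "cvParam"
    ]


def protein_hypotheses(xml_text):
    """Hypothetical helper: (id, cvParams) for each ProteinDetectionHypothesis."""
    root = ET.fromstring(xml_text)
    pdl = find_first(root, "ProteinDetectionList")
    if pdl is None:
        # Optional section missing, e.g. a file without protein inference
        return []
    return [
        (hyp.get("id"), direct_cv_params(hyp))
        for hyp in pdl.iter()
        if local_name(hyp.tag) == "ProteinDetectionHypothesis"
    ]
```

The explicit `is None` check matters: an ElementTree `Element` with no children is falsy, so `if not pdl` would also treat an empty but present list as missing.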

JB-MS commented 7 years ago

Hi Ed,

Thanks a lot for the fixes! It now works fine with my files.

Cheers, Johannes
