Nesvilab / FragPipe

A cross-platform proteomics data analysis suite
http://fragpipe.nesvilab.org

java.lang.NumberFormatException: Too many digits - Overflow #232

Closed Usman095 closed 4 years ago

Usman095 commented 4 years ago

Describe the bug

When MSFragger parses an mzML file that msconvert generated from an MGF file, it throws a NumberFormatException. The error log is given below:

Selected fragment tolerance 0.02 Da.
883233774 fragments to be searched in 1 slices (13.16 GB total)
Operating on slice 1 of 1:
        Fragment index slice generated in 42.14 s
        001. combined-small2.mzML Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.simontuffs.onejar.Boot.run(Boot.java:340)
        at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: umich.ms.fileio.exceptions.FileParsingException: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: Too many digits - Overflow
        at umich.ms.fileio.filetypes.xmlbased.AbstractXMLBasedDataSource.parse(AbstractXMLBasedDataSource.java:198)
        at umich.ms.datatypes.scancollection.impl.ScanCollectionDefault.loadData(ScanCollectionDefault.java:807)
        at umich.ms.datatypes.scancollection.impl.ScanCollectionDefault.loadData(ScanCollectionDefault.java:791)
        at edu.umich.andykong.msfragger.b.a(Unknown Source)
        at edu.umich.andykong.msfragger.b.a(Unknown Source)
        at edu.umich.andykong.msfragger.D.a(Unknown Source)
        at edu.umich.andykong.msfragger.A.<init>(Unknown Source)
        at edu.umich.andykong.msfragger.MSFragger.b(Unknown Source)
        at edu.umich.andykong.msfragger.MSFragger.main(Unknown Source)
        ... 6 more
Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: Too many digits - Overflow
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:206)
        at umich.ms.fileio.filetypes.xmlbased.AbstractXMLBasedDataSource.parse(AbstractXMLBasedDataSource.java:191)
        ... 14 more
Caused by: java.lang.NumberFormatException: Too many digits - Overflow
        at javolution.text.TypeFormat.parseDouble(TypeFormat.java:518)
        at javolution.text.TypeFormat.parseDouble(TypeFormat.java:579)
        at javolution.text.CharArray.toDouble(CharArray.java:481)
        at umich.ms.fileio.filetypes.mzml.MZMLMultiSpectraParser.tagPrecursorStart(MZMLMultiSpectraParser.java:286)
        at umich.ms.fileio.filetypes.mzml.MZMLMultiSpectraParser.call(MZMLMultiSpectraParser.java:168)
        at umich.ms.fileio.filetypes.mzml.MZMLMultiSpectraParser.call(MZMLMultiSpectraParser.java:64)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

To Reproduce

Convert an MGF file to mzML using msconvert. The MGF file has the following fields:

BEGIN IONS
TITLE=20111219_EXQ5_KiSh_SA_LabelFree_HeLa_Proteome_Control_rep1_pH4.5.5.3
RTINSECONDS=1.3906311
PEPMASS=497.554892607892 695441.72387700004
CHARGE=3+
...

Then run the Philosopher pipeline using the Non-ion mobility data script provided here for closed search.

Expected behavior

The program should finish without any errors.


Additional context

For the search, I used the human database, which can be downloaded using FragPipe.

fcyu commented 4 years ago

Hi @Usman095 ,

Neither of those two numbers (497.554892607892 or 695441.72387700004) will cause overflow. Can you paste the corresponding entry from the converted mzML?

Thanks,

Fengchao

Usman095 commented 4 years ago

The mzML file I have is more than 10 GB. Is there a way to find out which entry in the mzML file caused the error?
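A minimal sketch of one way to narrow down the entry, assuming Python 3 and the standard mzML nesting (spectrum, then precursor, then cvParam). The stack trace shows the failure inside `MZMLMultiSpectraParser.tagPrecursorStart`, so the script streams the file with `xml.etree.ElementTree.iterparse` and reports every precursor `cvParam` value that either does not parse as a number or has more digits than a cutoff. The function names and the 15-digit cutoff are hypothetical choices for illustration, not the rule the MSFragger parser actually applies; lower the cutoff if nothing is reported.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical cutoff for "too many digits"; not the parser's actual rule.
MAX_DIGITS = 15


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def digit_count(value: str) -> int:
    """Count mantissa digits, ignoring sign, decimal point and any exponent."""
    mantissa = value.lower().split("e")[0]
    return sum(ch.isdigit() for ch in mantissa)


def is_suspicious(value: str, max_digits: int) -> bool:
    """True if the value is not a number or has more digits than max_digits."""
    try:
        float(value)
    except ValueError:
        return True
    return digit_count(value) > max_digits


def find_suspicious(path: str, max_digits: int = MAX_DIGITS):
    """Yield (spectrum id, cvParam name, value) for suspicious precursor cvParams."""
    spectrum_id = None
    in_precursor = False
    for event, elem in ET.iterparse(path, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            # Attributes are already available on "start" events.
            if name == "spectrum":
                spectrum_id = elem.get("id")
            elif name == "precursor":
                in_precursor = True
            elif name == "cvParam" and in_precursor:
                value = elem.get("value", "")
                if value and is_suspicious(value, max_digits):
                    yield spectrum_id, elem.get("name"), value
        elif name == "precursor":
            in_precursor = False
        elif name == "spectrum":
            elem.clear()  # drop the parsed spectrum's contents to keep memory low
            spectrum_id = None


if __name__ == "__main__" and len(sys.argv) > 1:
    for hit in find_suspicious(sys.argv[1]):
        print(*hit, sep="\t")
```

Pass the mzML path as the only argument; each hit is printed as the spectrum id, the cvParam name, and the raw value, separated by tabs. Streaming with `iterparse` avoids loading a 10 GB file into memory at once.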

fcyu commented 4 years ago

Sorry, I thought you had already identified the location in the MGF file, since you sent us this:

To Reproduce
Convert mgf file to mzML using msconvert. The mgf file has the following fields:
BEGIN IONS
TITLE=20111219_EXQ5_KiSh_SA_LabelFree_HeLa_Proteome_Control_rep1_pH4.5.5.3
RTINSECONDS=1.3906311
PEPMASS=497.554892607892 695441.72387700004
CHARGE=3+
...

If you cannot identify the location, there is no way for us to reproduce the error.

Best,

Fengchao

Usman095 commented 4 years ago

Here's the mzML file that I used: https://fiudit-my.sharepoint.com/:u:/g/personal/mutariq_fiu_edu/EQLlewA2O0NItbB71Rd-5SABmZgWpEkAuhaoEz8-C7-J7w?e=oM0bcN

fcyu commented 4 years ago

Thank you very much. We will take a look soon.

Best,

Fengchao

fcyu commented 4 years ago

Hi @Usman095 ,

I confirmed that this is a bug in data loading. It will be fixed in the next MSFragger release.

Thanks,

Fengchao

Usman095 commented 4 years ago

Awesome! Thank you!

Usman095 commented 4 years ago

Is there a quick fix that I can apply to get around the bug?

fcyu commented 4 years ago

You can round the numbers in your MGF file to, for example, 6 decimal places.
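A minimal sketch of that workaround, assuming a plain-text MGF in which peak lines start with a digit. It rounds only the `PEPMASS` and `RTINSECONDS` values and the peak lines; `TITLE` and `CHARGE` are copied unchanged, since a title such as `..._pH4.5.5.3` only looks numeric. Which header keys to round, and the script and function names, are assumptions for illustration rather than anything prescribed in this thread.

```python
import sys

DECIMALS = 6  # the rounding suggested in the thread
ROUNDED_KEYS = ("PEPMASS", "RTINSECONDS")  # assumption: only these headers are rounded


def round_token(token: str, decimals: int = DECIMALS) -> str:
    """Round a numeric token to `decimals` places; leave other tokens (e.g. '3+') as-is."""
    try:
        value = float(token)
    except ValueError:
        return token
    return f"{value:.{decimals}f}"


def round_line(line: str, decimals: int = DECIMALS) -> str:
    """Round PEPMASS/RTINSECONDS values and peak lines; copy everything else unchanged."""
    stripped = line.rstrip("\r\n")
    ending = line[len(stripped):]
    if "=" in stripped:
        key, _, value = stripped.partition("=")
        if key.strip().upper() in ROUNDED_KEYS:
            rounded = " ".join(round_token(t, decimals) for t in value.split())
            return f"{key}={rounded}{ending}"
        return line  # TITLE, CHARGE and other headers
    if stripped[:1].isdigit():
        # Peak line: "m/z intensity [charge]"
        return " ".join(round_token(t, decimals) for t in stripped.split()) + ending
    return line  # BEGIN IONS, END IONS, blank lines


def round_mgf(src: str, dst: str, decimals: int = DECIMALS) -> None:
    """Stream src to dst line by line, so large files never sit in memory."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(round_line(line, decimals))


if __name__ == "__main__" and len(sys.argv) == 3:
    round_mgf(sys.argv[1], sys.argv[2])
```

Run it with the input and output MGF paths, then convert the rounded file with msconvert again. Peak lines are rewritten with single spaces between columns, and integer values gain a `.000000` suffix; both should still be valid MGF.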