compomics / compomics-utilities

Open source Java library for computational proteomics
http://compomics.github.io/projects/compomics-utilities.html

MGF reader #13

Closed wenbostar closed 8 years ago

wenbostar commented 8 years ago

Hi,

When I use utilities 4.6.2 to read mgf data, the number of MS/MS spectra in the file differs from the value returned by the function getNSpectra in SpectrumFactory. The code is shown below:

import com.compomics.util.experiment.massspectrometry.SpectrumFactory;

import java.io.File;
import java.io.IOException;

public class ReadMGF {
    public static void main(String[] args) throws IOException {
        String msfile = args[0];
        SpectrumFactory spectrumFactory = SpectrumFactory.getInstance();
        File mgfFile = new File(msfile);
        spectrumFactory.addSpectra(mgfFile, null);
        // Print the number of spectra the factory has indexed.
        System.out.println(spectrumFactory.getNSpectra());
        //spectrumFactory.clearFactory();

    }

}

The mgf file can be downloaded here: all_1.mgf. The number of MS/MS spectra in this mgf file is 17005. However, the value returned by getNSpectra is 17001.

Best regards! Bo

hbarsnes commented 8 years ago

Hi Bo,

Thanks for reporting this. It seems like there is a mismatch between the number of BEGIN IONS and TITLE lines in this specific mgf file. Not sure how that can happen, and haven't been able to locate where it happens yet. I will dig a bit more and get back to you.
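A quick way to check for this kind of mismatch is to count both kinds of lines directly. The sketch below is not part of compomics-utilities; the class name `MgfLineCounter` and the method `countMarkers` are hypothetical, and it assumes the file can be read line by line with standard Java I/O, which treats `\n`, `\r` and `\r\n` all as line terminators.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of compomics-utilities.
class MgfLineCounter {

    /** Returns {number of BEGIN IONS lines, number of TITLE= lines}. */
    static int[] countMarkers(List<String> lines) {
        int beginIons = 0;
        int titles = 0;
        for (String line : lines) {
            // trim() drops stray whitespace, including a leftover '\r'.
            String trimmed = line.trim();
            if (trimmed.equals("BEGIN IONS")) {
                beginIons++;
            } else if (trimmed.startsWith("TITLE=")) {
                titles++;
            }
        }
        return new int[]{beginIons, titles};
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to a character, so odd bytes cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        int[] counts = countMarkers(lines);
        System.out.println("BEGIN IONS lines: " + counts[0]);
        System.out.println("TITLE lines:      " + counts[1]);
    }
}
```

If the two numbers agree here but the parser reports fewer spectra, the difference is likely in how the parser splits or matches lines rather than in the file content itself.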

Best regards, Harald

hbarsnes commented 8 years ago

Hi again,

Found the problem. For some reason four of your spectra

TITLE=Locus:11.1.1.4745.2 File:"iTRAQ86_CoutureP_11.wiff"
TITLE=Locus:12.1.1.4259.2 File:"iTRAQ86_CoutureP_26.wiff"
TITLE=Locus:9.1.1.5416.2 File:"iTRAQ86_CoutureP_09.wiff"
TITLE=Locus:10.1.1.4356.2 File:"iTRAQ86_CoutureP_24.wiff"

ended their BEGIN IONS lines with \r, which confused our parser: it did not recognize the start of these spectra, so the spectrum count was lower than the number of title lines.
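To illustrate the general pitfall (this is a made-up example, not the actual MgfReader code): if lines are split on `\n` only and compared exactly, a line ending in `\r\n` keeps its trailing `\r` and no longer equals `BEGIN IONS`. The class and method names below are hypothetical.

```java
// Hypothetical illustration, not the parser used in compomics-utilities.
class BeginIonsMatch {

    /** Counts BEGIN IONS lines with an exact comparison after splitting on '\n' only. */
    static int countExact(String content) {
        int n = 0;
        for (String line : content.split("\n")) {
            if (line.equals("BEGIN IONS")) {
                n++;
            }
        }
        return n;
    }

    /** Same, but trims each line first so a trailing '\r' is ignored. */
    static int countTolerant(String content) {
        int n = 0;
        for (String line : content.split("\n")) {
            if (line.trim().equals("BEGIN IONS")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // The first spectrum uses \n line endings, the second uses \r\n.
        String mgf = "BEGIN IONS\nTITLE=a\nEND IONS\n"
                + "BEGIN IONS\r\nTITLE=b\r\nEND IONS\r\n";
        System.out.println(countExact(mgf));    // prints 1
        System.out.println(countTolerant(mgf)); // prints 2
    }
}
```

Trimming before comparing is one simple way to make the match robust; whether the released fix works this way is not stated in this thread.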

I've added a fix and will release a new version soon.

Best regards, Harald

hbarsnes commented 8 years ago

Hi Bo,

How did you generate the mgf file btw? Was Linux or Mac used? The only time I've seen this before is when a file was created on Linux and then moved to Windows (or the other way around) for processing. Is something like this the case for your mgf file? The strange thing here is that it only happens for some of the titles and not all of them. That part I don't understand...
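One way to check whether a file mixes line-ending styles is to count them at the byte level. This is a minimal sketch with hypothetical names (`LineEndingStats`, `count`); it reads the whole file into memory, which is fine for a quick check but may not suit very large files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of compomics-utilities.
class LineEndingStats {

    /** Returns {CRLF count, lone LF count, lone CR count}. */
    static int[] count(byte[] data) {
        int crlf = 0;
        int lf = 0;
        int cr = 0;
        int prev = -1;
        for (byte b : data) {
            if (b == '\n') {
                if (prev == '\r') {
                    crlf++;   // "\r\n" pair (Windows style)
                } else {
                    lf++;     // lone "\n" (Unix style)
                }
            } else if (prev == '\r') {
                cr++;         // "\r" not followed by "\n"
            }
            prev = b;
        }
        if (prev == '\r') {
            cr++;             // file ends with a lone "\r"
        }
        return new int[]{crlf, lf, cr};
    }

    public static void main(String[] args) throws IOException {
        int[] c = count(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("CRLF: " + c[0] + ", LF: " + c[1] + ", CR: " + c[2]);
    }
}
```

More than one non-zero count suggests the file has mixed line endings, which would be consistent with only some spectra being affected.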

Best regards, Harald

wenbostar commented 8 years ago

Hi Harald,

Thanks a lot. The file was generated in Windows and was processed in Linux. However, when I use utilities 4.5.18, it works well.

Best regards! Bo

hbarsnes commented 8 years ago

Hi Bo,

The issue should be fixed in version 4.6.3. If not, please let me know and I'll reopen the issue.

Best regards, Harald

wenbostar commented 8 years ago

Hi Harald,

Thanks a lot. It works well for the last mgf file I sent to you. But it doesn't work for this mgf file: new.mgf.tar.gz

Best regards! Bo

hbarsnes commented 8 years ago

Hi Bo,

I don't see any problems with the second mgf file. What is the problem you are seeing?

Best regards, Harald

wenbostar commented 8 years ago

Hi Harald,

The new version (4.6.3) works well for the second mgf file on Windows, but when I run it on Linux, the number of MS/MS spectra in the file differs from the value returned by the function getNSpectra in SpectrumFactory.

Best regards! Bo

hbarsnes commented 8 years ago

Hi Bo,

I don't have access to Linux at the moment, so it would be great if you could try to debug this on your end and see if you can locate the problem. I'd recommend starting with the generation of the mgf index at line 207 onwards in com.compomics.util.experiment.io.massspectrometry.MgfReader.

Thanks in advance.

Best regards, Harald

wenbostar commented 8 years ago

Hi Harald,

When I re-generated the index with version 4.6.3 in Linux, it worked well.

Best regards! Bo

hbarsnes commented 8 years ago

Hi Bo,

Not sure why you closed the issue? Does it mean that you managed to get it to work in the latest version as well?

Best regards, Harald

wenbostar commented 8 years ago

Hi Harald,

Yes, it works well in the latest version.

Best regards! Bo