compomics / peptide-shaker

Interpretation of proteomics identification results
http://compomics.github.io/projects/peptide-shaker.html

Importing Proteome Discoverer 2.4 SP1 mzid files into PeptideShaker #396

Closed PRprog closed 4 years ago

PRprog commented 4 years ago

Hi,

my issue is that I cannot import mzid data (3 files) with mzML spectra (also 3 files), both exported from Proteome Discoverer 2.4 SP1, into PeptideShaker. The import ends with the error message "Importing Data Canceled!" (report: 200409_report.txt, log file: 200409_PSlog.txt). I do not know whether the problem lies in the mzid format exported by Proteome Discoverer or somewhere else.

The background is that I am trying to submit the data to the PRIDE archive. I need an mzIdentML file, and I am looking for another way to obtain one because the mzid file exported by Proteome Discoverer produces error messages when I try to import it into PRIDE. So I was trying to load my files into PeptideShaker in order to export an mzid file from it and use that for the PRIDE submission.

I could share my data for testing, since they are not publicly available yet.

Best regards, Pavel

hbarsnes commented 4 years ago

Hi Pavel,

The error you are seeing is due to the current release of PeptideShaker only supporting spectrum files in the mgf format. Support for mzML has just been added in a beta release we are working on, but I cannot at this time tell you when this version will become available.

Additionally, while mzid files are supported as input, we usually have to make some minor tweaks when loading mzid files from sources we have not previously worked with, and I'm pretty sure we have not tried to load PD mzid files before. But if you share your data I'd be more than happy to check whether it can be loaded in the upcoming beta release of PeptideShaker and make any changes if required. You can send them to me at harald.barsnes@gmail.com.

However, note that if you load mzid data in PeptideShaker it will be reprocessed, meaning that the results will not be identical to what was in your original mzid files. So if you are doing this as part of sharing your search results this may not be what you want?

You are perhaps therefore better off trying to fix the issues with your original mzid files. What sort of errors did you get when trying to upload to PRIDE?

Best regards, Harald

PRprog commented 4 years ago

Dear Harald,

thank you for your response. Yes, I was trying the result processing with mzML files, but previously I have also tried to use the mgf files. The error was like this

Spectrum 'temp' not found in file 12042019_PR_MO-TiO2frac-sk1_fr_pur.mgf.

200409_mgf_report.txt I do not know what the "temp" spectrum means, I have not found anything like this in the mzid file.

I will send the data to your email address together with a more detailed description of the whole story. You are right, I do not intend to change my processed results or to reprocess them. I just need a correct mzid file for the PRIDE submission and I do not know how to obtain it.

The errors obtained with the px-submission tool are with the mzid file and mostly it tells me that

Error message: Duplicate unique value [SII_237_1] declared for identity constraint of element "MzIdentML".,Non-fatal XML Parsing error detected on line 215113

or

Duplicate unique value [SII_290_1] declared for identity constraint "PK_DATAADSILSIR" of element "MzIdentML"., (3) Non-fatal XML Parsing error detected on line 264030, (4) Error message: cvc-identity-constraint.4.1:

I do not really know what this means. Additionally, when the px-submission tool finished, there was no ID number in the px-submission window and the "Finish" button was not active, although all files had already been uploaded. I had to submit my data using the command line (after contacting PRIDE support), because waiting another 20 hours to end up with nothing is really sad. Thank you!

Best regards Pavel

hbarsnes commented 4 years ago

Hi Pavel,

> I do not know what the "temp" spectrum means, I have not found anything like this in the mzid file.

This error is due to the mzid files being created with mzML as the spectrum format and not with mgf. Providing (related) mgf files instead of the original mzML files will not work.
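To see which spectrum files an mzid file was created against, one option is to list its `SpectraData` entries and look at the file extension in each `location` attribute. The following is only a minimal sketch using the Python standard library; the function name `spectra_files` is made up for illustration, and the extension check is a heuristic that assumes the location ends in a normal file name.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def spectra_files(source):
    """Return (id, location, extension) for every SpectraData element.

    `source` is a path or a binary file-like object holding mzIdentML.
    """
    found = []
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "SpectraData":
            location = elem.get("location") or ""
            # Heuristic: take whatever follows the last dot as the extension.
            ext = location.rsplit(".", 1)[-1].lower() if "." in location else ""
            found.append((elem.get("id"), location, ext))
        elif name == "SpectrumIdentificationResult":
            # Results are not needed here; drop them to limit memory use.
            elem.clear()
    return found
```

If the extension reported here is `mzml`, supplying mgf files at import time would not be expected to match, which is consistent with the explanation above.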

> The errors obtained with the px-submission tool are with the mzid file and mostly it tells me that …

I'm afraid the "duplicate unique value" messages point to errors in the mzid files themselves, and I don't think these can be fixed after the mzid files have been created. Basically, the duplicate keys make it impossible to tell which mappings are the correct ones.

So there seems to be a bug in the Proteome Discoverer mzid export. Hopefully they will be able to correct it quickly and allow you to generate valid mzid files.

I haven't yet had a look at your data (it is still downloading), but besides waiting for a Proteome Discoverer fix your best shot is perhaps to find a different tool to convert your msf files to mzid? Which tools have you tried so far? I'm not too familiar with converting msf files, but you can perhaps have a look at M2Lite (https://bitbucket.org/paiyetan/m2lite/wiki/Home) or ProCon (https://www.ruhr-uni-bochum.de/mpc/software/ProCon)?

> By the way - is there any way to use the PeptideShaker/SearchGUI for label-free quantitation?

Yes, you can extend the pipeline with either FlashLFQ (https://github.com/smith-chem-wisc/FlashLFQ) or moFF (https://github.com/compomics/moFF).

I'm sorry to hear about your bad experience with sharing your proteomics data btw. I agree with you, the sharing ought to be straightforward and easy. And while it certainly has gotten a lot better recently there can clearly sometimes still be issues. Hopefully it will be easier next time around!

Best regards, Harald

hbarsnes commented 4 years ago

Hi again,

I had a look at your mzid files and they are indeed problematic.

Here's one example:

<SpectrumIdentificationResult spectraData_ref="ID_MZML_FILE_with_spectra" spectrumID="scan=237" id="SIR_237">
     <SpectrumIdentificationItem [...] id="SII_237_1">
       [...]
     </SpectrumIdentificationItem>
     <SpectrumIdentificationItem [...] id="SII_237_1">
       [...]
     </SpectrumIdentificationItem>
     [...]
</SpectrumIdentificationResult>

Notice how the ids of the two SpectrumIdentificationItems are identical. The ids have to be unique, and when they are not, the mzid file is not valid. I'm afraid there is no way of fixing this after the file has been created. The only option is to recreate the file as soon as Proteome Discoverer fixes the bug(s).
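For anyone who wants to check their own export before uploading, duplicated ids can be found with a short script. This is a minimal sketch using only the Python standard library, not a replacement for a proper mzIdentML schema validation; the function name `duplicate_sii_ids` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def duplicate_sii_ids(source):
    """Return {id: count} for SpectrumIdentificationItem ids seen more than once.

    `source` is a path or a binary file-like object holding mzIdentML.
    """
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "SpectrumIdentificationItem":
            counts[elem.get("id")] += 1
        elif name == "SpectrumIdentificationResult":
            # All child items have been counted by now; free their memory.
            elem.clear()
    return {sii_id: n for sii_id, n in counts.items() if n > 1}
```

Calling `duplicate_sii_ids("export.mzid")` (the file name is a placeholder) returns an empty dict when every id is unique. A non-empty result lists the ids that a validator would be expected to reject.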

On a related note the file is also missing the annotation of some of the search settings used, for example the list of PTMs and the mass tolerances.

The good news, sort of, is that the files (mzid + mzML) can be loaded in the beta version of PeptideShaker. However, as already mentioned, this does not make a lot of sense for you as it would change your results. And in that case, you'd probably be better off re-searching the data in SearchGUI first anyway.

I'm sorry I could not be of more help. Hopefully the msf to mzid converters I suggested can generate better mzid files than Proteome Discoverer can...

Best regards, Harald

PRprog commented 4 years ago

Dear Harald,

thank you for your comments and suggestions! I really appreciate it.

I do not know how Proteome Discoverer creates the mzid files, but I think you are right that the mzid files are based on the mzML files, because the export dialog window contained the following comment:

For obtaining matching spectra data for this mzIdentML export you need to create an mzML file from the current file. ...

So this excludes my not very clever idea of obtaining mzid file via PeptideShaker :-(

Thank you for your comment about the "duplicate unique values". I was afraid that this may relate to a bug in Proteome Discoverer. I will try to communicate with PD support further in order to solve this issue for the future, because this could make the complete submission much easier for everyone using the Proteome Discoverer software.

Regarding the other options - I had also found the two tools you suggested. I downloaded the ProCon program, tried to start it (after modifying the JRE path in the bat file) and got the following:

C:\DBsearch\ProCon_dist-0.9.804>REM for Java 13 and higher
C:\DBsearch\ProCon_dist-0.9.804>REM set JAVA_HOME="E:\Programs\openjdk-13.0.1_windows-x64_bin\jdk-13.0.1\bin\java.exe"
C:\DBsearch\ProCon_dist-0.9.804>set JAVA_HOME="C:\Program Files\Java\jre1.8.0_231\bin\java.exe"
C:\DBsearch\ProCon_dist-0.9.804>"C:\Program Files\Java\jre1.8.0_231\bin\java.exe" -cp .\lib -jar ./ProCon.jar
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: de/mpc/ProCon/ProCon has been compiled by a more recent version of the Java Runtime (class file version 57.0), this version of the Java Runtime only recognizes class file versions up to 52.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$100(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)
C:\DBsearch\ProCon_dist-0.9.804>pause
Press any key to continue . . .

And this is unfortunately where I got stuck - I do not know how to solve it, because my JRE is relatively up to date. Also, the program is intended for converting msf files from previous versions of Proteome Discoverer, which may lead to another issue in the future. But yes - I would have used it if it worked.

I have also tried to run the M2Lite program, but it did not work on my first attempt. I may have to be more patient with the proper settings. I will probably give it another try.

Thanks for the tips on LFQ in PeptideShaker, I hope I will be able to run it successfully one day. The thing is that I like the idea of PeptideShaker and SearchGUI, but I am missing quantitation options in these programs, which is really important for me. So I hope the LFQ will work here.

Now I also noticed your last comment - thank you. I think this is important to know in order to ask for a bug fix in PD. OK, I will try M2Lite and see whether it works.

I guess I might really have bad luck with the submission of our proteomics data this time. Hopefully it will be much better in the future!

Best regards, Pavel

hbarsnes commented 4 years ago

Hi Pavel,

> C:\DBsearch\ProCon_dist-0.9.804>"C:\Program Files\Java\jre1.8.0_231\bin\java.exe" -cp .\lib -jar ./ProCon.jar Error: A JNI error has occurred, please check your installation and try again Exception in thread "main" java.lang.UnsupportedClassVersionError: de/mpc/ProCon/ProCon has been compiled by a more recent version of the Java Runtime (class file version 57.0), this version of the Java Runtime only recognizes class file versions up to 52.0

This indicates that you are still running Java 8 and not Java 13. Try changing your command line to:

C:\DBsearch\ProCon_dist-0.9.804>"E:\Programs\openjdk-13.0.1_windows-x64_bin\jdk-13.0.1\bin\java.exe" -cp .\lib -jar ./ProCon.jar
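The two numbers in the error message can be decoded directly: for Java 5 and later, the class file major version is the Java release number plus 44, so 52 corresponds to Java 8 and 57 to Java 13. The helper below is just an illustrative sketch for reading such messages; the function names are made up.

```python
import re


def java_release(class_major):
    """Map a class file major version to its Java release (Java 5 and later)."""
    return class_major - 44


def explain_version_error(message):
    """Return (release the class needs, newest release the runtime supports).

    Returns None if the message does not look like an
    UnsupportedClassVersionError with both version numbers.
    """
    match = re.search(
        r"class file version (\d+)\.\d+.*?up to (\d+)\.\d+", message, re.S
    )
    if match is None:
        return None
    return java_release(int(match.group(1))), java_release(int(match.group(2)))


message = (
    "de/mpc/ProCon/ProCon has been compiled by a more recent version of the "
    "Java Runtime (class file version 57.0), this version of the Java Runtime "
    "only recognizes class file versions up to 52.0"
)
print(explain_version_error(message))  # (13, 8)
```

So the program was compiled for Java 13, while the runtime that actually started it was Java 8.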

However, it seems like it would not help anyway:

image

Which only brings you back to the original problem...

> I have also tried to run the M2Lite program, but it did not work when I tried it first. I may have to be more patient with the proper settings then. I will probably give a try once again.

Yes, this one did seem trickier, as it requires installing R and specific R packages. Hence, I did not give it a go myself.

Seems like most/all of the msf to mzid converters stopped being used/maintained as soon as PD started exporting their own mzid files. Which would have been fine as long as the PD-exported mzid files were actually valid...

Anyway, good luck with all of this! I hope you will be able to create valid mzid files in the end!

Best regards, Harald

PS: I'm now closing this issue, as it is not really a PeptideShaker issue anymore. But please leave a message here anyway if you manage to find a solution in the end, as it could be useful for others as well. :)

ciesinsk commented 4 years ago

Hi.

I am sorry to reopen this thread, but I can contribute to this. The error is indeed that the ids of the SIIs are not unique. Internally, the id is composed of the actual scan id and the search engine rank. With the new precursor detector node (which essentially leads to a chimeric spectra search) there can be two rank-one results for the same spectrum. Here, both lead to the same search result id. This is indeed a bug that needs to be fixed in PD. My guess is that without the precursor detector node you can submit your data.
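The collision described above can be reproduced with a small sketch. It assumes the id pattern `SII_<scan>_<rank>`, which is inferred from the ids quoted earlier in this thread (for example `SII_237_1`). The disambiguation in `build_unique_ids` is purely hypothetical and is not how Proteome Discoverer builds or will build its ids.

```python
from collections import Counter


def build_ids(matches):
    """Build ids from (scan, rank) pairs only, as described in the thread."""
    return [f"SII_{scan}_{rank}" for scan, rank in matches]


def build_unique_ids(matches):
    """Hypothetical fix: append a running index per (scan, rank) pair."""
    seen = Counter()
    ids = []
    for scan, rank in matches:
        seen[(scan, rank)] += 1
        ids.append(f"SII_{scan}_{rank}_{seen[(scan, rank)]}")
    return ids


# A chimeric search can yield two rank-one matches for the same scan.
matches = [(237, 1), (237, 1), (237, 2)]

print(build_ids(matches))         # ['SII_237_1', 'SII_237_1', 'SII_237_2']
print(build_unique_ids(matches))  # ['SII_237_1_1', 'SII_237_1_2', 'SII_237_2_1']
```

With only scan and rank in the id, the two rank-one matches are indistinguishable, which is exactly what the "duplicate unique value" errors report.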

Kind regards, Frank.