PeptideShakerCLI: java.lang.IllegalArgumentException: Not enough representative modification sites found.

shengbokw commented 4 years ago

Hi,

I've recently been using a series of tools from CompOmics. When exporting a cpsx file to an mzid file with PeptideShaker version 1.16.45, the export never succeeds from the command line (CLI). After seeing this issue, https://github.com/compomics/peptide-shaker/issues/208, I decided to test PeptideShaker version 2 and tried again. However, I ran into the following problem:

Sun Aug 09 20:12:15 BST 2020: PeptideShaker version 2.0.0-beta.
Memory given to the Java virtual machine: 28631367680.
Total amount of memory in the Java virtual machine: 2022178816.
Free memory: 1969382904.
Java version: 1.8.0_161.
java.lang.IllegalArgumentException: Not enough representative modification sites found.
    at eu.isas.peptideshaker.ptm.ModificationLocalizationScorer.getRepresentativeToSecondaryMap(ModificationLocalizationScorer.java:1339)
    at eu.isas.peptideshaker.ptm.ModificationLocalizationScorer.scorePTMs(ModificationLocalizationScorer.java:875)
    at eu.isas.peptideshaker.ptm.ModificationLocalizationScorer.lambda$scorePeptidePtms$15(ModificationLocalizationScorer.java:1635)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at eu.isas.peptideshaker.ptm.ModificationLocalizationScorer.scorePeptidePtms(ModificationLocalizationScorer.java:1633)
    at eu.isas.peptideshaker.PeptideShaker.createProject(PeptideShaker.java:600)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.createProject(PeptideShakerCLI.java:812)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.call(PeptideShakerCLI.java:179)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.main(PeptideShakerCLI.java:1014)

Can you help me with this problem?

Thanks very much!

Shengbo

shengbokw commented 4 years ago

Hello,

I also ran this pipeline on a smaller dataset and successfully got a cpsx file from the PeptideShakerCLI command line. I then tried to export an mzid file with the MzidCLI command line, but it failed with “java.lang.NullPointerException at eu.isas.peptideshaker.cmd.MzidCLIInputBean.<init>(MzidCLIInputBean.java:139) at eu.isas.peptideshaker.cmd.MzidCLI.main(MzidCLI.java:276)”.

So I tried to open this cpsx.zip file in the PeptideShaker software, which gave the error “no PeptideShaker project found in the zip file”, even though the project does exist in the zip file.

Thanks

hbarsnes commented 4 years ago

Hi Shengbo,

I'm afraid that the "Not enough representative modification sites found" error is one of those remaining issues we still haven't gotten around to solving for the new beta release. The temporary fix is to do the following: Set the "Probabilistic Score" to "None" and "Confident Sites" under "Site Alignment" to "No" in the "PTM Localization" settings dialog (or in the IdentificationParametersCLI command line of course).

> So I tried to open this cpsx.zip file in the PeptideShaker software, with the error of “no PeptideShaker project found in the zip file”, but it does exist it in the zip file.

I guess this is not the case, but please make sure that you are not mixing the output from the official release and the beta version, as these are not compatible. If the PeptideShaker project was created in the beta version the project file should now be called .psdb and not .cpsx. So if you have named your project file as ending with .cpsx it will not be detected when trying to unzip the zipped project.

An additional tip regarding the mzIdentML export: instead of using the MzidCLI command line separately, I would recommend simply adding the MzidCLI options directly to the PeptideShakerCLI command line (skipping the MzidCLI -in option). That way you do not have to first save the project and then reopen it to create the mzIdentML file.

Best regards, Harald
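Harald's tip about folding the MzidCLI options into the PeptideShakerCLI call can be sketched as a small helper that merges two argument lists and drops `-in` together with its value. This is only an illustration under stated assumptions: the class name `eu.isas.peptideshaker.cmd.PeptideShakerCLI` and the `-in` option come from this thread, while the jar name, the file names, and the `-id_params`, `-out` and `-output_file` options in the example are placeholders recalled from the CLI documentation and should be checked against the help output of the version in use.

```python
def merge_mzid_options(shaker_args, mzid_args):
    """Append MzidCLI options to a PeptideShakerCLI argument list.

    The MzidCLI '-in' option and its value are skipped, since the project
    is already in memory when the export runs as part of PeptideShakerCLI.
    """
    merged = list(shaker_args)
    i = 0
    while i < len(mzid_args):
        if mzid_args[i] == "-in":
            i += 2  # skip the flag and the project path that follows it
            continue
        merged.append(mzid_args[i])
        i += 1
    return merged


if __name__ == "__main__":
    # Hypothetical jar name, file names and option names (assumptions, see above).
    shaker_args = [
        "java", "-cp", "PeptideShaker.jar",
        "eu.isas.peptideshaker.cmd.PeptideShakerCLI",
        "-id_params", "params.par",
        "-out", "project.psdb",
    ]
    mzid_args = ["-in", "project.psdb", "-output_file", "results.mzid"]
    print(" ".join(merge_mzid_options(shaker_args, mzid_args)))
```

The helper only builds the command line; running it (for example with `subprocess.run`) is left to the caller once the option names have been verified.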

shengbokw commented 4 years ago

Dear Harald,

Thanks very much! I will try your advice.

Best wishes, Shengbo

shengbokw commented 4 years ago

Dear Harald,

The tip you gave me earlier was very helpful, thanks. I've switched back to running 1.16.45.

Recently PeptideShaker ran into another problem when I increased the number of input raw files from 5 to 15. The error is as follows:

Thu Aug 20 11:07:47 BST 2020 Removing Mapping Artifacts. Please Wait...
Thu Aug 20 11:07:47 BST 2020 2032 unlikely protein mappings found:
Thu Aug 20 11:07:47 BST 2020 - 4204 protein groups supported by non-enzymatic shared peptides.
Thu Aug 20 11:07:47 BST 2020 - 1629 groups explained by a simpler group.
Thu Aug 20 11:07:47 BST 2020 Note: a group can present combinations of these criteria.
Thu Aug 20 11:07:47 BST 2020 Generating peptide map.
Thu Aug 20 11:07:47 BST 2020 Filling Peptide Maps. Please Wait...
10% 20% 30% 40%
Thu Aug 20 11:13:30 BST 2020 Computing peptide probabilities.
Thu Aug 20 11:13:30 BST 2020 Estimating Probabilities. Please Wait...
10% 20% 30% 40% 50% 60%
Thu Aug 20 11:13:30 BST 2020 Saving peptide probabilities.
Thu Aug 20 11:13:30 BST 2020 Attaching Peptide Probabilities. Please Wait...
10% 20% 30% 40% 50% 60% 70% 80% 90%
Thu Aug 20 11:13:31 BST 2020 Generating protein map.
Thu Aug 20 11:13:31 BST 2020 Filling Protein Map. Please Wait...
10% 20% 30% 40% 50% 60% 70% 80% 90%
Thu Aug 20 11:17:56 BST 2020 Resolving protein inference issues, inferring peptide and protein PI status.
Thu Aug 20 11:17:56 BST 2020 Simplifying Redundant Protein Groups. Please Wait...

PeptideShaker processing failed. See the PeptideShaker log for details.

Thu Aug 20 11:34:30 BST 2020 An error occurred while loading the identification files:
Thu Aug 20 11:34:30 BST 2020 A string constant starting with ''2PH3LS1:185:C4516ACXX:6:1101:10510:51019_1162-_cus_2PH3LS&' is too long.

I also checked the details in the PeptideShaker log:

java.sql.SQLException: A string constant starting with ''2PH3LS1:185:C4516ACXX:6:1101:10510:51019_1162-_cus_2PH3LS&' is too long.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement.<init>(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement20.<init>(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement30.<init>(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement40.<init>(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedPreparedStatement42.<init>(Unknown Source)
    at org.apache.derby.jdbc.Driver42.newEmbedPreparedStatement(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.prepareStatement(Unknown Source)
    at com.compomics.util.db.ObjectsDB.updateObjectInDb(ObjectsDB.java:1350)
    at com.compomics.util.db.ObjectsDB.updateObject(ObjectsDB.java:1315)
    at com.compomics.util.db.ObjectsDB.updateObject(ObjectsDB.java:1285)
    at com.compomics.util.experiment.identification.IdentificationDB.updateProteinMatch(IdentificationDB.java:178)
    at com.compomics.util.experiment.identification.Identification.updateProteinMatch(Identification.java:830)
    at eu.isas.peptideshaker.protein_inference.ProteinInference.retainBestScoringGroups(ProteinInference.java:440)
    at eu.isas.peptideshaker.PeptideShaker.processIdentifications(PeptideShaker.java:448)
    at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importFiles(FileImporter.java:547)
    at eu.isas.peptideshaker.fileimport.FileImporter.importFiles(FileImporter.java:151)
    at eu.isas.peptideshaker.PeptideShaker.importFiles(PeptideShaker.java:229)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.createProject(PeptideShakerCLI.java:748)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.call(PeptideShakerCLI.java:181)
    at eu.isas.peptideshaker.cmd.PeptideShakerCLI.main(PeptideShakerCLI.java:957)

Do you know what causes this?

Thanks!

Shengbo Wang

hbarsnes commented 4 years ago

Hi Shengbo,

> A string constant starting with ''2PH3LS1:185:C4516ACXX:6:1101:10510:51019_1162-_cus_2PH3LS&' is too long.

This would seem to indicate issues with putting your protein groups into the underlying database. May I ask what database you are using? And what kind of enzyme?

Best regards, Harald

shengbokw commented 4 years ago

Hi Harald,

Thanks for your reply. I'm from the PRIDE team (Juan is my manager), and I'm currently re-analysing metaproteomics projects available on PRIDE. In this case the protein database comes from this project (https://www.ebi.ac.uk/pride/archive/projects/PXD005780). It is a human gut microbiome study, so the protein database was generated from metagenomics data (with FragGeneScan, I guess) and concatenated with UniProt human proteins. I'm not sure which enzyme they used, but in SearchGUI I used trypsin. Thanks!

Best regards, Shengbo

hbarsnes commented 4 years ago

Hi Shengbo,

I see that the database (without decoys) is close to 2.3 GB and with it being metaproteomics there will most likely be a lot of protein inference issues, which is what the error seems to indicate. Basically the combined list of accession numbers in a specific protein inference group is bigger than what the database can store.

Usually we would recommend using a smaller database, but I guess that is not really an option in your case. Perhaps you can go back to the beta version and see if the same problem persists there as well? We have made a lot of updates to better handle larger databases, which could potentially address this particular issue too.

Best regards, Harald

shengbokw commented 4 years ago

Hey Harald,

Thanks very much for your advice. For now I'm staying with the more stable version (1.16.45), but I will try and test the new version later.

I solved the problem "A string constant starting with ''2PH3LS1:185:C4516ACXX:6:1101:10510:51019_1162-_cus_2PH3LS&' is too long." by adding a mapping table (mapping the long protein sequence names to short, well-designed IDs) before running SearchGUI, and then restoring the original sequence names in the results from PeptideShaker.

If I have any further problems with it (or with version 2), I'll get back to you! Thank you!

Best, Shengbo
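The mapping-table workaround described above might look roughly like the following. It is a minimal sketch, not Shengbo's actual script: the function names, the `MP` prefix, the zero-padded ID format and the tab-separated mapping file are all assumptions made for illustration. The first function rewrites every FASTA header to a short ID and records the original header; the other two load the table and translate short IDs back after PeptideShaker has produced its results.

```python
import csv


def shorten_fasta_headers(fasta_in, fasta_out, map_out, prefix="MP"):
    """Replace each FASTA header with a short ID and write a mapping table.

    The mapping file has two tab-separated columns: short ID, original header.
    Returns the number of sequences processed.
    """
    count = 0
    with open(fasta_in) as src, \
            open(fasta_out, "w") as dst, \
            open(map_out, "w", newline="") as map_file:
        writer = csv.writer(map_file, delimiter="\t")
        for line in src:
            if line.startswith(">"):
                count += 1
                short_id = f"{prefix}{count:08d}"  # hypothetical ID scheme
                writer.writerow([short_id, line[1:].rstrip("\r\n")])
                dst.write(f">{short_id}\n")
            else:
                dst.write(line)  # sequence lines are copied unchanged
    return count


def load_mapping(map_path):
    """Read the mapping table back into a dict: short ID -> original header."""
    with open(map_path, newline="") as map_file:
        return {row[0]: row[1] for row in csv.reader(map_file, delimiter="\t")}


def restore_accessions(accessions, mapping):
    """Translate short IDs back to original headers; unknown IDs pass through."""
    return [mapping.get(acc, acc) for acc in accessions]
```

Fixed-width IDs keep the combined accession list of a protein group short and predictable in length, which is the property the workaround relies on. Whether a particular ID format is parsed as intended by SearchGUI and PeptideShaker would need to be checked on a small test database first.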

hbarsnes commented 4 years ago

Hi Shengbo,

Great! I will then close the issue. Please open a new one if you come across other problems.

Best regards, Harald

compomics / peptide-shaker-2.0-issue-tracker

PeptideShakerCLI: java.lang.IllegalArgumentException: Not enough representative modification sites found. #61