compomics / reporter

Protein quantification based on reporter ions
http://compomics.github.io/projects/reporter.html

linux command-line requires data folder #14

Open jcanderan opened 3 years ago

jcanderan commented 3 years ago

Hello,

When importing a psdb in linux (command line), it's required that the data folder inside searchgui_out.zip is present.

For example, if the psdb file is at /data/test, there needs to be a folder /data/test/data containing the files from the data folder inside searchgui_out.zip
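
One way to set this up from a shell is sketched below. It assumes `unzip` is installed and that the archive keeps the files under a top-level `data/` entry, as described above. The function name `stage_data_folder` is made up for illustration and is not part of Reporter.

```shell
# Hypothetical helper (not part of Reporter): copy the data/ entries of a
# SearchGUI output zip into a "data" folder next to the psdb file.
# Usage: stage_data_folder /path/to/searchgui_out.zip /path/to/file.psdb
stage_data_folder() {
    zip_file="$1"
    psdb_dir=$(dirname "$2")
    # Extract only the members under data/ (-o overwrites stale copies,
    # -q keeps the output quiet) so that <psdb_dir>/data/ is created.
    unzip -o -q "$zip_file" 'data/*' -d "$psdb_dir"
}
```

For the example above this would be called as `stage_data_folder /data/test/searchgui_out.zip /data/test/file.psdb`, where both file names are placeholders.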

There also doesn't seem to be an option to do -identification_files to select the searchgui_out.zip as with peptideshaker.

Thanks!

hbarsnes commented 3 years ago

The data folder has to be either inside the zip file (which can be done by adding the -output_data to the SearchGUI command line), or Reporter should look for the data in the original folder used as input to SearchGUI.

> There also doesn't seem to be an option to do -identification_files to select the searchgui_out.zip as with peptideshaker.

Correct, as the input to Reporter is the output from PeptideShaker after processing the identification files, thus the identification files themselves are no longer needed at this stage.

jcanderan commented 3 years ago

Unless Reporter explicitly has that data folder, it will not run.

Without ./data folder from searchgui_out.zip (I renamed it to datax):

java -cp Reporter-0.8.5.jar eu.isas.reporter.cli.ReporterCLI -id /data/data/jc/pnnl/out/folder/profile/file.psdb
Mon May 24 11:37:16 UTC 2021 An error occurred while reading:
/data/data/jc/pnnl/out/folder/profile/file.psdb.

Please verify that the Reporter version used to create
the file is compatible with your version of Reporter.

With data folder (partial output):

java -cp Reporter-0.8.5.jar eu.isas.reporter.cli.ReporterCLI -id /data/data/jc/pnnl/out/folder/profile/file.psdb
Mon May 24 11:38:14 UTC 2021 Loading FASTA File. Please Wait...
Mon May 24 11:38:14 UTC 2021 Loading Spectrum Files. Please Wait...
Mon May 24 11:38:14 UTC 2021 Loading Spectrum Files (1 of 1). Please Wait...
Mon May 24 11:38:14 UTC 2021 Inferring Quantification Parameters. Please Wait...
Mon May 24 11:38:14 UTC 2021 Peptide Ratio Normalization. Please Wait...

searchgui_out.zip is also present in the folder and the original files used to run SearchGUI are in their original locations.

Incidentally, the same issue prevents PeptideShaker on Windows from loading .psdb files generated on Linux. You have to have the ./data folder or it won't load (same error message). It may happen in other cases, but this is the most common one for me.

hbarsnes commented 3 years ago

Can you share the Reporter log file? You'll find it via the Reporter Welcome dialog > Settings & Help > Help > Bug Report.

jcanderan commented 3 years ago

I wonder if there's a series of related issues.

If I run PeptideShakerCLI (command-line) without the -fasta_file option, it will load searchgui_out.zip and process it and save the psdb without issue, but then when I try to run ReportCLI (command-line), it can't find the FASTA file with the following message in Peptideshaker.log:

java.io.IOException: FASTA file not found /data/data/jc/dd/out/dd/profile/.PeptideShaker_unzip_temp/searchgui_out_PeptideShaker_temp/data/jc_dd_profile_crap_concatenated_target_decoy.fasta.
        at eu.isas.peptideshaker.utils.PsdbParent.loadPsdbFile(PsdbParent.java:299)
        at eu.isas.peptideshaker.cmd.ReportCLI.call(ReportCLI.java:95)
        at eu.isas.peptideshaker.cmd.ReportCLI.main(ReportCLI.java:366)

For the Reporter error it is similar:

java.io.IOException: FASTA file not found /data/data/jc/dd/out/dd/profile/.PeptideShaker_unzip_temp/searchgui_out_PeptideShaker_temp/data/jc_dd_profile_crap_concatenated_target_decoy.fasta.
        at eu.isas.peptideshaker.utils.PsdbParent.loadPsdbFile(PsdbParent.java:299)
        at eu.isas.reporter.io.ProjectImporter.importPeptideShakerProject(ProjectImporter.java:119)
        at eu.isas.reporter.cli.ReporterCLI.call(ReporterCLI.java:407)
        at eu.isas.reporter.cli.ReporterCLI.main(ReporterCLI.java:845)

So clearly it expects to find the FASTA in a temp folder that no longer exists because it expects it to be in the directory created by extracting searchgui_out.zip, but that directory is removed once PeptideShakerCLI is done and thus not available for ReportCLI.

Note that Reporter doesn't work even if I used the -fasta_file option to generate the psdb originally but ReportCLI for peptideshaker will work. However, the error message for Reporter changes:

Mon May 24 16:17:13 UTC 2021 Loading FASTA File. Please Wait...
Mon May 24 16:17:13 UTC 2021 Loading Spectrum Files. Please Wait...
Mon May 24 16:17:13 UTC 2021 Loading Spectrum Files (1 of 1). Please Wait...
Mon May 24 16:17:13 UTC 2021 An error occurred while reading:
/data/data/jc/dd/out/dd/profile/dd.psdb.
java.lang.NullPointerException
        at eu.isas.peptideshaker.utils.PsdbParent.loadSpectrumFile(PsdbParent.java:526)
        at eu.isas.reporter.io.ProjectImporter.importPeptideShakerProject(ProjectImporter.java:170)
        at eu.isas.reporter.cli.ReporterCLI.call(ReporterCLI.java:407)
        at eu.isas.reporter.cli.ReporterCLI.main(ReporterCLI.java:845)

hbarsnes commented 3 years ago

If you want to use the PeptideShakerCLI output as input to ReportCLI, please use the -zip option when running PeptideShakerCLI. This will ensure that all of the needed files are included in the zip file.

However, if you know that you want to export the reports when processing data in PeptideShaker, I would rather recommend adding the ReportCLI options directly to the PeptideShakerCLI command line. This is much faster, as there is no need to reopen the project, and has the advantage that all of the required files will always be available.

As for Reporter, I would recommend using the -zip option in PeptideShaker there as well.

jcanderan commented 3 years ago

I didn't know that regarding Peptideshaker although I prefer to keep the commands separate for other reasons.

However, if I try loading a zip into Reporter (which should be supported), it doesn't work either.

From the command line message showing it should be supported: -id The PeptideShaker project (.psdb or .zip).

So then:

java -cp Reporter-0.8.5.jar eu.isas.reporter.cli.ReporterCLI -id /data/data/jc/dd/out/dd/profile/dd.zip

org.sqlite.SQLiteException: [SQLITE_NOTADB]  File opened that is not a database file (file is encrypted or is not a database)
        at org.sqlite.core.DB.newSQLException(DB.java:909)
        at org.sqlite.core.DB.newSQLException(DB.java:921)
        at org.sqlite.core.DB.throwex(DB.java:886)
        at org.sqlite.core.NativeDB.prepare_utf8(Native Method)
        at org.sqlite.core.NativeDB.prepare(NativeDB.java:127)
        at org.sqlite.core.DB.prepare(DB.java:227)
        at org.sqlite.core.CorePreparedStatement.<init>(CorePreparedStatement.java:41)
        at org.sqlite.jdbc3.JDBC3PreparedStatement.<init>(JDBC3PreparedStatement.java:30)
        at org.sqlite.jdbc4.JDBC4PreparedStatement.<init>(JDBC4PreparedStatement.java:19)
        at org.sqlite.jdbc4.JDBC4Connection.prepareStatement(JDBC4Connection.java:48)
        at org.sqlite.jdbc3.JDBC3Connection.prepareStatement(JDBC3Connection.java:263)
        at org.sqlite.jdbc3.JDBC3Connection.prepareStatement(JDBC3Connection.java:235)
        at com.compomics.util.db.object.ObjectsDB.establishConnection(ObjectsDB.java:855)
        at com.compomics.util.db.object.ObjectsDB.<init>(ObjectsDB.java:113)
        at eu.isas.peptideshaker.utils.PsdbParent.loadPsdbFile(PsdbParent.java:227)
        at eu.isas.reporter.io.ProjectImporter.importPeptideShakerProject(ProjectImporter.java:119)
        at eu.isas.reporter.cli.ReporterCLI.call(ReporterCLI.java:407)
        at eu.isas.reporter.cli.ReporterCLI.main(ReporterCLI.java:845)
org.sqlite.SQLiteException: [SQLITE_NOTADB]  File opened that is not a database file (file is encrypted or is not a database)
        at org.sqlite.core.DB.newSQLException(DB.java:909)
        at org.sqlite.core.DB.newSQLException(DB.java:921)
        at org.sqlite.core.DB.throwex(DB.java:886)
        at org.sqlite.core.NativeDB.prepare_utf8(Native Method)
        at org.sqlite.core.NativeDB.prepare(NativeDB.java:127)
        at org.sqlite.core.DB.prepare(DB.java:227)
        at org.sqlite.core.CorePreparedStatement.<init>(CorePreparedStatement.java:41)
        at org.sqlite.jdbc3.JDBC3PreparedStatement.<init>(JDBC3PreparedStatement.java:30)
        at org.sqlite.jdbc4.JDBC4PreparedStatement.<init>(JDBC4PreparedStatement.java:19)
        at org.sqlite.jdbc4.JDBC4Connection.prepareStatement(JDBC4Connection.java:48)
        at org.sqlite.jdbc3.JDBC3Connection.prepareStatement(JDBC3Connection.java:263)
        at org.sqlite.jdbc3.JDBC3Connection.prepareStatement(JDBC3Connection.java:235)
        at com.compomics.util.db.object.ObjectsDB.loadFromDB(ObjectsDB.java:340)
        at com.compomics.util.db.object.ObjectsDB.retrieveObject(ObjectsDB.java:467)
        at eu.isas.peptideshaker.utils.PsdbParent.loadPsdbFile(PsdbParent.java:232)
        at eu.isas.reporter.io.ProjectImporter.importPeptideShakerProject(ProjectImporter.java:119)
        at eu.isas.reporter.cli.ReporterCLI.call(ReporterCLI.java:407)
        at eu.isas.reporter.cli.ReporterCLI.main(ReporterCLI.java:845)
java.lang.NullPointerException
        at eu.isas.peptideshaker.utils.PsdbParent.loadPsdbFile(PsdbParent.java:233)
        at eu.isas.reporter.io.ProjectImporter.importPeptideShakerProject(ProjectImporter.java:119)
        at eu.isas.reporter.cli.ReporterCLI.call(ReporterCLI.java:407)
        at eu.isas.reporter.cli.ReporterCLI.main(ReporterCLI.java:845)

I realize all that happens is that the zip gets extracted (so I can extract it myself), but zipping via Peptideshaker just to immediately unzip for Reporter doesn't seem desirable (zipping is slow) when the files are already present, in their original locations, and unzipped in the first place. If Reporter needs certain files beyond the psdb, it seems like it would be better to let us have the option to specify the location as there seems to be something amiss with detecting where the files are with the psdb only (i.e. it might point to a temp folder that was removed).

Well, either way, I can make my pipeline work as is so if it's just more of a philosophical difference about how to handle files, we can consider my case resolved.

Thanks!

hbarsnes commented 3 years ago

There was a typo in the Reporter code for recognizing the .zip file extension (or rather the "." was missing), hence the file was assumed to be a psdb file instead. I will release a new version later, as I just noticed that Reporter has a similar issue with the progress display that you noted for the FastaCLI.

With regard to the initial issue, you are correct that the problem is that the data files are inside the searchgui.zip file and thus only available as long as this zip file is temporarily unzipped during the processing in PeptideShaker. After the PeptideShaker project has been saved, the searchgui.zip file will be closed and the zipped data folder will no longer be accessible.

The solution may be a bit counter-intuitive, but if you do not use the -output_data option when running SearchGUI, the original file locations will be the ones used in PeptideShaker (as provided via the PeptideShakerCLI options -fasta_file and -spectrum_files). This means that PeptideShaker will inform Reporter of the original file locations and, if these can still be found, Reporter should no longer complain about missing data files.

Note that both PeptideShaker and Reporter look for the data files in three locations: first the original location, then a folder called "data" next to the psdb file, and finally the same folder as the psdb file. Hence, putting your data files in either of the two latter locations works as a quick fix when the original files have been moved or can no longer be found.
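
The lookup order described above can be mimicked in a shell script, for example to check before a run whether a data file would be found. This is only a sketch of the described behaviour, not the actual PeptideShaker or Reporter code. The function name `find_data_file` is hypothetical, and matching by file name in the two fallback locations is an assumption.

```shell
# Hypothetical helper: print the first existing candidate for a data file,
# using the order described above. Returns 1 if none of them exists.
# Usage: find_data_file /original/path/to/file.fasta /path/to/file.psdb
find_data_file() {
    original="$1"
    psdb_dir=$(dirname "$2")
    name=$(basename "$original")   # assumption: fallbacks match by file name
    for candidate in \
        "$original" \
        "$psdb_dir/data/$name" \
        "$psdb_dir/$name"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A pipeline could call this for the FASTA file and each spectrum file and stop early with a clear message if the function returns 1.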

Adding the -fasta_file and -spectrum_files options for ReporterCLI sounds like a good idea as well though. I'll see what I can do.

hbarsnes commented 3 years ago

I just released a new version of Reporter that supports using zipped PeptideShaker projects as input to the command line. (I have not yet looked at the other data folder issues.)
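
For a pipeline that keeps the two steps separate, the zip-based hand-over could look roughly like the dry-run sketch below, which only prints the commands. The function name `print_zip_pipeline` is hypothetical. The class names are taken from the stack traces and commands earlier in this thread, and it is an assumption here that -zip takes the path of the zip file to create. The jar file names depend on the installed versions, so they are passed in as parameters.

```shell
# Hypothetical dry-run helper: prints the two commands instead of running them.
# Usage: print_zip_pipeline PS_JAR REPORTER_JAR PROJECT_ZIP [PeptideShakerCLI options...]
print_zip_pipeline() {
    ps_jar="$1"
    reporter_jar="$2"
    project_zip="$3"
    shift 3
    # Step 1: PeptideShakerCLI with the caller's own options plus -zip.
    # Assumption: -zip takes the path of the zip file to create.
    printf 'java -cp %s eu.isas.peptideshaker.cmd.PeptideShakerCLI %s -zip %s\n' \
        "$ps_jar" "$*" "$project_zip"
    # Step 2: ReporterCLI reads the zipped project through -id.
    printf 'java -cp %s eu.isas.reporter.cli.ReporterCLI -id %s\n' \
        "$reporter_jar" "$project_zip"
}
```

Printing rather than executing makes it easy to inspect the commands first. The remaining PeptideShakerCLI options are passed through unchanged after the third argument.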