Nesvilab / MSFragger

Ultrafast, comprehensive peptide identification for mass spectrometry–based proteomics
https://msfragger.nesvilab.org

"ValueError: Specified a sep and a delimiter; you can only specify one." for the second #181

Closed fabianegli closed 3 years ago

fabianegli commented 3 years ago

After well over two days of computation I was met with the following error message:

*******************************TOTAL TIME 314.775 MIN********************************
Traceback (most recent call last):
  File "P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\tools\msfragger_pep_split.py", line 477, in <module>
    main()
  File "P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\tools\msfragger_pep_split.py", line 468, in main
    write_combined_scores_histo()
  File "P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\tools\msfragger_pep_split.py", line 142, in write_combined_scores_histo
    scores_histos = [sum(pd.read_csv(ee / (e.stem + '_scores_histogram.tsv'), dtype=np.uint64, delimiter='\t', header=None, sep='\t').values for ee in tempdir_parts)
  File "P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\tools\msfragger_pep_split.py", line 142, in <listcomp>
    scores_histos = [sum(pd.read_csv(ee / (e.stem + '_scores_histogram.tsv'), dtype=np.uint64, delimiter='\t', header=None, sep='\t').values for ee in tempdir_parts)
  File "P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\tools\msfragger_pep_split.py", line 142, in <genexpr>
    scores_histos = [sum(pd.read_csv(ee / (e.stem + '_scores_histogram.tsv'), dtype=np.uint64, delimiter='\t', header=None, sep='\t').values for ee in tempdir_parts)
  File "C:\Users\eglif\Anaconda3\envs\MSFragger\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\eglif\Anaconda3\envs\MSFragger\lib\site-packages\pandas\io\parsers\readers.py", line 571, in read_csv
    kwds_defaults = _refine_defaults_read(
  File "C:\Users\eglif\Anaconda3\envs\MSFragger\lib\site-packages\pandas\io\parsers\readers.py", line 1303, in _refine_defaults_read
    raise ValueError("Specified a sep and a delimiter; you can only specify one.")
ValueError: Specified a sep and a delimiter; you can only specify one.
DONE: slice 10 of 10
Process 'MSFragger' finished, exit code: 1
Process returned non-zero exit code, stopping

~~~~~~~~~~~~~~~~~~~~
Cancelling 767 remaining tasks

I am aware that the same error was reported before in #167. However, the Google Docs link from @guoci that apparently contained the solution no longer works, and the problem does not seem to have been fixed on the software side in the current MSFragger version 3.3.
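
For anyone hitting the same error: `sep` and `delimiter` are aliases in `pandas.read_csv`, and the pandas version in the traceback raises when both are passed. Below is a minimal sketch of the call with only `sep`, assuming the histogram files are headerless, tab-separated integers as the traceback suggests. `read_histogram` and `sum_histograms` are hypothetical helper names, not functions from msfragger_pep_split.py, and the in-memory inputs stand in for the real `_scores_histogram.tsv` files.

```python
import io

import numpy as np
import pandas as pd


def read_histogram(source):
    """Read a headerless, tab-separated histogram of unsigned counts."""
    # Pass only `sep`; `delimiter` is an alias for it, and passing both
    # triggers the ValueError shown in the traceback.
    return pd.read_csv(source, sep="\t", header=None, dtype=np.uint64).values


def sum_histograms(sources):
    """Element-wise sum of the histograms read from several sources."""
    return sum(read_histogram(source) for source in sources)


# Made-up example data standing in for two per-slice histogram files.
parts = [io.StringIO("1\t2\t3\n"), io.StringIO("10\t20\t30\n")]
combined = sum_histograms(parts)
print(combined.tolist())
```

Dropping either one of the two keyword arguments should be equivalent; `sep` is kept here only because it is the name the pandas documentation leads with.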

I was running the packaged "Nonspecific-peptidome" search with the addition of MS1 Quant.

MSFragger version: 3.3 Philosopher version: 4.0.0 (build 1626989421) Python version: Python 3.8.11 FragPipe version: v16.0

I think it would make reporting the system info easier if you made the following system info found in the FragPipe window copy-pasteable.

Screenshot 2021-09-30 at 06 39 27

fabianegli commented 3 years ago

After applying this patch, the splitting seemingly worked. However, I ran into another issue when running the workflow with only one raw file and limiting the search to peptides of length 6 and 7.

INFO[08:35:45] Executing ProteinProphet  v4.0.0             
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v6.0.0-rc15 Noctilucent, Build 202105101442-exported (Windows_NT-x86_64))
 (no FPKM) (using degen pep info)
Reading in C:\path\to\some.pep.xml...
did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0.05

WARNING: no data - output file will be empty
FATA[08:35:46] Cannot execute program. there was an error with ProteinProphet, please check your parameters and input files 
Process 'ProteinProphet' finished, exit code: 1
Process returned non-zero exit code, stopping

~~~~~~~~~~~~~~~~~~~~
Cancelling 9 remaining tasks
fabianegli commented 3 years ago

After removing the MS1 quantification from the pipeline, the issue remains.

INFO[09:31:19] Executing PeptideProphet  v4.0.0             
Unknown file type. No file loaded.E:\path\to\measurement-1/measurement-1.raw
Unknown file type. No file loaded.E:\path\to\measurement-1/measurement-1.raw
 file 1: E:\path\to\measurement-1\measurement-1.pepXML
 processed altogether 4616 results
INFO: Results written to file: E:\path\to\measurement-1\interact-measurement-1.pep.xml

  - E:\path\to\measurement-1\interact-measurement-1.pep.xml
  - Building Commentz-Walter keyword tree...
  - Searching the tree...
  - Linking duplicate entries...
  - Printing results...

using Accurate Mass Bins
using PPM mass difference
Using Decoy Label "rev_".
Decoy Probabilities will be reported.
Not using ntt model
Using non-parametric distributions
 (X! Tandem) (using Tandem's expectation score for modeling)
adding ACCMASS mixture distribution
using search_offsets in ACCMASS mixture distr: 0
init with X! Tandem nonspecific 
 PeptideProphet  (TPP v5.2.1-dev Flammagenitus, Build 201906281613-exported (Windows_NT-x86_64)) AKeller@ISB
 read in 0 1+, 3971 2+, 419 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.
Found 2034 Decoys, and 2356 Non-Decoys
MS Instrument info: Manufacturer: UNKNOWN, Model: UNKNOWN, Ionization: UNKNOWN, Analyzer: UNKNOWN, Detector: UNKNOWN

INFO: Processing standard MixtureModel ... 
Initialising statistical models ...
Iterations: .........10.........20.........30WARNING: Mixture model quality test failed for charge (1+).
.
WARNING: Mixture model quality test failed for charge (2+).
WARNING: Mixture model quality test failed for charge (3+).
WARNING: Mixture model quality test failed for charge (4+).
WARNING: Mixture model quality test failed for charge (5+).
WARNING: Mixture model quality test failed for charge (6+).
WARNING: Mixture model quality test failed for charge (7+).
model complete after 32 iterations
INFO[09:31:44] Done                                         
Process 'PeptideProphet' finished, exit code: 0
PeptideProphet: Delete temp
P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\jre\bin\java.exe -cp P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\lib\fragpipe-16.0.jar com.github.chhh.utils.FileDelete E:\path\to\measurement-1\fragpipe-measurement-1.pepXML-temp
Process 'PeptideProphet: Delete temp' finished, exit code: 0
Rewrite pepxml [Work dir: E:\path\to\measurement-1]
P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\jre\bin\java.exe -cp P:\Software\Proteomics\FragPipe-jre-16.0\fragpipe\lib/* com.dmtavt.fragpipe.util.RewritePepxml E:\path\to\measurement-1\interact-measurement-1.pep.xml E:\raw\measurement-1.raw
Fixing pepxml: E:\path\to\measurement-1\interact-measurement-1.pep.xml
Writing output to: E:\path\to\measurement-1\interact-measurement-1.pep.xml13876277864279529126.temp-rewrite
Deleting file: E:\path\to\measurement-1\interact-measurement-1.pep.xml
Moving rewritten file to original location: [E:\path\to\measurement-1\interact-measurement-1.pep.xml13876277864279529126.temp-rewrite] -> [E:\path\to\measurement-1\interact-measurement-1.pep.xml]
Process 'Rewrite pepxml' finished, exit code: 0
ProteinProphet [Work dir: E:\path\to]
P:\Software\Proteomics\philosopher_v4.0.0_windows_amd64\philosopher.exe proteinprophet --maxppmdiff 2000000 --output combined E:\path\to\filelist_proteinprophet.txt
INFO[09:31:56] Executing ProteinProphet  v4.0.0             
ProteinProphet (C++) by Insilicos LLC and LabKey Software, after the original Perl by A. Keller (TPP v6.0.0-rc15 Noctilucent, Build 202105101442-exported (Windows_NT-x86_64))
 (no FPKM) (using degen pep info)
Reading in E:\path\to\measurement-1\interact-measurement-1.pep.xml...
did not find any PeptideProphet results in input data!  Did you forget to run PeptideProphet?
...read in 0 1+, 0 2+, 0 3+, 0 4+, 0 5+, 0 6+, 0 7+ spectra with min prob 0.05

WARNING: no data - output file will be empty
FATA[09:31:57] Cannot execute program. there was an error with ProteinProphet, please check your parameters and input files 
Process 'ProteinProphet' finished, exit code: 1
Process returned non-zero exit code, stopping

~~~~~~~~~~~~~~~~~~~~
Cancelling 8 remaining tasks
anesvi commented 3 years ago

The problem is

read in 0 1+, 3971 2+, 419 3+, 0 4+, 0 5+, 0 6+, and 0 7+ spectra.

Found 2034 Decoys, and 2356 Non-Decoys

Your search results failed validation. Please check your parameters; it looks like you have too few high-scoring IDs for PeptideProphet to complete its modeling.

fabianegli commented 3 years ago

@anesvi Thank you for the reply. I will try a larger range of peptide sequence lengths to increase the number of peptides. Can you estimate how many peptides PeptideProphet requires to work? Is it expected that similar numbers of decoy and non-decoy peptides are found?

anesvi commented 3 years ago

Feel free to email me directly as well, if you want to discuss your experiment setup

anesvi commented 3 years ago

In a typical dataset, far more targets than decoys are found. So there is clearly something wrong with either the data, the database, or the search parameters. You should probably first check that you can run the pipeline with a regular tryptic digest file, then see what may be wrong with your specific search.
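
To put numbers on this, here is a small sketch using the counts PeptideProphet reported in the log above (2034 decoys, 2356 non-decoys). It assumes the database holds roughly equal numbers of target and decoy sequences, so a random match is about as likely to hit a decoy as a target. That is an assumption about the setup, not something the log states, and `decoy_summary` is a hypothetical helper name.

```python
def decoy_summary(n_decoys, n_targets):
    """Return (decoy fraction of all hits, decoy-to-target ratio)."""
    total = n_decoys + n_targets
    if n_targets == 0:
        raise ValueError("need at least one target hit")
    return n_decoys / total, n_decoys / n_targets


# Counts taken from the PeptideProphet log earlier in this thread.
fraction, ratio = decoy_summary(n_decoys=2034, n_targets=2356)
print(f"decoy fraction of all hits: {fraction:.2f}")  # prints 0.46
print(f"decoy-to-target ratio:      {ratio:.2f}")     # prints 0.86
```

Under that assumption, a decoy fraction close to 0.5 is what mostly random matches would look like, which is consistent with the mixture model failing for every charge state.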

fabianegli commented 3 years ago

@anesvi thank you, I will gladly take your offer if I can't get it to work with what we can discuss in this issue.

A test with peptides of length 20 instead of 6-7 worked. So I guess the shorter decoy peptides are just too probable and this causes PeptideProphet to fail, although I don't know whether that is the right explanation.

In any case, after the Prophets worked, the MS1 Quantification did not. Is the MS1 Quantification not meant to work for the Nonspecific-peptidome workflow?

anesvi commented 3 years ago

Sorry, I cannot understand this: “A test with peptides of length 20 instead of 6-7 worked.” PeptideProphet has a minimum length threshold of 7, by the way, so peptides of length 6 or less are not even considered.

MS1 should work with peptidome

Alexey

fcyu commented 3 years ago

> In any case, after the Prophets worked, the MS1 Quantification did not. Is the MS1 Quantification not meant to work for the Nonspecific-peptidome workflow?

No, MS1 quantification (IonQuant) should be able to work for the nonspecific digestion. Can you send us the log file?

Best,

Fengchao

fabianegli commented 3 years ago

It has been some time since my last run failed at the quantification step, and now I know why: match-between-runs was active, but some files were apparently not similar enough when the MSFragger search was restricted to a single peptide length (e.g. 20). With the same parameters, except for a peptide length range (e.g. 17-20) instead of a single length, the run completed.

Would it not make sense to complete the FragPipe run even if no match can be found for some runs?

I am also wondering if I can plainly combine the results from two almost identical Nonspecific Peptidome searches that differ in peptide length range, let's say one search for 7-15 and the other 16-35 amino acids. Or does that mess with the statistics?

fcyu commented 3 years ago

> It has been some time since my last run failed at the quantification step, and now I know why: match-between-runs was active, but some files were apparently not similar enough when the MSFragger search was restricted to a single peptide length (e.g. 20). With the same parameters, except for a peptide length range (e.g. 17-20) instead of a single length, the run completed.
>
> Would it not make sense to complete the FragPipe run even if no match can be found for some runs?

IonQuant should be able to handle the case that some runs have no ID. Can you send us the log file?

> I am also wondering if I can plainly combine the results from two almost identical Nonspecific Peptidome searches that differ in peptide length range, let's say one search for 7-15 and the other 16-35 amino acids. Or does that mess with the statistics?

Combining the results this way is not trivial because the mass ranges of the 7-15 and 16-35 residue peptides overlap. You would need to re-order the hits and pick the top-scoring one for each scan, and the expectation value estimation would become an issue.
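
The "pick the top-scoring hit for each scan" step can be sketched as below. This is only an illustration under stated assumptions: the record keys `scan`, `peptide` and `score`, the function name `merge_top_hits`, and the example hits are all made up, and the raw scores of the two searches are assumed to be directly comparable. It does not address the expectation value problem mentioned above.

```python
def merge_top_hits(*searches):
    """Keep the highest-scoring hit per scan across several searches.

    Each search is a list of dicts with the hypothetical keys
    'scan', 'peptide' and 'score' (a higher score is better).
    """
    best = {}
    for hits in searches:
        for hit in hits:
            current = best.get(hit["scan"])
            if current is None or hit["score"] > current["score"]:
                best[hit["scan"]] = hit
    # Re-order by scan number so the merged list is deterministic.
    return [best[scan] for scan in sorted(best)]


# Made-up hits from a 7-15 residue search and a 16-35 residue search.
short_search = [
    {"scan": 1, "peptide": "ACDEFGHK", "score": 21.0},
    {"scan": 2, "peptide": "LMNPQRSTK", "score": 9.5},
]
long_search = [
    {"scan": 2, "peptide": "ACDEFGHIKLMNPQRSTVWY", "score": 30.2},
    {"scan": 3, "peptide": "GASPVTCLINDQKEMHFRYW", "score": 12.1},
]

merged = merge_top_hits(short_search, long_search)
for hit in merged:
    print(hit["scan"], hit["peptide"], hit["score"])
```

Any statistics computed per search (expectation values, PeptideProphet probabilities) would still refer to each search's own candidate pool, so the merged list would need to be re-validated rather than used as is.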

Best,

Fengchao

anesvi commented 3 years ago

About combining runs with different peptide length – nontrivial but interesting question. Perhaps you can contact us directly to discuss why you do it that way, and if there are other ways to achieve what you need. Alexey
