Closed sian231 closed 5 years ago
Hi Sian,
The first command actually combines three independent steps:
pyprophet peptide --in=merged.osw --context=run-specific
pyprophet peptide --in=merged.osw --context=experiment-wide
pyprophet peptide --in=merged.osw --context=global
On protein-level, the analogue would be:
pyprophet protein --in=merged.osw --context=run-specific
pyprophet protein --in=merged.osw --context=experiment-wide
pyprophet protein --in=merged.osw --context=global
Could you please execute these steps individually and let me know where it fails?
Thanks, George
Hi, so it seems there may be two issues:
pyprophet protein --in=merged.osw --context=run-specific
reproduces the error as above
pyprophet protein --in=merged.osw --context=experiment-wide
runs without an issue
pyprophet protein --in=merged.osw --context=global
produces the following:
c:\programdata\anaconda2\lib\site-packages\pyprophet\stats.py:371: RuntimeWarnin
g: invalid value encountered in true_divide
sens = tp / (num_total - num_null)
c:\programdata\anaconda2\lib\site-packages\pyprophet\stats.py:373: RuntimeWarnin
g: invalid value encountered in less
sens[sens < 0.0] = 0.0
c:\programdata\anaconda2\lib\site-packages\pyprophet\stats.py:374: RuntimeWarnin
g: invalid value encountered in greater
sens[sens > 1.0] = 1.0
qvalue pvalue svalue pep ... tn fp fn cutoff
0 0.00 1.0 NaN 1.936552e-07 ... 0.0 5910.0 0.0 -4.212924
1 0.01 NaN NaN NaN ... NaN NaN NaN NaN
2 0.02 NaN NaN NaN ... NaN NaN NaN NaN
3 0.05 NaN NaN NaN ... NaN NaN NaN NaN
4 0.10 NaN NaN NaN ... NaN NaN NaN NaN
5 0.20 NaN NaN NaN ... NaN NaN NaN NaN
6 0.30 NaN NaN NaN ... NaN NaN NaN NaN
7 0.40 NaN NaN NaN ... NaN NaN NaN NaN
8 0.50 NaN NaN NaN ... NaN NaN NaN NaN
[9 rows x 12 columns]
Traceback (most recent call last):
File "c:\programdata\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "c:\programdata\anaconda2\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "C:\ProgramData\Anaconda2\Scripts\pyprophet.exe\__main__.py", line 9, in
<module>
File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 722, in
__call__
return self.main(*args, **kwargs)
File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 697, in
main
rv = self.invoke(ctx)
File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 1092, in
invoke
rv.append(sub_ctx.command.invoke(sub_ctx))
File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 895, in
invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\programdata\anaconda2\lib\site-packages\click\core.py", line 535, in
invoke
return callback(*args, **kwargs)
File "c:\programdata\anaconda2\lib\site-packages\pyprophet\main.py", line 172,
in protein
infer_proteins(infile, outfile, context, parametric, pfdr, pi0_lambda, pi0_m
ethod, pi0_smooth_df, pi0_smooth_log_pi0, lfdr_truncate, lfdr_monotone, lfdr_tra
nsformation, lfdr_adj, lfdr_eps)
File "c:\programdata\anaconda2\lib\site-packages\pyprophet\levels_contexts.py"
, line 105, in infer_proteins
data = statistics_report(data, outfile, context, "protein", parametric, pfdr
, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, lfdr_truncate, lfdr
_monotone, lfdr_transformation, lfdr_adj, lfdr_eps)
File "c:\programdata\anaconda2\lib\site-packages\pyprophet\levels_contexts.py"
, line 36, in statistics_report
save_report(outfile + "_" + context + "_" + analyte + ".pdf", outfile + ": "
+ context + " " + analyte + "-level error-rate control", data[data.decoy==1]["s
core"], data[data.decoy==0]["score"], stat_table["cutoff"], stat_table["svalue"]
, stat_table["qvalue"], data[data.decoy==0]["p_value"], pi0)
File "c:\programdata\anaconda2\lib\site-packages\pyprophet\report.py", line 50
, in save_report
ddensity = gaussian_kde(top_decoys)
File "c:\programdata\anaconda2\lib\site-packages\scipy\stats\kde.py", line 169
, in __init__
raise ValueError("`dataset` input should have multiple elements.")
ValueError: `dataset` input should have multiple elements.
Did you use a sample-specific library or a larger combined library? E.g. are most of your proteins detectable with at least one peptide? In that case it might be that protein-level error estimates are hard to make because there are only very few false detected proteins. One example would be using DIA-Umpire to generate a library and restrictively filtering it before using OpenSWATH.
Could you please post the PDF report in the experiment-wide context on protein level?
We used a combined library with decoys, so I can't imagine there being only a few false detected proteins.
I have posted the PDF report - it doesn't look good.
Does the peak group level (MS2-level) report look as expected?
I could be better - there is a lot of overlap between target and decoy, but this is large-scale data, so I wasn't hugely concerned when I saw it. I would say the MS1-level looks 'worse'. I've posted both. merged_ms1_report.pdf merged_ms2_report.pdf
The MS2-level report looks like expected from TripleTOF data. The MS1-level is not optimal, but has no effect on protein-level.
Which version of OpenMS did you use to generate the PQP files? It might be that there is an issue with the peptide to protein associations. Currently, it is still required that peptides are either proteotypic or map to predetermined protein groups (from protein inference algorithms).
I used the latest version (OpenMS 2.4). The library does contain non-proteotypic peptides, however they should satisfy the second criteria and map to protein groups. I will try using a library of only proteotypic peptides next. Thanks for your help!
I'm guessing this is similar.
Library is small, and I expect no overlap between targets and decoys -- these 2,000 PQPs were selected as the best scoring 2,000 from a more comprehensive library.
Command looks like (NB: this is a sample independent workflow, rather than a global context workflow):
#!/bin/bash
set -e
echo `pwd`
INPUT_FILES=""
INPUT_FILES+=" ProCan90-M06-07.osw"
for in_fname in ${INPUT_FILES}; do
echo "********************************************************************************"
echo "Working on $in_fname"
echo "********************************************************************************"
echo "Step 1: score model"
pyprophet score --in=$in_fname --level=ms1ms2 \
--pi0_lambda 0.0 0.0001 1e-05
echo "********************************************************************************"
echo "Step 2: calculate FDR controls"
pyprophet peptide \
--context=run-specific --in=$in_fname \
--pi0_lambda 0.0 0.0001 1e-05
pyprophet protein \
--context=run-specific --in=$in_fname \
--pi0_lambda 0.0 0.0001 1e-05
echo "********************************************************************************"
tsv_fname="${in_fname%.*}.pyprophet.tsv"
echo "Step 3: backpropagate results and export to TSV: $in_fname --> $tsv_fname"
pyprophet export --in=$in_fname --out=$tsv_fname --format=legacy_merged
done
Which throws an unexpected error:
********************************************************************************
Working on ProCan90-M06-07.osw
********************************************************************************
Step 1: score model
Info: Learn and apply classifier from input data.
Warning: Column var_mi_score contains only invalid/missing values. Column will be dropped.
Warning: Column var_mi_weighted_score contains only invalid/missing values. Column will be dropped.
Warning: Column var_mi_ratio_score contains only invalid/missing values. Column will be dropped.
Warning: Column var_elution_model_fit_score contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_lag contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_shape contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_log_sn contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_log_diff contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_log_trend contains only invalid/missing values. Column will be dropped.
Warning: Column var_sonar_rsq contains only invalid/missing values. Column will be dropped.
Warning: Column var_ms1_mi_score contains only invalid/missing values. Column will be dropped.
Info: Data set contains 1990 decoy and 2000 target groups.
Info: Summary of input data:
Info: 40636 peak groups
Info: 3990 group ids
Info: 26 scores including main score
Info: Semi-supervised learning of weights:
Info: Start learning on 10 folds using 1 processes.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Learning on cross-validation fold.
Info: Finished learning.
Info: Data set contains 1990 decoy and 2000 target groups.
Info: Mean qvalue = 2.412224e-03, std_dev qvalue = 4.371784e-04
Info: Mean svalue = 1.000000e+00, std_dev svalue = 0.000000e+00
Info: Finished scoring and estimation statistics.
Info: Processing input data finished.
Info: Time needed: 00:00:3.7
================================================================================
qvalue pvalue svalue ... fp fn cutoff
0 0.00 0.000503 1.0 ... 1.005025 -1996.994975 5.542891
1 0.01 0.002513 1.0 ... 5.025126 -1994.974874 3.839992
2 0.02 NaN NaN ... NaN NaN NaN
3 0.05 NaN NaN ... NaN NaN NaN
4 0.10 NaN NaN ... NaN NaN NaN
5 0.20 NaN NaN ... NaN NaN NaN
6 0.30 NaN NaN ... NaN NaN NaN
7 0.40 NaN NaN ... NaN NaN NaN
8 0.50 NaN NaN ... NaN NaN NaN
[9 rows x 12 columns]
================================================================================
Info: ProCan90-M06-07.osw written.
Info: ProCan90-M06-07_ms1ms2_report.pdf written.
Info: Total time: 3 seconds and 684 msecs wall time
********************************************************************************
Step 2: calculate FDR controls
================================================================================
qvalue pvalue svalue ... fp fn cutoff
0 0.00 0.000503 1.0 ... 1.005025 -1996.994975 5.542891
1 0.01 0.002513 1.0 ... 5.025126 -1994.974874 3.839992
2 0.02 NaN NaN ... NaN NaN NaN
3 0.05 NaN NaN ... NaN NaN NaN
4 0.10 NaN NaN ... NaN NaN NaN
5 0.20 NaN NaN ... NaN NaN NaN
6 0.30 NaN NaN ... NaN NaN NaN
7 0.40 NaN NaN ... NaN NaN NaN
8 0.50 NaN NaN ... NaN NaN NaN
[9 rows x 12 columns]
================================================================================
================================================================================
qvalue pvalue svalue ... fp fn cutoff
0 0.00 0.000503 1.0 ... 1.005025 -1996.994975 5.542891
1 0.01 0.002513 1.0 ... 5.025126 -1994.974874 3.839992
2 0.02 NaN NaN ... NaN NaN NaN
3 0.05 NaN NaN ... NaN NaN NaN
4 0.10 NaN NaN ... NaN NaN NaN
5 0.20 NaN NaN ... NaN NaN NaN
6 0.30 NaN NaN ... NaN NaN NaN
7 0.40 NaN NaN ... NaN NaN NaN
8 0.50 NaN NaN ... NaN NaN NaN
[9 rows x 12 columns]
================================================================================
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 918, in apply
result = self._python_apply_general(f)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 936, in _python_apply_general
self.axis)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 2273, in apply
res = f(group)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 908, in f
return func(g, *args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/pyprophet/levels_contexts.py", line 16, in statistics_report
error_stat, pi0 = error_statistics(data[data.decoy==0]['score'], data[data.decoy==1]['score'], parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, True, lfdr_truncate, lfdr_monotone, lfdr_transformation, lfdr_adj, lfdr_eps)
File "/opt/conda/lib/python3.6/site-packages/pyprophet/stats.py", line 468, in error_statistics
error_stat['pep'] = lfdr(target_pvalues, pi0['pi0'], lfdr_trunc, lfdr_monotone, lfdr_transf, lfdr_adj, lfdr_eps)
File "/opt/conda/lib/python3.6/site-packages/pyprophet/stats.py", line 309, in lfdr
splinefit = sp.interpolate.make_interp_spline(myd.support, myd.density)
File "/opt/conda/lib/python3.6/site-packages/scipy/interpolate/_bsplines.py", line 777, in make_interp_spline
raise ValueError("Expect x to be a 1-D sorted array_like.")
ValueError: Expect x to be a 1-D sorted array_like.
ms1ms2 report:
PyProphet installed using:
pip install git+https://github.com/PyProphet/pyprophet.git@086bf5de4abc10f8592665ac9b8d72db14ffc7b4
@bretttully So the error message is also on protein-level, right? Could you please share the OSW
file so I can reproduce the error? Most likely this will require better exception management for such extremely separated distributions.
@bretttully Could you please review the PR above? I will then merge with master and make a new release with the two patches.
Done. Looks good.
Hi, I am having some problems when attempting to run pyprophet protein. I have progressed through the OpenSwath Worfklow; merging and scoring (at the ms2 level).
I have ran pyprophet peptide (seemingly without an issue) with:
However when I try the same with pyrophet protein:
I get the following error:
Any ideas what could be causing this? Thanks in advance, Sian