donovan-h-parks / RefineM

A toolbox for improving metagenome-assembled genomes.
GNU General Public License v3.0

outliers run error #26

Open gaohanlisa opened 6 years ago

gaohanlisa commented 6 years ago

Hi, I was trying to run the outliers command but got the following error. Any suggestions for a solution? Best,

[2018-07-26 20:56:53] INFO: RefineM v0.0.23
[2018-07-26 20:56:53] INFO: refinem outliers --no_plots /projects/b1042/Wells/GaoHan/DNA-all/DNA-Total/refinem_stats/scaffold_stats.tsv /projects/b1052/Wells_b1042/GaoHan/CANDO_RNA/binning/metabat/refinem_outlier
[2018-07-26 20:56:53] INFO: Reading scaffold statistics.
[2018-07-26 20:57:13] INFO: Calculating statistics for 137 genomes over 290166 scaffolds.
[2018-07-26 20:57:24] INFO: Reading reference distributions.
  Finding outliers in 3 of 137 (2.2%) genomes.
Unexpected error: <type 'exceptions.FloatingPointError'>
Traceback (most recent call last):
  File "/projects/b1052/pythonenvs/python2.7/refinem/bin/refinem", line 396, in <module>
    parser.parse_options(args)
  File "/projects/b1052/pythonenvs/python2.7/refinem/lib/python2.7/site-packages/refinem/main.py", line 687, in parse_options
    self.outliers(options)
  File "/projects/b1052/pythonenvs/python2.7/refinem/lib/python2.7/site-packages/refinem/main.py", line 280, in outliers
    options.report_type, outlier_file)
  File "/projects/b1052/pythonenvs/python2.7/refinem/lib/python2.7/site-packages/refinem/outliers.py", line 484, in identify
    cov_perc)
  File "/projects/b1052/pythonenvs/python2.7/refinem/lib/python2.7/site-packages/refinem/outliers.py", line 390, in outlier_info
    corr_r, _corr_p = pearsonr(gs.median_coverage, stats.coverage)
  File "/projects/b1052/pythonenvs/python2.7/refinem/lib/python2.7/site-packages/scipy/stats/stats.py", line 3010, in pearsonr
    r = r_num / r_den
FloatingPointError: invalid value encountered in double_scalars
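The traceback ends inside scipy's `pearsonr` at `r = r_num / r_den`, called on `gs.median_coverage` and `stats.coverage`. One plausible reading, not confirmed in this thread, is that one of these coverage vectors has zero variance (for example, a profile with a single coverage value), so both numerator and denominator are 0 and the division is 0/0. Because the error is raised rather than reported as a warning, NumPy's floating-point error handling appears to be set to `'raise'` in this environment. The sketch below reproduces that situation; `pearson_r` and `safe_pearson_r` are hypothetical helpers, not scipy's or RefineM's code.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation written out so the last step mirrors r = r_num / r_den."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm = x - x.mean()
    ym = y - y.mean()
    r_num = np.sum(xm * ym)
    r_den = np.sqrt(np.sum(xm * xm) * np.sum(ym * ym))
    return r_num / r_den  # 0.0 / 0.0 when either input is constant


def safe_pearson_r(x, y):
    """Hypothetical guard: return NaN instead of dividing when a profile has no variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    if x.size < 2 or np.all(x == x[0]) or np.all(y == y[0]):
        return float("nan")
    return float(pearson_r(x, y))


if __name__ == "__main__":
    # Assumption: mimic an error state of 'raise' so 0/0 raises instead of warning.
    with np.errstate(invalid="raise"):
        try:
            pearson_r([5.0], [7.0])  # a single-value coverage profile
        except FloatingPointError as err:
            print("FloatingPointError:", err)
        print(safe_pearson_r([5.0], [7.0]))  # nan
```

Returning NaN lets a caller skip the correlation check for a degenerate profile instead of aborting the whole run; how RefineM itself handles this case is not described in the thread.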

rotoscan commented 5 years ago

Hello Parks,

This is an amazing tool, but for some of my bins I am facing the exact same problem.

While checking the stdout file, I get this:

$ cat /data/msb/joao/vojkte_data/LOGS/refinem_outlier-4878374.out
[2018-12-05 13:58:52] INFO: RefineM v0.0.23
[2018-12-05 13:58:52] INFO: refinem outliers /work/leonorfe/vojkte_data/refinem_scaff_stats_results/bins_2000.21/scaffold_stats.tsv /work/leonorfe/vojkte_data/refinem_outliers_results/bins_2000.21
[2018-12-05 13:58:52] INFO: Reading scaffold statistics.
[2018-12-05 14:22:26] INFO: Calculating statistics for 1 genomes over 17936557 scaffolds.
[2018-12-05 14:22:34] INFO: Reading reference distributions.
  Finding outliers in 1 of 1 (100.0%) genomes.
Unexpected error: <type 'exceptions.FloatingPointError'>
Wed Dec 5 14:25:14 CET 2018

and the stderr file shows this:

Traceback (most recent call last):
  File "/gpfs1/data/msb/tools/checkm/checkm_env/bin/refinem", line 396, in <module>
    parser.parse_options(args)
  File "/gpfs1/data/msb/tools/checkm/checkm_env/lib/python2.7/site-packages/refinem/main.py", line 687, in parse_options
    self.outliers(options)
  File "/gpfs1/data/msb/tools/checkm/checkm_env/lib/python2.7/site-packages/refinem/main.py", line 280, in outliers
    options.report_type, outlier_file)
  File "/gpfs1/data/msb/tools/checkm/checkm_env/lib/python2.7/site-packages/refinem/outliers.py", line 484, in identify
    cov_perc)
  File "/gpfs1/data/msb/tools/checkm/checkm_env/lib/python2.7/site-packages/refinem/outliers.py", line 390, in outlier_info
    corr_r, _corr_p = pearsonr(gs.median_coverage, stats.coverage)
  File "/gpfs1/data/msb/tools/checkm/checkm_env/lib/python2.7/site-packages/scipy/stats/stats.py", line 3010, in pearsonr
    r = r_num / r_den
FloatingPointError: invalid value encountered in double_scalars

This yields an empty outliers.tsv and no plots whatsoever. Any help would be really appreciated!!

donovan-h-parks commented 5 years ago

Hello. I believe this issue is resolved in v0.0.24.

housw commented 5 years ago

Hi Donovan,

I also got an error while running refinem outliers with the following command:

refinem outliers --image_type pdf --gc_perc 98 --td_perc 98 --cov_corr -2 --cov_perc 50 24_MG_binning_results/dastools/CSP_DASTool_bins_refinem/stats/scaffold_stats.tsv 24_MG_binning_results/dastools/CSP_DASTool_bins_refinem/outlier

It works fine if I disable the plotting step with the --no_plots option. Here is the error message:

[2019-01-28 18:49:37] INFO: RefineM v0.0.24
[2019-01-28 18:49:37] INFO: refinem outliers --image_type pdf --gc_perc 98 --td_perc 98 --cov_corr -2 --cov_perc 50 24_MG_binning_results/dastools/CSP_DASTool_bins_refinem/stats/scaffold_stats.tsv 24_MG_binning_results/dastools/CSP_DASTool_bins_refinem/outliers
[2019-01-28 18:49:37] INFO: Reading scaffold statistics.
[2019-01-28 18:50:06] INFO: Calculating statistics for 424 genomes over 272220 scaffolds.
[2019-01-28 18:50:14] INFO: Reading reference distributions.
  Finding outliers in 424 of 424 (100.0%) genomes.
[2019-01-28 18:50:51] INFO: Outlier information written to: 24_MG_binning_results/dastools/CSP_DASTool_bins_refinem/outliers/outliers.tsv
  Plotting scaffold distribution for 1 of 424 (0.2%) genomes.
Unexpected error: <type 'exceptions.TypeError'>
Traceback (most recent call last):
  File "/opt/conda/bin/refinem", line 396, in <module>
    parser.parse_options(args)
  File "/opt/conda/lib/python2.7/site-packages/refinem/main.py", line 684, in parse_options
    self.outliers(options)
  File "/opt/conda/lib/python2.7/site-packages/refinem/main.py", line 295, in outliers
    plot_dir)
  File "/opt/conda/lib/python2.7/site-packages/refinem/outliers.py", line 770, in plot
    combined_plots.save_html(os.path.join(output_dir, genome_id + '.combined.html'))
  File "/opt/conda/lib/python2.7/site-packages/refinem/plots/combined_plots.py", line 220, in save_html
    AbstractPlot.save_html(self, output_html, html_script, html_body)
  File "/opt/conda/lib/python2.7/site-packages/biolib/plots/abstract_plot.py", line 107, in save_html
    html_str = mpld3.fig_to_html(self.fig, template_type='simple')
  File "/opt/conda/lib/python2.7/site-packages/mpld3/_display.py", line 251, in fig_to_html
    figure_json=json.dumps(figure_json, cls=NumpyEncoder),
  File "/opt/conda/lib/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/opt/conda/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/opt/conda/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/opt/conda/lib/python2.7/site-packages/mpld3/_display.py", line 138, in default
    return json.JSONEncoder.default(self, obj)
  File "/opt/conda/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([0.5]) is not JSON serializable

Thanks a lot!

Best, Shengwei
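The last frames of this traceback show mpld3 serializing the figure with `json.dumps(..., cls=NumpyEncoder)` and that encoder falling back to `json.JSONEncoder.default`, so a NumPy array (`array([0.5])`) in the figure data was never converted to a JSON-compatible type. A minimal sketch of the general technique, assuming nothing about mpld3's internals: a `json.JSONEncoder` subclass (the name `ArrayFriendlyEncoder` and the `"alpha"` key are hypothetical) whose `default` method converts arrays with `tolist()` and NumPy scalars with `item()`. It illustrates why serialization fails; it is not a patch for mpld3 or RefineM.

```python
import json

import numpy as np


class ArrayFriendlyEncoder(json.JSONEncoder):
    """Hypothetical JSON encoder that converts NumPy values to built-in types.

    Not mpld3's NumpyEncoder; an illustration of the technique only.
    """

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()  # e.g. array([0.5]) -> [0.5]
        if isinstance(obj, np.generic):
            return obj.item()  # NumPy scalar -> Python int/float/bool
        # Anything else falls through to the base class, which raises TypeError.
        return super().default(obj)


if __name__ == "__main__":
    figure_props = {"alpha": np.array([0.5])}  # hypothetical figure property
    try:
        json.dumps(figure_props)  # the plain encoder cannot handle ndarray
    except TypeError as err:
        print("plain encoder:", err)
    print(json.dumps(figure_props, cls=ArrayFriendlyEncoder))  # {"alpha": [0.5]}
```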

donovan-h-parks commented 5 years ago

This problem is related to the mpld3 library RefineM uses to generate plots. Unfortunately, I do not believe this issue has been addressed yet, so I think running the command without plotting (--no_plots) is the best way forward for now.
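For anyone scripting this workaround, a small Python helper can assemble the same `refinem outliers` call with `--no_plots` added. The helper name `refinem_outliers_cmd` is hypothetical and not part of RefineM, and it uses only the flags and positional arguments that appear in this thread (`--no_plots`, `--gc_perc`, `--td_perc`, `--cov_corr`, `--cov_perc`, then the scaffold statistics file and the output directory).

```python
import shlex


def refinem_outliers_cmd(stats_tsv, output_dir, no_plots=True, **thresholds):
    """Build an argv list for `refinem outliers` (hypothetical helper).

    thresholds: flag names from this thread, e.g. gc_perc=98, cov_corr=-2.
    """
    cmd = ["refinem", "outliers"]
    if no_plots:
        cmd.append("--no_plots")  # skip the mpld3 plotting step
    for flag, value in sorted(thresholds.items()):
        cmd += ["--" + flag, str(value)]
    cmd += [stats_tsv, output_dir]  # positional arguments come last
    return cmd


if __name__ == "__main__":
    cmd = refinem_outliers_cmd(
        "24_MG_binning_results/dastools/CSP_DASTool_bins_refinem/stats/scaffold_stats.tsv",
        "24_MG_binning_results/dastools/CSP_DASTool_bins_refinem/outliers",
        gc_perc=98, td_perc=98, cov_corr=-2, cov_perc=50,
    )
    # Print a shell-safe version of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with long paths.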

chloelulu commented 5 years ago

Thank you @dparks1134 for creating such great software! Your suggestion solved the problem.