SegataLab / panphlan

PanPhlAn is a strain-level metagenomic profiling tool for identifying the gene composition of individual strains in metagenomic samples
MIT License

NameError: name 'randint' is not defined during panphlan_profiling.py #5

Open fconstancias opened 4 years ago

fconstancias commented 4 years ago

Hey there,

I am encountering the following error when running the panphlan_profiling.py script:

...
STEP 2. Create coverage matrix
 [I] Reading mapping result file: MP3_TONGUE_DNA_Rothia_dentocariosa.csv.bz2
 [I] Reading mapping result file: K5_TONGUE_DNA_Rothia_dentocariosa.csv.bz2
 [I] Reading mapping result file: MP3_SUB_DNA_Rothia_dentocariosa.csv.bz2
 [I] Reading mapping result file: MP8_TONGUE_DNA_Rothia_dentocariosa.csv.bz2
 [I] Reading mapping result file: MP8_SUB_DNA_Rothia_dentocariosa.csv.bz2
 [I] Gene family normalization for DNA sample K5_TONGUE_DNA_Rothia_dentocariosa.csv.bz2...
 [I] Gene family normalization for DNA sample MP3_SUB_DNA_Rothia_dentocariosa.csv.bz2...
 [I] Gene family normalization for DNA sample MP3_TONGUE_DNA_Rothia_dentocariosa.csv.bz2...
 [I] Gene family normalization for DNA sample MP8_SUB_DNA_Rothia_dentocariosa.csv.bz2...
 [I] Gene family normalization for DNA sample MP8_TONGUE_DNA_Rothia_dentocariosa.csv.bz2...
Gene families coverage matrix has been printed in /datadrive05/Flo/Saliva/Saliva2/panphlan/prof/Rothia_dentocariosa/Rothia_dentocariosa_o_covmat.csv

STEP 3: Strain presence/absence filter based on coverage plateau curve...
 [I] Minimum median coverage threshold: 1.0
 [I] Left maximum plateau threshold: 1.7
 [I] Right minimum plateau threshold: 0.3
 [I] Maximum zero non-plateau threshold (multistrain detection): 0.2
 [I] K5_TONGUE_DNA_Rothia_dentocariosa.csv.bz2 median coverage: 0.68; left-side cov: 1.82; right-side cov: 0.58; out-plateau cov: 0.0
        K5_TONGUE_DNA_Rothia_dentocariosa.csv.bz2: no strain detected, sample below MIN COVERAGE threshold
 [I] MP3_SUB_DNA_Rothia_dentocariosa.csv.bz2 median coverage: 0.25; left-side cov: 1.87; right-side cov: 0.0; out-plateau cov: 0.0
        MP3_SUB_DNA_Rothia_dentocariosa.csv.bz2: no strain detected, sample below MIN COVERAGE threshold
 [I] MP3_TONGUE_DNA_Rothia_dentocariosa.csv.bz2 median coverage: 1.43; left-side cov: 1.43; right-side cov: 0.71; out-plateau cov: 0.0
         MP3_TONGUE_DNA_Rothia_dentocariosa.csv.bz2 OK - strain detected
 [I] MP8_SUB_DNA_Rothia_dentocariosa.csv.bz2 median coverage: 2.06; left-side cov: 1.32; right-side cov: 0.73; out-plateau cov: 0.0
         MP8_SUB_DNA_Rothia_dentocariosa.csv.bz2 OK - strain detected
 [I] MP8_TONGUE_DNA_Rothia_dentocariosa.csv.bz2 median coverage: 1.13; left-side cov: 1.6; right-side cov: 0.62; out-plateau cov: 0.0
         MP8_TONGUE_DNA_Rothia_dentocariosa.csv.bz2 OK - strain detected
Traceback (most recent call last):
  File "/datadrive05/Flo/tools/panphlan/panphlan_profiling.py", line 760, in <module>
    main()
  File "/datadrive05/Flo/tools/panphlan/panphlan_profiling.py", line 727, in main
    plot_dna_coverage(norm_samples_coverages, sample_stats, avg_genome_length, args, normalized = True)
  File "/datadrive05/Flo/tools/panphlan/panphlan_profiling.py", line 481, in plot_dna_coverage
    color, reset = random_color(used_colors)
  File "/datadrive05/Flo/tools/panphlan/misc.py", line 47, in random_color
    return (available[randint(0, len(available) - 1)], reset)
NameError: name 'randint' is not defined

I obtain the --o_covmat output, but not the --o_matrix, --o_covplot_normed, or --o_idx outputs. I am able to obtain the other outputs if I remove the --o_covplot_normed argument.

I am using the following command:

/datadrive05/Flo/tools/panphlan/panphlan_profiling.py --pangenome ${db_dir}/${taxa}/${taxa}_pangenome.tsv \
                -i ${map_dir}/${taxa}/ \
                --min_coverage 1 --left_max 1.70 --right_min 0.30 \
                --o_matrix ${out_dir_prof}/${taxa}/${taxa}_o_matrix.csv \
                --o_covmat ${out_dir_prof}/${taxa}/${taxa}_o_covmat.csv \
                --o_covplot_normed ${out_dir_prof}/${taxa}/${taxa}_o_covplot_normed \
                --o_idx ${out_dir_prof}/${taxa}/${taxa}_o_idx.csv \
                --add_ref \
                --verbose

I have created a conda env with the necessary packages and installed panphlan using git: git clone -b 3.0 https://github.com/SegataLab/panphlan.git

I had no issue with panphlan_download_pangenome.py & panphlan_map.py steps.

Thanks !

leonarDubois commented 4 years ago

Hi, thanks for reporting this issue in such detail. I pushed a correction, so it should work now. Let me know if you encounter any other problems.
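
For anyone hitting the same traceback on an older checkout: the error says that `random_color` in misc.py calls `randint` without the name being imported. The pushed commit is not shown in this thread, so the following is only a minimal sketch of what such a fix could look like. The `PALETTE` list and the body of the function are assumptions made for illustration; only the final `return` line is taken from the traceback above.

```python
from random import randint  # the name the traceback reports as undefined

# Hypothetical palette: the real color list in misc.py is not shown in this thread.
PALETTE = ["red", "green", "blue", "orange"]


def random_color(used):
    """Pick a random color that is not in `used`.

    Returns (color, reset); reset is True when every color was already
    used and the full palette had to be made available again.
    """
    available = [c for c in PALETTE if c not in used]
    reset = False
    if not available:
        available = list(PALETTE)
        reset = True
    # Same expression as in the traceback; it works once randint is imported.
    return (available[randint(0, len(available) - 1)], reset)
```

`random.choice(available)` would be an equivalent way to pick the element, should the index arithmetic not be needed.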

fconstancias commented 4 years ago

Thanks, it solved the issue but I got another one for the next step:

/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/sklearn/cluster/_optics.py:802: RuntimeWarning: divide by zero encountered in true_divide
  ratio = reachability_plot[:-1] / reachability_plot[1:]
MoTTY X11 proxy: Authorisation not recognised
MoTTY X11 proxy: Authorisation not recognised
Traceback (most recent call last):
  File "/datadrive05/Flo/tools/panphlan/panphlan_find_gene_grp.py", line 297, in <module>
    main()
  File "/datadrive05/Flo/tools/panphlan/panphlan_find_gene_grp.py", line 292, in main
    plot_heatmap(panphlan_matrix, optics_res, args.out_plot)
  File "/datadrive05/Flo/tools/panphlan/panphlan_find_gene_grp.py", line 156, in plot_heatmap
    plt.figure()
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/matplotlib/pyplot.py", line 525, in figure
    **kwargs)
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 3218, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/matplotlib/backends/_backend_tk.py", line 1008, in new_figure_manager_given_figure
    window = Tk.Tk(className="matplotlib")
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/tkinter/__init__.py", line 1876, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "localhost:12.0"

Any idea what is going wrong?

leonarDubois commented 4 years ago

Hello, thanks again for all the issues you raise; it helps us improve PanPhlAn. This might come from the version of the seaborn package that you are using. Which one is it? Can you also give me the command line that leads to this message? I'll check whether something else could be causing the problem.

fconstancias commented 4 years ago

Thank you for developing the tool !

(panphlan3) constancias@scelse:/datadrive05/Flo/AM$ pip show seaborn
Name: seaborn
Version: 0.9.1
Summary: seaborn: statistical data visualization
Home-page: https://seaborn.pydata.org
Author: Michael Waskom
Author-email: mwaskom@nyu.edu
License: BSD (3-clause)
Location: /home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages
Requires: numpy, scipy, matplotlib, pandas
Required-by: 

This is the command I used :


# PanPhlAn find gene groups
out_dir_prof="/datadrive05/Flo/Saliva/Saliva2/panphlan/test/prof"
out_dir_gn_groups="/datadrive05/Flo/Saliva/Saliva2/panphlan/test/gn_groups/"

mkdir -p ${out_dir_gn_groups}

for taxa in Streptococcus_parasanguinis Streptococcus_salivarius Prevotella_melaninogenica Prevotella_histicola Rothia_dentocariosa
do

/datadrive05/Flo/tools/panphlan/panphlan_find_gene_grp.py --i_matrix ${out_dir_prof}/${taxa}/${taxa}_o_matrix.csv \
                -o ${out_dir_gn_groups}${taxa}_gene_groups.tsv \
                -p ${out_dir_gn_groups}${taxa}_pangenome_file.tsv \
                --out_plot ${out_dir_gn_groups}${taxa}_heatmap.png

done

The output folder only contains the following:

ls -lh $out_dir_gn_groups
total 156K
-rw-rw-r-- 1 constancias constancias 14K Jun  3 22:58 Prevotella_histicola_gene_groups.tsv
-rw-rw-r-- 1 constancias constancias 23K Jun  3 22:58 Prevotella_melaninogenica_gene_groups.tsv
-rw-rw-r-- 1 constancias constancias 25K Jun  3 22:58 Rothia_dentocariosa_gene_groups.tsv
-rw-rw-r-- 1 constancias constancias 37K Jun  3 22:58 Streptococcus_parasanguinis_gene_groups.tsv
-rw-rw-r-- 1 constancias constancias 47K Jun  3 22:58 Streptococcus_salivarius_gene_groups.tsv

fconstancias commented 4 years ago

Dear @leonarDubois,

I am still encountering the following issue. Do you have any idea what is going on?

/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/sklearn/cluster/_optics.py:802: RuntimeWarning: divide by zero encountered in true_divide
  ratio = reachability_plot[:-1] / reachability_plot[1:]
MoTTY X11 proxy: Authorisation not recognised
MoTTY X11 proxy: Authorisation not recognised
Traceback (most recent call last):
  File "/datadrive05/Flo/tools/panphlan/panphlan_find_gene_grp.py", line 297, in <module>
    main()
  File "/datadrive05/Flo/tools/panphlan/panphlan_find_gene_grp.py", line 292, in main
    plot_heatmap(panphlan_matrix, optics_res, args.out_plot)
  File "/datadrive05/Flo/tools/panphlan/panphlan_find_gene_grp.py", line 156, in plot_heatmap
    plt.figure()
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/matplotlib/pyplot.py", line 525, in figure
    **kwargs)
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 3218, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/site-packages/matplotlib/backends/_backend_tk.py", line 1008, in new_figure_manager_given_figure
    window = Tk.Tk(className="matplotlib")
  File "/home/constancias/miniconda3/envs/panphlan3/lib/python3.5/tkinter/__init__.py", line 1876, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "localhost:12.0"

leonarDubois commented 4 years ago

Hi, I tried it in a conda environment with seaborn version 0.9 instead of 0.10, but it doesn't print any error. I assume it might come from matplotlib (I'm using version 3.2.1) or tkinter (version 8.6.10). I'll look into it in more detail when I have the time. The truth is that I'm one of the only people working on PanPhlAn updates and development, and it's not my main project.

Sorry for the delay and thanks for your patience
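
Since the discussion turns on which package versions each side has installed, here is a small sketch for listing them in one go, as an alternative to running `pip show` once per package. It relies on `importlib.metadata`, which needs Python 3.8 or later, so it would not run in the Python 3.5 environment shown above; the function name `installed_versions` and the package list are illustrative assumptions.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as used by pip; adjust the list as needed.
    report = installed_versions(["matplotlib", "seaborn", "scikit-learn"])
    for name, ver in report.items():
        print(name, ver or "not installed")
```

The Tk version is not a pip distribution, so it will not appear in this report; `tkinter.TkVersion` gives it instead.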

leonarDubois commented 3 years ago

Hello,

Sorry for coming back to this issue so late; I've been making some modifications and fixes to prepare the official 3.0 release. Do you still have trouble plotting the heatmap to a file?
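
In case it helps others who land here: the traceback above ends inside matplotlib's Tk backend, which tries to open a window and so needs a working X display, while the "MoTTY X11 proxy: Authorisation not recognised" lines suggest the forwarded display was rejected. When the plot only has to be written to a file, a common workaround is to select the non-interactive Agg backend before pyplot is imported. The sketch below is an assumption about how that could be done, not the change made in PanPhlAn; `headless_backend` and `save_placeholder_heatmap` are hypothetical names.

```python
import os


def headless_backend(environ):
    """Default the MPLBACKEND variable to Agg and return the value in effect.

    Agg renders to image files only, so it never opens a Tk window and
    never needs an X display. An existing MPLBACKEND value is left alone.
    """
    return environ.setdefault("MPLBACKEND", "Agg")


def save_placeholder_heatmap(out_path):
    """Hypothetical stand-in for plot_heatmap(): draw a 2x2 grid and save it."""
    # MPLBACKEND is read when matplotlib is first imported, so set it first.
    headless_backend(os.environ)
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.imshow([[0, 1], [1, 0]], cmap="Greys")
    fig.savefig(out_path)
    plt.close(fig)
```

Without touching any code, the same effect can be tried by setting `MPLBACKEND=Agg` in the shell before running the script, or by calling `matplotlib.use("Agg")` before the first `import matplotlib.pyplot`.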