KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

"TypeError: boolean value of NA is ambiguous" came back #360

Closed imonteroo closed 5 months ago

imonteroo commented 5 months ago

Current Behavior

I installed Autometa 2.2.2 in a mamba environment and followed the step-by-step bash tutorial with a shotgun metagenomics sample that ran correctly in previous versions. When I ran the "autometa-binning" step, it raised an error in recursive_dbscan.py.

Steps to Reproduce

autometa-binning \
    --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv \
    --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv \
    --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv \
    --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv \
    --clustering-method dbscan \
    --completeness 20 \
    --purity 95 \
    --cov-stddev-limit 25 \
    --gc-stddev-limit 5 \
    --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv \
    --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv \
    --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv \
    --starting-rank superkingdom \
    --rank-filter superkingdom \
    --rank-name-filter bacteria
[06/10/2024 02:37:24 PM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/10/2024 02:37:24 PM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/10/2024 02:37:24 PM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/10/2024 02:37:24 PM INFO] root: Selected clustering method: dbscan
[06/10/2024 02:37:24 PM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/10/2024 02:37:24 PM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/10/2024 02:37:24 PM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/10/2024 02:37:24 PM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
  File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
    main_out = taxon_guided_binning(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
    clusters_df = get_clusters(
                  ^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
    clustered_df, unclustered_df = clusterer(
                                   ^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
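
For anyone landing here with the same traceback, the final frame can be reproduced outside Autometa. The snippet below is only a minimal sketch of the failure mode and of one possible guard. It assumes that `median_completeness` ended up as `pd.NA` at line 190; the helper name `is_improvement` is hypothetical and is not part of Autometa's code or its actual fix.

```python
import pandas as pd


def is_improvement(median_completeness, best_median):
    """Hypothetical guard: treat a missing value as 'not an improvement'."""
    # pd.isna covers pd.NA, float NaN and None
    if pd.isna(median_completeness) or pd.isna(best_median):
        return False
    return bool(median_completeness >= best_median)


# Comparing pd.NA with a number yields pd.NA, and pd.NA has no truth value,
# so using the comparison directly in an `if` raises the TypeError from the traceback.
try:
    if pd.NA >= 0.0:
        pass
except TypeError as err:
    message = str(err)
    print(message)

print(is_improvement(pd.NA, 0.0))   # missing value is skipped instead of raising
print(is_improvement(55.0, 20.0))   # ordinary numeric comparison still works
```

Whether skipping the missing value is the right behavior for the binning logic is a design question for the maintainers; the sketch only shows why the comparison fails.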

Expected Behavior

This sample ran correctly in a previous version; I do not remember which version number.

Environment Information

autometa-config --print

```bash
[06/10/2024 02:46:14 PM DEBUG] root: environment dependencies satisifed: True
section option value
common home_dir /media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages
environ diamond /media/microviable/d/miniconda3/envs/autometa_env/bin/diamond
environ hmmsearch /media/microviable/d/miniconda3/envs/autometa_env/bin/hmmsearch
environ hmmpress /media/microviable/d/miniconda3/envs/autometa_env/bin/hmmpress
environ hmmscan /media/microviable/d/miniconda3/envs/autometa_env/bin/hmmscan
environ prodigal /media/microviable/d/miniconda3/envs/autometa_env/bin/prodigal
environ bowtie2 /media/microviable/d/miniconda3/envs/autometa_env/bin/bowtie2
environ samtools /media/microviable/d/miniconda3/envs/autometa_env/bin/samtools
environ bedtools /media/microviable/d/miniconda3/envs/autometa_env/bin/bedtools
versions diamond 2.0.15
versions hmmsearch 3.3.2
versions hmmpress 3.3.2
versions hmmscan 3.3.2
versions prodigal 2.6.3
versions bowtie2 2.5.0
versions samtools 1.16.1
versions bedtools 2.30.0
databases base /media/microviable/e/autometa_databases
databases ncbi /media/microviable/e/autometa_databases/ncbi
databases gtdb /media/microviable/e/autometa_databases/gtdb
databases markers /media/microviable/e/autometa_databases/markers
database_urls taxdump ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
database_urls accession2taxid ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
database_urls nr ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz
database_urls bacteria_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.hmm
database_urls bacteria_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.cutoffs
database_urls archaea_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.hmm
database_urls archaea_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.cutoffs
database_urls proteins_aa_reps https://data.gtdb.ecogenomic.org/releases/latest/genomic_files_reps/gtdb_proteins_aa_reps.tar.gz
database_urls gtdb_taxdmp https://github.com/shenwei356/gtdb-taxdump/releases/latest/download/gtdb-taxdump.tar.gz
checksums taxdump ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
checksums accession2taxid ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz.md5
checksums nr ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz.md5
checksums bacteria_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.hmm.md5
checksums bacteria_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/bacteria.single_copy.cutoffs.md5
checksums archaea_single_copy https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.hmm.md5
checksums archaea_single_copy_cutoffs https://raw.githubusercontent.com/KwanLab/Autometa/main/autometa/databases/markers/archaea.single_copy.cutoffs.md5
ncbi host ftp.ncbi.nlm.nih.gov
ncbi taxdump /media/microviable/e/autometa_databases/ncbi/taxdump.tar.gz
ncbi nodes /media/microviable/e/autometa_databases/ncbi/nodes.dmp
ncbi names /media/microviable/e/autometa_databases/ncbi/names.dmp
ncbi merged /media/microviable/e/autometa_databases/ncbi/merged.dmp
ncbi delnodes /media/microviable/e/autometa_databases/ncbi/delnodes.dmp
ncbi accession2taxid /media/microviable/e/autometa_databases/ncbi/prot.accession2taxid.gz
ncbi nr /media/microviable/e/autometa_databases/ncbi/nr.gz
gtdb host data.gtdb.ecogenomic.org
gtdb release latest
gtdb proteins_aa_reps /media/microviable/e/autometa_databases/gtdb/gtdb_proteins_aa_reps.tar.gz
gtdb gtdb_taxdmp /media/microviable/e/autometa_databases/gtdb/gtdb-taxdump.tar.gz
markers host raw.githubusercontent.com
markers bacteria_single_copy /media/microviable/e/autometa_databases/markers/bacteria.single_copy.hmm
markers bacteria_single_copy_cutoffs /media/microviable/e/autometa_databases/markers/bacteria.single_copy.cutoffs
markers archaea_single_copy /media/microviable/e/autometa_databases/markers/archaea.single_copy.hmm
markers archaea_single_copy_cutoffs /media/microviable/e/autometa_databases/markers/archaea.single_copy.cutoffs
files metagenome metagenome.fna
files fwd_reads fwd_reads.fastq
files rev_reads rev_reads.fastq
files se_reads se_reads.fastq
files sam alignments.sam
files bam alignments.bam
files lengths lengths.tsv
files bed alignments.bed
files length_filtered metagenome.filtered.fna
files coverages coverages.tsv
files kmer_counts kmers.tsv
files kmer_normalized kmers.normalized.tsv
files kmer_embedded kmers.embedded.tsv
files nucleotide_orfs metagenome.filtered.orfs.fna
files amino_acid_orfs metagenome.filtered.orfs.faa
files blastp blastp.tsv
files blastp_hits blastp.hits.pkl.gz
files lca lca.tsv
files blastx blastx.tsv
files taxonomy taxonomy.tsv
files bacteria_hmmscan bacteria.hmmscan.tsv
files bacteria_markers bacteria.markers.tsv
files archaea_hmmscan archaea.hmmscan.tsv
files archaea_markers archaea.markers.tsv
files bacteria_binning bacteria.binning.tsv
files archaea_binning archaea.binning.tsv
files checkpoints checkpoints.tsv
```

Run Information

# packages in environment at /media/microviable/d/miniconda3/envs/autometa_env:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
_sysroot_linux-64_curr_repodata_hack 3                   h69a702a_14    conda-forge
alsa-lib                  1.2.11               hd590300_1    conda-forge
attrs                     23.2.0             pyh71513ae_0    conda-forge
autometa                  2.2.2              pyh7cba7a3_0    bioconda
beautifulsoup4            4.12.3             pyha770c72_0    conda-forge
bedtools                  2.31.1               hf5e1c6e_1    bioconda
biom-format               2.1.16          py312h9a8786e_1    conda-forge
biopython                 1.83            py312h98912ed_0    conda-forge
blast                     2.15.0          pl5321h6f7f691_1    bioconda
boost-cpp                 1.78.0               h2c5509c_4    conda-forge
bottleneck                1.3.8           py312hc7c0aa3_0    conda-forge
bowtie2                   2.5.4                he20e202_0    bioconda
brotli-python             1.1.0           py312h30efb56_1    conda-forge
bwa                       0.7.18               he4a0461_0    bioconda
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.6.2             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cairo                     1.18.0               h3faef2a_0    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
click                     8.1.7           unix_pyh707e725_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
curl                      8.8.0                he654da7_0    conda-forge
diamond                   2.1.9                h43eeafb_0    bioconda
entrez-direct             21.6                 he881be0_0    bioconda
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
expat                     2.6.2                h59595ed_0    conda-forge
fastqc                    0.12.1               hdfd78af_0    bioconda
filelock                  3.14.0             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_2    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
gdown                     5.2.0              pyhd8ed1ab_0    conda-forge
gettext                   0.22.5               h59595ed_2    conda-forge
gettext-tools             0.22.5               h59595ed_2    conda-forge
giflib                    5.2.2                hd590300_0    conda-forge
graphite2                 1.3.13            h59595ed_1003    conda-forge
h5py                      3.11.0          nompi_py312hb7ab980_101    conda-forge
harfbuzz                  8.5.0                hfac3d4d_0    conda-forge
hdf5                      1.14.3          nompi_hdf9ad27_104    conda-forge
hdmedians                 0.14.2          py312h085067d_6    conda-forge
hmmer                     3.4                  hdbdd923_1    bioconda
htslib                    1.20                 h81da01d_0    bioconda
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
joblib                    1.4.2              pyhd8ed1ab_0    conda-forge
kart                      2.5.6                hcd5855d_4    bioconda
kernel-headers_linux-64   3.10.0              h4a8ded7_14    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 hf3520f5_1    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.1.3                h59595ed_0    conda-forge
libasprintf               0.22.5               h661eb56_2    conda-forge
libasprintf-devel         0.22.5               h661eb56_2    conda-forge
libblas                   3.9.0           22_linux64_openblas    conda-forge
libcblas                  3.9.0           22_linux64_openblas    conda-forge
libcups                   2.3.3                h4637d8d_4    conda-forge
libcurl                   8.8.0                hca28451_0    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h77fa898_7    conda-forge
libgettextpo              0.22.5               h59595ed_2    conda-forge
libgettextpo-devel        0.22.5               h59595ed_2    conda-forge
libgfortran-ng            13.2.0               h69a702a_7    conda-forge
libgfortran5              13.2.0               hca663fb_7    conda-forge
libglib                   2.80.2               hf974151_0    conda-forge
libgomp                   13.2.0               h77fa898_7    conda-forge
libhwloc                  2.10.0          default_h5622ce7_1001    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libidn2                   2.3.7                hd590300_0    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           22_linux64_openblas    conda-forge
libllvm14                 14.0.6               hcd5def8_4    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.27          pthreads_h413a1c8_0    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libsqlite                 3.45.3               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_7    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.7               hc051c1a_0    conda-forge
libzlib                   1.2.13               h4ab18f5_6    conda-forge
llvm-openmp               8.0.1                hc9558a2_0    conda-forge
llvmlite                  0.42.0          py312hb06c811_1    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
megahit                   1.2.9                h43eeafb_5    bioconda
natsort                   8.4.0              pyhd8ed1ab_0    conda-forge
ncbi-vdb                  3.1.1                h4ac6f70_0    bioconda
ncurses                   6.5                  h59595ed_0    conda-forge
nomkl                     1.0                  h5ca1d4c_0    conda-forge
numba                     0.59.1          py312hacefee8_0    conda-forge
numexpr                   2.10.0          py312hf412c99_100    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
openjdk                   22.0.1               hb622114_0    conda-forge
openmp                    8.0.1                         0    conda-forge
openssl                   3.3.1                h4ab18f5_0    conda-forge
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pandas                    2.1.1           py312h526ad5a_0    anaconda
parallel                  20240522             ha770c72_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pcre2                     10.43                hcad00b1_0    conda-forge
perl                      5.32.1          7_hd590300_perl5    conda-forge
perl-archive-tar          2.40            pl5321hdfd78af_0    bioconda
perl-carp                 1.50            pl5321hd8ed1ab_0    conda-forge
perl-common-sense         3.75            pl5321hd8ed1ab_0    conda-forge
perl-compress-raw-bzip2   2.201           pl5321h166bdaf_0    conda-forge
perl-compress-raw-zlib    2.202           pl5321h166bdaf_0    conda-forge
perl-encode               3.21            pl5321hd590300_0    conda-forge
perl-exporter             5.74            pl5321hd8ed1ab_0    conda-forge
perl-exporter-tiny        1.002002        pl5321hd8ed1ab_0    conda-forge
perl-extutils-makemaker   7.70            pl5321hd8ed1ab_0    conda-forge
perl-io-compress          2.201           pl5321hdbdd923_2    bioconda
perl-io-zlib              1.14            pl5321hdfd78af_0    bioconda
perl-json                 4.10            pl5321hdfd78af_0    bioconda
perl-json-xs              2.34            pl5321h4ac6f70_6    bioconda
perl-list-moreutils       0.430           pl5321hdfd78af_0    bioconda
perl-list-moreutils-xs    0.430           pl5321h031d066_2    bioconda
perl-parent               0.241           pl5321hd8ed1ab_0    conda-forge
perl-pathtools            3.75            pl5321h166bdaf_0    conda-forge
perl-scalar-list-utils    1.63            pl5321h166bdaf_0    conda-forge
perl-storable             3.15            pl5321h166bdaf_0    conda-forge
perl-types-serialiser     1.01            pl5321hdfd78af_0    bioconda
pip                       24.0                     pypi_0    pypi
pixman                    0.43.2               h59595ed_0    conda-forge
pluggy                    1.5.0              pyhd8ed1ab_0    conda-forge
popt                      1.16              h0b475e3_2002    conda-forge
prodigal                  2.6.3                h031d066_8    bioconda
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pynndescent               0.5.12             pyhca7485f_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytest                    8.2.1              pyhd8ed1ab_0    conda-forge
python                    3.12.3          hab00c5b_0_cpython    conda-forge
python-annoy              1.17.3          py312h7070661_1    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-tzdata             2024.1             pyhd8ed1ab_0    conda-forge
python_abi                3.12                    4_cp312    conda-forge
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
quast                     5.2.0                    pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
rsync                     3.3.0                he6cb5fe_0    conda-forge
samtools                  1.20                 h50ea8bc_0    bioconda
scikit-bio                0.6.0           py312hc7c0aa3_4    conda-forge
scikit-learn              1.5.0           py312h1fcc3ea_1    conda-forge
scipy                     1.13.1          py312hc2bc53b_0    conda-forge
seqkit                    2.8.2                h9ee0642_0    bioconda
setuptools                70.0.0             pyhd8ed1ab_0    conda-forge
simplejson                3.19.2                   pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
spades                    4.0.0                h5fb382e_1    bioconda
sysroot_linux-64          2.17                h4a8ded7_14    conda-forge
tbb                       2021.12.0            h297d8ca_1    conda-forge
threadpoolctl             3.5.0              pyhc1e730c_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
tqdm                      4.66.4             pyhd8ed1ab_0    conda-forge
trimap                    1.0.15             pyh5e36f6f_0    bioconda
trimmomatic               0.39                 hdfd78af_2    bioconda
tsne                      0.3.1           py312hf053be7_5    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
umap-learn                0.5.5           py312h7900ff3_1    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
wget                      1.21.4               hda4d442_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-inputproto           2.3.2             h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.9                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxfixes            5.0.3             h7f98852_1004    conda-forge
xorg-libxi                1.7.10               h7f98852_0    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-libxt                1.3.0                hd590300_1    conda-forge
xorg-libxtst              1.2.3             h7f98852_1002    conda-forge
xorg-recordproto          1.14.2            h7f98852_1002    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xxhash                    0.8.2                hd590300_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zlib                      1.2.13               h4ab18f5_6    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

chasemc commented 5 months ago

Related to https://github.com/KwanLab/Autometa/issues/349. I will close this issue and continue the discussion there.