KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

NA ambiguous in recursive_dbscan #349

Closed hyphaltip closed 3 weeks ago

hyphaltip commented 8 months ago

https://github.com/KwanLab/Autometa/blob/5e3250cfaa1fb9ec0e6361be1ab6aadc619f73a0/autometa/binning/recursive_dbscan.py#L190

I am getting this 'NA' error:

... site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

I am protecting the comparison with the check below. I think this will work, but I am still testing. I assume getting an NA value means the iteration should just be skipped anyway?

if pd.isna(median_completeness):
    median_completeness = 0
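
A slightly more general form of this guard could look like the sketch below. `is_improvement` is a hypothetical helper name, not part of Autometa, and it makes a different choice from the snippet above: a missing median is treated as "not an improvement" rather than being replaced with 0.

```python
import pandas as pd


def is_improvement(median_completeness, best_median):
    """Return True only when both values are present and the new median is at least the best so far.

    Hypothetical helper for illustration; a missing value (pd.NA, np.nan, None)
    on either side is treated as "no improvement" instead of raising.
    """
    if pd.isna(median_completeness) or pd.isna(best_median):
        return False
    return bool(median_completeness >= best_median)


# Example calls with made-up values.
print(is_improvement(pd.NA, 0.0))
print(is_improvement(42.0, 20.0))
```

Whether skipping the iteration is the right behaviour depends on why the value is missing, so this only avoids the exception; it does not explain the NA.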
samche42 commented 5 months ago

I'm getting the same error in several metagenomes - 4 out of 5 metagenomes failed with this message. The fifth one had no markers and so was similarly killed at the binning step. The work-around suggested seems reasonable to me, but I'm still curious how a completeness of NA pops up in the first place?

chasemc commented 4 months ago

I spent a lot of time today debugging the PRs:

main currently passes all tests when run inside the current autometa Docker image (which is why the tests don't need to be changed: there is a regression).

Your modifications to fix biopython were good, but the change you made to dbscan also needs to be done for hdbscan (I might have that the other errors...

FAILED :cold_sweat: tests/unit_tests/test_recursive_dbscan.py::test_taxon_guided_binning - TypeError: boolean value of NA is ambiguous
FAILED :cold_sweat: tests/unit_tests/test_recursive_dbscan.py::test_get_clusters[dbscan] - TypeError: boolean value of NA is ambiguous
FAILED :cold_sweat: tests/unit_tests/test_recursive_dbscan.py::test_get_clusters[hdbscan] - TypeError: boolean value of NA is ambiguous
FAILED :cold_sweat: tests/unit_tests/test_recursive_dbscan.py::test_recursive_dbscan_main - TypeError: boolean value of NA is ambiguou ...

are due to the upgrade of pandas from 1.5 to 2.1 (or whatever it is now), i.e. if you pin pandas to 1.5 all the tests pass; if you upgrade to 2.1 (possibly any version between 1.5 and current) then those tests fail.

I'm not familiar enough with all of pandas' breaking changes to be able to point to the specific function that is leading to this.

It seems like you were trying to bypass the error in PR#356: https://github.com/KwanLab/Autometa/blob/3ae76dc2a02fa81be2f4d01fb2e955603a62e5c9/autometa/binning/recursive_dbscan.py#L190-L191

It seems like the "failed to recover clusters" error only occurs after this modification, so I think it might be masking the real issue (i.e. the NAs are a clue that something changed upstream?).

It's probably going to take comparing the intermediate results/DFs when using both pandas 1.5 and 2.1.
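
One way to start that comparison, as a rough sketch: dump a per-column summary of dtypes and missing-value counts for an intermediate DataFrame under each pandas version and diff the two outputs. `missing_report` is a hypothetical helper and the sample frame is made-up data; neither comes from Autometa.

```python
import pandas as pd


def missing_report(df):
    """Summarize each column's dtype and number of missing values.

    Hypothetical debugging helper: run it on the same intermediate
    DataFrame under pandas 1.5 and 2.x, then diff the printed reports.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),  # dtype name per column
            "n_missing": df.isna().sum(),  # counts both np.nan and pd.NA
        }
    )


# Made-up stand-in for an intermediate binning table.
sample = pd.DataFrame(
    {"completeness": [95.0, None, 40.0], "cluster": ["bin_0001", "bin_0002", "bin_0003"]}
)
print(missing_report(sample))
```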

I rebased dev onto main and created a new branch that has the biopython changes and pandas pinned to 1.5, feel free to work off that branch

Sort of related: there is a tests/environment.yml that the unit tests run on (if using the Makefile). IMO this needs to go away; the tests should only pull from the main ./autometa-env.yml file and then pip install the pytest dependencies within the make command.

chasemc commented 4 months ago

related: https://github.com/KwanLab/Autometa/issues/350

imonteroo commented 1 month ago

I am working with autometa 2.2.2 and have the same error

autometa-binning \
    --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv \
    --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv \
    --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv \
    --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv \
    --clustering-method dbscan \
    --completeness 20 \
    --purity 95 \
    --cov-stddev-limit 25 \
    --gc-stddev-limit 5 \
    --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv \
    --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv \
    --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv \
    --starting-rank superkingdom \
    --rank-filter superkingdom \
    --rank-name-filter bacteria
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/06/2024 09:08:51 AM INFO] root: Selected clustering method: dbscan
[06/06/2024 09:08:51 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/06/2024 09:08:51 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/06/2024 09:08:51 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
  File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
    main_out = taxon_guided_binning(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
    clusters_df = get_clusters(
                  ^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
    clustered_df, unclustered_df = clusterer(
                                   ^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 392, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

Conda list gives this

Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_sysroot_linux-64_curr_repodata_hack 3 h69a702a_14 conda-forge
alsa-lib 1.2.11 hd590300_1 conda-forge
attrs 23.2.0 pyh71513ae_0 conda-forge
autometa 2.2.2 pyh7cba7a3_0 bioconda
beautifulsoup4 4.12.3 pyha770c72_0 conda-forge
bedtools 2.31.1 hf5e1c6e_1 bioconda
biom-format 2.1.16 py312h9a8786e_1 conda-forge
biopython 1.83 py312h98912ed_0 conda-forge
blast 2.15.0 pl5321h6f7f691_1 bioconda
boost-cpp 1.78.0 h2c5509c_4 conda-forge
bowtie2 2.5.4 he20e202_0 bioconda
brotli-python 1.1.0 py312h30efb56_1 conda-forge
bwa 0.7.18 he4a0461_0 bioconda
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.28.1 hd590300_0 conda-forge
ca-certificates 2024.6.2 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cairo 1.18.0 h3faef2a_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
curl 8.8.0 he654da7_0 conda-forge
diamond 2.1.9 h43eeafb_0 bioconda
entrez-direct 21.6 he881be0_0 bioconda
exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge
expat 2.6.2 h59595ed_0 conda-forge
fastqc 0.12.1 hdfd78af_0 bioconda
filelock 3.14.0 pyhd8ed1ab_0 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_2 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
gdown 5.2.0 pyhd8ed1ab_0 conda-forge
gettext 0.22.5 h59595ed_2 conda-forge
gettext-tools 0.22.5 h59595ed_2 conda-forge
giflib 5.2.2 hd590300_0 conda-forge
graphite2 1.3.13 h59595ed_1003 conda-forge
h5py 3.11.0 nompi_py312hb7ab980_101 conda-forge
harfbuzz 8.5.0 hfac3d4d_0 conda-forge
hdf5 1.14.3 nompi_hdf9ad27_104 conda-forge
hdmedians 0.14.2 py312h085067d_6 conda-forge
hmmer 3.4 hdbdd923_1 bioconda
htslib 1.20 h81da01d_0 bioconda
icu 73.2 h59595ed_0 conda-forge
idna 3.7 pyhd8ed1ab_0 conda-forge
iniconfig 2.0.0 pyhd8ed1ab_0 conda-forge
joblib 1.4.2 pyhd8ed1ab_0 conda-forge
kart 2.5.6 hcd5855d_4 bioconda
kernel-headers_linux-64 3.10.0 h4a8ded7_14 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.2 h659d440_0 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.40 hf3520f5_1 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libaec 1.1.3 h59595ed_0 conda-forge
libasprintf 0.22.5 h661eb56_2 conda-forge
libasprintf-devel 0.22.5 h661eb56_2 conda-forge
libblas 3.9.0 22_linux64_openblas conda-forge
libcblas 3.9.0 22_linux64_openblas conda-forge
libcups 2.3.3 h4637d8d_4 conda-forge
libcurl 8.8.0 hca28451_0 conda-forge
libdeflate 1.20 hd590300_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h77fa898_7 conda-forge
libgettextpo 0.22.5 h59595ed_2 conda-forge
libgettextpo-devel 0.22.5 h59595ed_2 conda-forge
libgfortran-ng 13.2.0 h69a702a_7 conda-forge
libgfortran5 13.2.0 hca663fb_7 conda-forge
libglib 2.80.2 hf974151_0 conda-forge
libgomp 13.2.0 h77fa898_7 conda-forge
libhwloc 2.10.0 default_h5622ce7_1001 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libidn2 2.3.7 hd590300_0 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 22_linux64_openblas conda-forge
libllvm14 14.0.6 hcd5def8_4 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.27 pthreads_h413a1c8_0 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libsqlite 3.45.3 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 hc0a3c3a_7 conda-forge
libtiff 4.6.0 h1dd3fc0_3 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libwebp-base 1.4.0 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 hc051c1a_0 conda-forge
libzlib 1.2.13 h4ab18f5_6 conda-forge
llvm-openmp 8.0.1 hc9558a2_0 conda-forge
llvmlite 0.42.0 py312hb06c811_1 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
megahit 1.2.9 h43eeafb_5 bioconda
natsort 8.4.0 pyhd8ed1ab_0 conda-forge
ncbi-vdb 3.1.1 h4ac6f70_0 bioconda
ncurses 6.5 h59595ed_0 conda-forge
numba 0.59.1 py312hacefee8_0 conda-forge
numpy 1.26.4 py312heda63a1_0 conda-forge
openjdk 22.0.1 hb622114_0 conda-forge
openmp 8.0.1 0 conda-forge
openssl 3.3.1 h4ab18f5_0 conda-forge
packaging 24.0 pyhd8ed1ab_0 conda-forge
pandas 2.2.2 py312h1d6d2e6_1 conda-forge
parallel 20240522 ha770c72_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
perl 5.32.1 7_hd590300_perl5 conda-forge
perl-archive-tar 2.40 pl5321hdfd78af_0 bioconda
perl-carp 1.50 pl5321hd8ed1ab_0 conda-forge
perl-common-sense 3.75 pl5321hd8ed1ab_0 conda-forge
perl-compress-raw-bzip2 2.201 pl5321h166bdaf_0 conda-forge
perl-compress-raw-zlib 2.202 pl5321h166bdaf_0 conda-forge
perl-encode 3.21 pl5321hd590300_0 conda-forge
perl-exporter 5.74 pl5321hd8ed1ab_0 conda-forge
perl-exporter-tiny 1.002002 pl5321hd8ed1ab_0 conda-forge
perl-extutils-makemaker 7.70 pl5321hd8ed1ab_0 conda-forge
perl-io-compress 2.201 pl5321hdbdd923_2 bioconda
perl-io-zlib 1.14 pl5321hdfd78af_0 bioconda
perl-json 4.10 pl5321hdfd78af_0 bioconda
perl-json-xs 2.34 pl5321h4ac6f70_6 bioconda
perl-list-moreutils 0.430 pl5321hdfd78af_0 bioconda
perl-list-moreutils-xs 0.430 pl5321h031d066_2 bioconda
perl-parent 0.241 pl5321hd8ed1ab_0 conda-forge
perl-pathtools 3.75 pl5321h166bdaf_0 conda-forge
perl-scalar-list-utils 1.63 pl5321h166bdaf_0 conda-forge
perl-storable 3.15 pl5321h166bdaf_0 conda-forge
perl-types-serialiser 1.01 pl5321hdfd78af_0 bioconda
pip 24.0 pypi_0 pypi
pixman 0.43.2 h59595ed_0 conda-forge
pluggy 1.5.0 pyhd8ed1ab_0 conda-forge
popt 1.16 h0b475e3_2002 conda-forge
prodigal 2.6.3 h031d066_8 bioconda
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pynndescent 0.5.12 pyhca7485f_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytest 8.2.1 pyhd8ed1ab_0 conda-forge
python 3.12.3 hab00c5b_0_cpython conda-forge
python-annoy 1.17.3 py312h7070661_1 conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.12 4_cp312 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
quast 5.2.0 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 pyhd8ed1ab_0 conda-forge
rsync 3.3.0 he6cb5fe_0 conda-forge
samtools 1.20 h50ea8bc_0 bioconda
scikit-bio 0.6.0 py312hc7c0aa3_4 conda-forge
scikit-learn 1.5.0 py312h1fcc3ea_1 conda-forge
scipy 1.13.1 py312hc2bc53b_0 conda-forge
seqkit 2.8.2 h9ee0642_0 bioconda
setuptools 70.0.0 pyhd8ed1ab_0 conda-forge
simplejson 3.19.2 pypi_0 pypi
six 1.16.0 pyh6c4a22f_0 conda-forge
soupsieve 2.5 pyhd8ed1ab_1 conda-forge
spades 4.0.0 h5fb382e_1 bioconda
sysroot_linux-64 2.17 h4a8ded7_14 conda-forge
tbb 2021.12.0 h297d8ca_1 conda-forge
threadpoolctl 3.5.0 pyhc1e730c_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tqdm 4.66.4 pyhd8ed1ab_0 conda-forge
trimap 1.0.15 pyh5e36f6f_0 bioconda
trimmomatic 0.39 hdfd78af_2 bioconda
tsne 0.3.1 py312hf053be7_5 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
umap-learn 0.5.5 py312h7900ff3_1 conda-forge
urllib3 2.2.1 pyhd8ed1ab_0 conda-forge
wget 1.21.4 hda4d442_0 conda-forge
wheel 0.43.0 pyhd8ed1ab_1 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.9 h8ee46fc_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxi 1.7.10 h7f98852_0 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-libxt 1.3.0 hd590300_1 conda-forge
xorg-libxtst 1.2.3 h7f98852_1002 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xxhash 0.8.2 hd590300_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
zlib 1.2.13 h4ab18f5_6 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge

chasemc commented 4 weeks ago

There are some general issues throughout Autometa (I don't know how pervasive) where recent changes to pandas could cause problems.

The issue mentioned here appears to occur when a recursive dbscan iteration comes up with no clusters. A fix is in progress, and a no-promises fix can be installed in the interim via pip: pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
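
As a rough illustration of how an iteration with no clusters could lead to this traceback, the sketch below takes the median of an empty column. It assumes the column uses pandas' nullable `Float64` dtype, which is an assumption made for the demonstration and not something confirmed about Autometa's DataFrames; the variable names simply mirror the ones in the traceback.

```python
import pandas as pd

# Assumption for illustration: an iteration produced no clusters, and the
# completeness column uses the nullable "Float64" extension dtype.
no_clusters = pd.DataFrame({"completeness": pd.Series([], dtype="Float64")})

# The median of an empty nullable column is a missing value.
median_completeness = no_clusters["completeness"].median()
best_median = 0.0

try:
    # Same shape of comparison as the line shown in the traceback.
    if median_completeness >= best_median:
        best_median = median_completeness
    outcome = "comparison succeeded"
except TypeError as err:
    outcome = f"TypeError: {err}"

print(repr(median_completeness), "->", outcome)
```

If the column were a plain `float64` instead, the median would be `np.nan` and the comparison would quietly evaluate to False rather than raise.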

For devs:

Part of the issue is that pandas changed how NAs are handled, and this project isn't the only one that has had issues: https://pandas.pydata.org/docs/user_guide/missing_data.html#na-in-a-boolean-context

I found at least one case where division by 0 yields np.nan, and these values are then mixed in a dataframe with pd.NA, which may cause issues. The whole code base may need to be checked.
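
The difference between the two missing-value markers can be seen in a small sketch. `compare_to_threshold` and `as_float` are hypothetical helpers written for this illustration, not Autometa functions.

```python
import numpy as np
import pandas as pd


def compare_to_threshold(value, threshold=0.0):
    """Evaluate `value >= threshold` and report what happens in a boolean context."""
    result = value >= threshold
    try:
        return bool(result)
    except TypeError:
        # pd.NA propagates through the comparison and refuses to become a bool.
        return "ambiguous"


def as_float(value):
    """Collapse any missing marker (pd.NA, np.nan, None) to a plain float NaN."""
    return float("nan") if pd.isna(value) else float(value)


print(compare_to_threshold(np.nan))           # NaN compares as False, silently
print(compare_to_threshold(pd.NA))            # NA cannot be used in an `if`
print(compare_to_threshold(as_float(pd.NA)))  # normalized first, compares as False
```

Normalizing to one marker makes the behaviour consistent, but it also hides where the missing values came from, so it is a stopgap and not a diagnosis.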

CC @jason-c-kwan @Sidduppal @shaneroesemann

chasemc commented 4 weeks ago

@imonteroo, just wanted to reach out because you seem to be actively using Autometa. No promises, but you can try the interim install in the comment above.

imonteroo commented 4 weeks ago

@chasemc Thank you for your advice. You are right that I am actively using Autometa, and I do not use any cluster. That could be the problem.

Unfortunately, the error persists after installing hotfix-pandas-na. Well, it is a bit different:

autometa-binning     --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv     --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv     --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv     --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv     --clustering-method dbscan     --completeness 20     --purity 95     --cov-stddev-limit 25     --gc-stddev-limit 5     --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv     --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv     --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv     --starting-rank superkingdom     --rank-filter superkingdom     --rank-name-filter bacteria
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/11/2024 11:10:26 AM INFO] root: Selected clustering method: dbscan
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/11/2024 11:10:26 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/11/2024 11:10:26 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
  File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
    main_out = taxon_guided_binning(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
    clusters_df = get_clusters(
                  ^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
    clustered_df, unclustered_df = clusterer(
                                   ^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
chasemc commented 4 weeks ago

It looks like the hotfix-pandas-na branch wasn't installed, because the "line 190, in recursive_dbscan" frame in the log you pasted still shows the old line 190.

Make sure to run pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na after activating the conda environment, if you didn't already.

e.g.

conda activate /media/microviable/d/miniconda3/envs/autometa_env
pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na

Note: With this update I'm getting more and better clusters than the unit test data that I have access to (we're looking into that in the meantime).

imonteroo commented 4 weeks ago

I did, but nothing better

(autometa_env) microviable@microviable:~$ pip install git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
Collecting git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na
  Cloning https://github.com/KwanLab/Autometa.git (to revision hotfix-pandas-na) to /tmp/pip-req-build-12x6zl4f
  Running command git clone --filter=blob:none --quiet https://github.com/KwanLab/Autometa.git /tmp/pip-req-build-12x6zl4f
  Running command git checkout -b hotfix-pandas-na --track origin/hotfix-pandas-na
  Switched to a new branch 'hotfix-pandas-na'
  Branch 'hotfix-pandas-na' set up to track remote branch 'hotfix-pandas-na' from 'origin'.
  Resolved https://github.com/KwanLab/Autometa.git to commit f7f99ea7d9c644e7fd963a5b00e7b3a3618de1c1
  Preparing metadata (setup.py) ... done
(autometa_env) microviable@microviable:~$ autometa-binning     --kmers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.bacteria.kmers.embedded.tsv     --coverages /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.coverages.tsv     --gc-content /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.gc.content.tsv     --markers /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.markers.tsv     --clustering-method dbscan     --completeness 20     --purity 95     --cov-stddev-limit 25     --gc-stddev-limit 5     --taxonomy /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.taxonomy.tsv     --output-binning /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.binning.tsv     --output-main /media/microviable/g/test/output/TeamB018A/autometa/TeamB018A.main.tsv     --starting-rank superkingdom     --rank-filter superkingdom     --rank-name-filter bacteria
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: Reading/merging 4 contig annotation files
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: merged annotations shape: (13923, 15)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (5959, 15)
[06/12/2024 09:09:52 AM INFO] root: Selected clustering method: dbscan
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Using dbscan clustering method
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Using ranks: superkingdom, phylum, class, order, family, genus, species
[06/12/2024 09:09:52 AM INFO] autometa.binning.recursive_dbscan: Examining superkingdom: 1 unique taxa (5,959 contigs)
[06/12/2024 09:09:52 AM DEBUG] autometa.binning.recursive_dbscan: Examining taxonomy: superkingdom : bacteria : (5959, 15)
Traceback (most recent call last):
  File "/media/microviable/d/miniconda3/envs/autometa_env/bin/autometa-binning", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 882, in main
    main_out = taxon_guided_binning(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 660, in taxon_guided_binning
    clusters_df = get_clusters(
                  ^^^^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 495, in get_clusters
    clustered_df, unclustered_df = clusterer(
                                   ^^^^^^^^^^
  File "/media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py", line 190, in recursive_dbscan
    if median_completeness >= best_median:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "missing.pyx", line 419, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
chasemc commented 3 weeks ago

My bad, the package version isn't bumped in the branch yet, so you need to add --force-reinstall, which should work:

pip install --force-reinstall git+https://github.com/KwanLab/Autometa.git@hotfix-pandas-na

If the install is successful

head -n191 /media/microviable/d/miniconda3/envs/autometa_env/lib/python3.12/site-packages/autometa/binning/recursive_dbscan.py | tail -n1

should show

else:

and not:

best_median = median_completeness
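
The same check can be done without hard-coding the environment path, as a sketch in Python run inside the activated environment. `installed_line` is a hypothetical helper; it locates a module through the standard library's `importlib` and reads one line of its source with `linecache`.

```python
import importlib.util
import linecache


def installed_line(module_name, lineno):
    """Return the stripped text of line `lineno` in the installed module's source.

    Hypothetical helper: returns None if the module (or its parent package)
    cannot be found in the active environment.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    return linecache.getline(spec.origin, lineno).strip()


# Line 191 is the line inspected by the head/tail command above.
print(installed_line("autometa.binning.recursive_dbscan", 191))
```

Note that `find_spec` imports the parent packages of a dotted module name, so this runs Autometa's package initialization as a side effect.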

imonteroo commented 3 weeks ago

Thank you so much. It works

Sidduppal commented 3 weeks ago

It should be fixed in the latest update, v2.2.3 (#361).