KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

:bug: in `autometa-binning-ldm`: Missing/incorrect `binning_checkpoints` variable init/assignment #324

Closed evanroyrees closed 10 months ago

evanroyrees commented 1 year ago

Current Behavior

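`autometa-binning-ldm` currently terminates with an `UnboundLocalError` when it tries to assign `pd.NA` to a column of `binning_checkpoints`. The traceback shows the name is referenced before it is ever assigned, which happens when `--cache` is not provided. This can be fixed by initializing `binning_checkpoints` in an `else` branch following `if cache: ...`

This may be accomplished on line 371: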

if cache:
    ...
else: # LINE 371
    binning_checkpoints = pd.DataFrame()
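For illustration, here is a minimal, standard-library-only sketch of the failure mode and the proposed fix. It is not the Autometa source: the function names `checkpoints_buggy` and `checkpoints_fixed` are hypothetical, a plain `dict` stands in for the `pd.DataFrame`, and `None` stands in for `pd.NA`. It assumes the real code binds `binning_checkpoints` only inside the `if cache:` branch, as the traceback suggests.

```python
def checkpoints_buggy(cache=None):
    """Hypothetical reduction of the reported pattern (not the Autometa code)."""
    if cache:
        # Stand-in for loading previously cached checkpoints.
        binning_checkpoints = {}
    # When cache is falsy, the name was never bound in this scope,
    # so the assignment below raises UnboundLocalError.
    binning_checkpoints["bacteria"] = None  # None stands in for pd.NA
    return binning_checkpoints


def checkpoints_fixed(cache=None):
    """Same pattern with the proposed `else` initialization."""
    if cache:
        binning_checkpoints = {}  # stand-in for loading cached checkpoints
    else:
        # Proposed fix: start from an empty container when no cache is given.
        binning_checkpoints = {}  # stand-in for pd.DataFrame()
    binning_checkpoints["bacteria"] = None  # None stands in for pd.NA
    return binning_checkpoints


if __name__ == "__main__":
    try:
        checkpoints_buggy(cache=None)
    except UnboundLocalError as err:
        print(f"buggy variant failed: {err}")

    print(f"fixed variant returned: {checkpoints_fixed(cache=None)}")
```

Initializing the variable on both branches keeps the later column assignment valid regardless of whether `--cache` was supplied.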

Steps to Reproduce

# NOTE --cache is not provided resulting in the error
autometa-binning-ldm  \
    --kmers 5mers.tsv  \
    --coverages coverage.tsv \
    --gc-content gc_content.tsv \
    --markers bacteria.markers.tsv \
    --taxonomy taxonomy.tsv \
    --output-binning "binning.no_metadata.tsv" \
    --output-main "binning.tsv" \
    --clustering-method hdbscan \
    --completeness 20 \
    --purity 95 \
    --cov-stddev-limit 25 \
    --gc-stddev-limit inf \
    --starting-rank superkingdom \
    --rank-filter superkingdom \
    --rank-name-filter bacteria \
    --cpus 20

Expected Behavior

`binning_checkpoints` should be referenced without an error so that `autometa-binning-ldm` may proceed as usual, whether or not `--cache` is provided.

Environment Information

Autometa docker image: jasonkwan/autometa:2.2.0

i.e. implemented within a Nextflow process as:

```groovy
container "jasonkwan/autometa:2.2.0"
```

Run Information

Contents of .command.sh

```bash
(autometa) evan@userserver:~/host_associated_work/5c/16ba7f6953f0e3849c31908e2bd3ad$ cat .command.sh
#!/usr/bin/env bash
autometa-binning-ldm --kmers 5mers.tsv --coverages coverage.tsv --gc-content gc_content.tsv --markers bacteria.markers.tsv --taxonomy taxonomy.tsv --output-binning "lasonolide_producer.autometa_v2.comp20.pur95.cov25.gcinf.binning.no_metadata.tsv" --output-main "lasonolide_producer.autometa_v2.hdbscan.comp20.pur95.cov25.gcinf.binning.tsv" --clustering-method hdbscan --completeness 20 --purity 95 --cov-stddev-limit 25 --gc-stddev-limit inf --starting-rank superkingdom --rank-filter superkingdom --rank-name-filter bacteria --cpus 20
```

Contents of .command.err

```bash
(autometa) evan@userserver:~/host_associated_work/5c/16ba7f6953f0e3849c31908e2bd3ad$ cat .command.err
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
[05/13/2023 06:09:53 AM DEBUG] autometa.binning.utilities: Reading/merging 3 contig annotation files
[05/13/2023 06:09:54 AM DEBUG] autometa.binning.utilities: merged annotations shape: (101455, 13)
[05/13/2023 06:09:54 AM DEBUG] autometa.binning.utilities: superkingdom filtered to bacteria taxonomy. shape: (68971, 13)
[05/13/2023 06:09:54 AM INFO] root: 2,289 sequences contain markers (3.32% of total in binning features table)
[05/13/2023 06:09:54 AM INFO] root: Selected clustering method: hdbscan
[05/13/2023 06:09:54 AM DEBUG] autometa.binning.large_data_mode: Using canonical ranks: superkingdom, phylum, class, order, family, genus, species
[05/13/2023 06:09:54 AM DEBUG] autometa.binning.large_data_mode: Max partition size set to: 10000
[05/13/2023 06:09:54 AM INFO] autometa.binning.large_data_mode: Examining superkingdom: 1 unique taxa (68,971 contigs)
[05/13/2023 06:09:55 AM DEBUG] autometa.common.kmers: Transforming k-mer counts using am_clr
[05/13/2023 06:10:39 AM DEBUG] autometa.common.kmers: Performing decomposition with PCA (seed 42): 512 to 50 dims
[05/13/2023 06:10:46 AM DEBUG] autometa.common.kmers: bhsne: 68971 data points and 50 dimensions
[05/13/2023 06:10:46 AM DEBUG] autometa.common.kmers: Performing embedding with bhsne (seed 42)
[05/13/2023 06:16:46 AM DEBUG] autometa.binning.large_data_mode: bacteria > max_partition_size (68,971>10,000). skipping [and caching embedding]
[05/13/2023 06:16:46 AM DEBUG] autometa.common.kmers: Transforming k-mer counts using am_clr
[05/13/2023 06:17:26 AM DEBUG] autometa.common.kmers: Performing decomposition with PCA (seed 42): 512 to 50 dims
[05/13/2023 06:17:29 AM DEBUG] autometa.common.kmers: bhsne: 68971 data points and 50 dimensions
[05/13/2023 06:17:29 AM DEBUG] autometa.common.kmers: Performing embedding with bhsne (seed 42)
Traceback (most recent call last):
  File "/opt/conda/bin/autometa-binning-ldm", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/autometa/binning/large_data_mode.py", line 838, in main
    main_out = cluster_by_taxon_partitioning(
  File "/opt/conda/lib/python3.9/site-packages/autometa/binning/large_data_mode.py", line 505, in cluster_by_taxon_partitioning
    binning_checkpoints[rank_name_txt] = pd.NA
UnboundLocalError: local variable 'binning_checkpoints' referenced before assignment
```