genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Potential bug with `cluster_sel_epsilon` option. Rounded to 0 #95

Open · aringeri opened this issue 3 months ago

aringeri commented 3 months ago

Hi, I've been reading through the implementation of NanoCLUST to get some ideas for clustering my own nanopore data.

I have found a potential bug in how the `cluster_sel_epsilon` option is passed to the HDBSCAN clustering step, in this line: https://github.com/genomicsITER/NanoCLUST/blob/9364ddcc96d7f90c34e97c4baa858835c9b0a943/templates/umap_hdbscan.py#L23

`int($params.cluster_sel_epsilon)`

My understanding is that Python's `int()` discards the fractional part of a float (it truncates toward zero). So although `params.cluster_sel_epsilon` may be set to 0.5, `int(0.5)` evaluates to 0.
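A quick check in plain Python shows the truncation. The sample values below are only illustrative; the comparison with `math.floor()` is there because truncation and rounding down differ for negative numbers, although for a non-negative epsilon they give the same result:

```python
import math

# Illustrative values someone might set for params.cluster_sel_epsilon
samples = [0.5, 0.99, 1.7, -0.5]

for value in samples:
    # int() truncates toward zero; math.floor() rounds toward negative infinity
    print(value, "int ->", int(value), "floor ->", math.floor(value))
```

Both 0.5 and 0.99 come out as 0 with `int()`, so any epsilon below 1 is lost.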

This may mislead anyone who tries to configure `params.cluster_sel_epsilon`, because the fractional part is silently dropped: every value below 1 becomes 0, and a value such as 1.5 becomes 1.
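One possible fix, assuming the template only needs a non-negative number, is to convert the parameter with `float()` instead of `int()`. The sketch below is a hypothetical helper (`parse_cluster_sel_epsilon` is not part of NanoCLUST), and the check that rejects negative values is an assumption about what a sensible epsilon should be, not something taken from the pipeline:

```python
def parse_cluster_sel_epsilon(raw):
    """Hypothetical helper: convert the configured epsilon to a float
    without discarding its fractional part."""
    epsilon = float(raw)  # float() keeps 0.5 as 0.5, unlike int()
    if epsilon < 0:
        # Assumed validation: treat a negative distance threshold as a config error
        raise ValueError(f"cluster_sel_epsilon must be >= 0, got {epsilon}")
    return epsilon


if __name__ == "__main__":
    print(parse_cluster_sel_epsilon(0.5))    # keeps the fractional value
    print(parse_cluster_sel_epsilon("0.5"))  # also accepts a string value
    print(int(0.5))                          # what the current template line produces
```

In the template itself, the minimal change would presumably be writing `float($params.cluster_sel_epsilon)` in place of `int($params.cluster_sel_epsilon)`, since the hdbscan library's `cluster_selection_epsilon` parameter accepts a float.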