SegataLab / hclust2

Hclust2 is a handy tool for plotting heat-maps with several useful options to produce high quality figures that can be used in publication.
MIT License

ValueError: The condensed distance matrix must contain only finite values #3

Open nitishnih opened 4 years ago

nitishnih commented 4 years ago

Hello,

This error is related to #1. Once that issue was solved and @fbeghini closed it, I reinstalled hclust2 in a conda environment, as follows:

conda create -n hclust2
conda activate hclust2
conda config --env --add channels bioconda
conda install --yes hclust2

Using the same merged abundance file mentioned in #1 (created using metaphlan3), I ran the following command:

$ hclust2.py --in merged_abundance_table.txt -l --out heatmap.png

And got the following error:

Traceback (most recent call last):
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 825, in <module>
    hclust2_main()
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 805, in hclust2_main
    cl.fhcluster()
  File "/opt/conda/envs/hclust2/bin/hclust2.py", line 386, in fhcluster
    self.fhclusters = sph.linkage(self.f_dm, method=self.args.flinkage)
  File "/opt/conda/envs/hclust2/lib/python3.8/site-packages/scipy/cluster/hierarchy.py", line 1057, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

Since this is a different error than before, I assume that hclust2 on the bioconda channel has been updated to fix issue #1. In case I was wrong, I followed the advice @fbeghini posted in #1 to manually remove the first line (containing the string #mpa_v30_CHOCOPhlAn_201901) and the NCBI_tax_id column, but got the same error.

Looking over the matrix in the merged abundance file, it is not immediately clear why the matrix would contain non-finite values.

fbeghini commented 4 years ago

There are a couple of combinations of rows that contain a lot of 0s. When the correlation is calculated for these, both the numerator and the denominator are 0, which results in NaN.
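To make the 0/0 case concrete, here is a minimal sketch of a correlation distance written out by hand with NumPy. It is an illustration only, not the code hclust2 or SciPy actually runs; the function name `correlation_distance` and the sample vectors are made up for this example.

```python
import numpy as np

def correlation_distance(u, v):
    """Return 1 - Pearson correlation, written out to expose the 0/0 case."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc = u - u.mean()  # centering an all-zero (or constant) row gives all zeros
    vc = v - v.mean()
    numerator = np.dot(uc, vc)
    denominator = np.sqrt(np.dot(uc, uc) * np.dot(vc, vc))
    # Silence the floating-point warning so the NaN is easy to see.
    with np.errstate(invalid="ignore", divide="ignore"):
        return 1.0 - numerator / denominator

zeros = [0.0, 0.0, 0.0, 0.0]   # hypothetical feature absent from every sample
other = [1.0, 2.0, 0.0, 5.0]   # hypothetical feature with varying abundance

print(correlation_distance(zeros, other))  # nan
```

Any row that is constant across samples, not only an all-zero one, hits the same 0/0, and a single NaN in the distance matrix is enough for the linkage step to reject it.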

nitishnih commented 4 years ago

Thanks for the explanation @fbeghini. Does this mean hclust2 cannot process this merged abundance file created by metaphlan? Can this be handled in a different way by the program or do users need a manual workaround?

fbeghini commented 4 years ago

I'd filter out only the entries on the species level first, and then maybe try a different method for species/feature distance calculation
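One possible way to do the species-level filtering is sketched below in plain Python. It assumes the usual MetaPhlAn layout: clade names in the first tab-separated column, taxonomic levels joined by `|`, species entries ending in a level that starts with `s__`, and comment lines starting with `#`. The function name `keep_species_rows` and the sample lines are hypothetical.

```python
def keep_species_rows(lines):
    """Keep the header line and the rows whose deepest level is a species."""
    kept = []
    header_seen = False
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines such as the database version line
        if not header_seen:
            kept.append(line)  # first non-comment line is treated as the header
            header_seen = True
            continue
        clade = line.split("\t", 1)[0]
        deepest_level = clade.split("|")[-1]
        if deepest_level.startswith("s__"):
            kept.append(line)
    return kept

# Hypothetical miniature table, only to show the behaviour.
table = [
    "#mpa_v30_CHOCOPhlAn_201901",
    "clade_name\tsample_1\tsample_2",
    "k__Bacteria\t100.0\t100.0",
    "k__Bacteria|p__Firmicutes\t60.0\t40.0",
    "k__Bacteria|p__Firmicutes|s__Example_species\t60.0\t40.0",
]
for row in keep_species_rows(table):
    print(row)
```

Reading the real file with `open(...)` and writing the kept lines back out is left out here; the filter itself only needs an iterable of lines.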

BSteel93 commented 3 years ago

Hi there,

I'm having the exact same issue as nitishnih when trying to generate a heat map (using hclust2) from a merged abundance table generated by metaphlan3. I also followed the step for altering the abundance table by removing the header and the NCBI_tax_id column. I was just wondering if this issue had been fixed or resolved? Or is the recommended advice to use an alternative species/feature distance calculation method?

saras224 commented 3 years ago

@fbeghini, could you kindly respond to the thread about this same issue on the biobakery help forum, or explain here how to resolve the error when rows contain 0 values? How can those rows be removed from merged_abundance_table_species_table.txt?

ReneKat commented 2 years ago

I'm having this issue too. What's the fix? I am unable to recreate the heatmap in the example, even when adding --no_fclustering and --no_sclustering. Thank you, Rene

scleractinia commented 2 years ago

I am also having this issue

baishengjun commented 2 years ago

This has confused me for many days. Does anyone have a solution? Thanks, Bai

kazumaxneo commented 2 years ago

I also encountered this error. I was able to run successfully when I turned off clustering (--no_fclustering and --no_sclustering). This error may occur if the samples contain mostly 0's. You can avoid this by adding a value that is very small compared to the data (e.g. 0.01) to all samples.
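As a complementary check, the sketch below flags rows that are constant across all samples, since those are the rows that make a correlation-based distance undefined. Note that a constant offset leaves Pearson correlation unchanged, so a small added value mainly helps ratio-based distances such as Bray-Curtis; for the correlation metric, dropping the constant rows is the more direct workaround. The helper name `nonconstant_rows` and the example matrix are assumptions made for illustration, not part of hclust2.

```python
import numpy as np

def nonconstant_rows(matrix):
    """Return a boolean mask that is True for rows with more than one distinct value."""
    m = np.asarray(matrix, dtype=float)
    # A row is constant when its maximum equals its minimum.
    return m.max(axis=1) != m.min(axis=1)

# Hypothetical abundance matrix: rows are features, columns are samples.
abundance = np.array([
    [0.0, 0.0, 0.0],   # all zeros: undefined correlation
    [1.0, 2.0, 3.0],   # varies: fine
    [4.0, 4.0, 4.0],   # constant but non-zero: also undefined
])

mask = nonconstant_rows(abundance)
filtered = abundance[mask]
print(mask)
print(filtered)
```

The same mask can be applied to the list of feature names so that labels stay aligned with the filtered matrix.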

EricDeveaud commented 2 years ago

Maybe you want to check: https://forum.biobakery.org/t/hclust2-py-error-distance-matrix-finite-values/1732/2

I would appreciate it if somebody could tell me whether the change is valid or not. Eric

cjfields commented 1 year ago

Just a note that I also see this, including with the example data that comes with this repo using the run.sh script. @EricDeveaud's changes (linked above) do seem to progress past the issue, but I run into another downstream problem in matplotlib:

% ./hclust2.py \
    -i examples/HMP-MetaPhlAn/HMP.species.txt \
    -o HMP.sqrt_scale.png \
    --skip_rows 1 \
    --ftop 50 \
    --f_dist_f correlation \
    --s_dist_f braycurtis \
    --cell_aspect_ratio 9 \
    -s --fperc 99 \
    --flabel_size 4 \
    --metadata_rows 2,3,4 \
    --legend_file HMP.sqrt_scale.legend.png \
    --max_flabel_len 100 \
    --metadata_height 0.075 \
    --minv 0.01 \
    --no_slabels \
    --dpi 300 \
    --slinkage complete
Traceback (most recent call last):
  File "/Users/cjfields/research/biotech/swanson/2022-August-metagenome/src/hclust2/./hclust2.py", line 1244, in <module>
    hclust2_main()
  File "/Users/cjfields/research/biotech/swanson/2022-August-metagenome/src/hclust2/./hclust2.py", line 1240, in hclust2_main
    hm.draw()
  File "/Users/cjfields/research/biotech/swanson/2022-August-metagenome/src/hclust2/./hclust2.py", line 1028, in draw
    im = ax_hm.imshow(
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/_api/deprecation.py", line 454, in wrapper
    return func(*args, **kwargs)
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/__init__.py", line 1423, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 5577, in imshow
    im._scale_norm(norm, vmin, vmax)
  File "/Users/cjfields/miniforge3/lib/python3.9/site-packages/matplotlib/cm.py", line 405, in _scale_norm
    raise ValueError(
ValueError: Passing a Normalize instance simultaneously with vmin/vmax is not supported.  Please pass vmin/vmax directly to the norm when creating it.

Using:

python=3.9.10
matplotlib==3.6.0
numpy==1.23.1
pandas==1.5.0
scipy==1.9.1
setuptools==60.9.3

pollicipes commented 1 year ago

Hi, this worked for me: https://forum.biobakery.org/t/hclust2-py-error-distance-matrix-finite-values/1732 (modifying the script in the __init__ function at line 370 should do it).

Regarding your specific error, I think that if you discard the --minv parameter it should work.

Cheers, J
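For anyone who wants to keep a minimum value rather than drop --minv, the matplotlib error message itself points at the alternative: put vmin/vmax inside the norm and pass only the norm to imshow. The sketch below shows that pattern in isolation. It is not hclust2's actual drawing code (which is not shown in this thread); the data values and the choice of `Normalize` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

# Hypothetical heatmap values.
data = np.array([[0.01, 0.5], [2.0, 40.0]])

fig, ax = plt.subplots()
# The limits live in the norm; imshow receives no vmin/vmax of its own.
norm = Normalize(vmin=0.01, vmax=40.0)
im = ax.imshow(data, norm=norm, cmap="viridis")
print(im.norm.vmin, im.norm.vmax)
plt.close(fig)
```

Passing `norm=...` together with `vmin=`/`vmax=` in the same imshow call is what triggers the ValueError in the traceback above.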

Jesuk555 commented 6 months ago

I have the same problem. I am using it on a cluster running a Linux-like system. Do you know how to solve it?