bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

ValueError: The condensed distance matrix must contain only finite values. #322

Open paul-bio opened 3 years ago

paul-bio commented 3 years ago

Hi Ursky. I encountered an error in the quant_bins module. I ran the assembly with MEGAHIT using the -l 500 option, and in the binning module I used only CONCOCT, so I have a single bin set, A (from CONCOCT). I ran bin_refinement right after binning. Because there is only one bin set, bin_refinement finished within 2 minutes (output: final best bins). I then ran quant_bins on this final best bins directory.

$metaWRAP quant_bins
-b bin_refinement/binsA
-o bin_quant
-a final_assembly.fasta
/NFS/users/creo9447/project/typhoon/raw_data/pre50_1.fastq /NFS/users/creo9447/project/typhoon/raw_data/pre50_2.fastq /NFS/users/creo9447/project/typhoon/raw_data/pst50_1.fastq /NFS/users/creo9447/project/typhoon/raw_data/pst50_2.fastq
-t 30

As you told me, I passed the 4 raw read files (they are trimmed) rather than 2 concatenated files (forward and reverse).

########################################################################################################################
#####                                 MAKING GENOME ABUNDANCE HEATMAP WITH SEABORN                                 #####
########################################################################################################################

------------------------------------------------------------------------------------------------------------------------
-----                                          making heatmap with Seaborn                                         -----
------------------------------------------------------------------------------------------------------------------------

loading libs...
loading abundance data...
drawing clustermap...
Traceback (most recent call last):
  File "/home/creo9447/miniconda3/envs/metawrap-env/bin/metawrap-scripts/make_heatmap.py", line 75, in <module>
    draw_clustermap(df, lut)
  File "/home/creo9447/miniconda3/envs/metawrap-env/bin/metawrap-scripts/make_heatmap.py", line 58, in draw_clustermap
    g = sns.clustermap(df, figsize=(14,8), col_cluster=True, yticklabels=True, cmap="magma")
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/seaborn/matrix.py", line 1301, in clustermap
    **kwargs)
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/seaborn/matrix.py", line 1128, in plot
    row_linkage=row_linkage, col_linkage=col_linkage)
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/seaborn/matrix.py", line 1021, in plot_dendrograms
    ax=self.ax_row_dendrogram, rotate=True, linkage=row_linkage)
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/seaborn/matrix.py", line 747, in dendrogram
    label=label, rotate=rotate)
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/seaborn/matrix.py", line 564, in __init__
    self.linkage = self.calculated_linkage
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/seaborn/matrix.py", line 628, in calculated_linkage
    return self._calculate_linkage_scipy()
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/seaborn/matrix.py", line 603, in _calculate_linkage_scipy
    metric=self.metric)
  File "/home/creo9447/miniconda3/envs/metawrap-env/lib/python2.7/site-packages/scipy/cluster/hierarchy.py", line 1109, in linkage
    raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.

************************************************************************************************************************
*****                           something went wrong with making the heatmap. Exiting...                           *****
************************************************************************************************************************

Can you give me any recommendation?

paul-bio commented 3 years ago

Here is the bin abundance table as well.

bin_abundance_table.txt

ursky commented 3 years ago

It looks like something about the "unbinned" cluster is throwing it off. The abundance line reads: unbinned nan nan. NaN is not a finite value, so the clustering step behind the heatmap can't handle it and the plot fails. I don't know what caused it to be NaN to begin with (an empty fasta file?), but it's not normal to include the "unbinned" section of the assembly at this stage of the analysis, so I would just remove it and rerun. Also, place the -t flag in your command before the list of read files, or it won't be read in.
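To check a table for rows like this before rerunning, something along the following lines may help. It is only a sketch, not part of metaWRAP: the function name `split_finite_rows` and the example table are hypothetical, and it assumes the abundance table is tab-delimited with a header row and the bin name in the first column.

```python
import csv
import io
import math


def split_finite_rows(table_text):
    """Split a tab-delimited abundance table into (header, kept_rows, dropped_names).

    A row is kept only if every abundance value parses as a finite float,
    so rows such as "unbinned nan nan" end up in dropped_names.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    kept, dropped = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, values = row[0], row[1:]
        try:
            is_finite = all(math.isfinite(float(v)) for v in values)
        except ValueError:
            is_finite = False  # non-numeric text is treated like NaN
        if is_finite:
            kept.append(row)
        else:
            dropped.append(name)
    return header, kept, dropped


# Made-up example data, laid out the way the table is assumed to look.
EXAMPLE_TABLE = (
    "Genomic bins\tpre50\tpst50\n"
    "bin.1\t12.5\t3.1\n"
    "bin.2\t0.0\t7.4\n"
    "unbinned\tnan\tnan\n"
)

header, kept, dropped = split_finite_rows(EXAMPLE_TABLE)
print("rows that would break the clustering:", dropped)
```

On a real file, the text would come from reading bin_abundance_table.txt; the names in `dropped` are the rows to investigate or remove.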

paul-bio commented 3 years ago

Here is my refinement command,

$metaWRAP -o -A -t 32 -m 60 -c 70 -x 10 --skip-checkm --keep-ambiguous

I don't know what caused the unbinned section. Anyway, the bin_abundance_table.txt contains 201 bins in total.

  1. So can I assume that the maximum number of species across the two samples is 201?

  2. If that's right, can I just treat this bin table as an OTU table and do further analysis with it? Maybe I could use this file with another visualization program.

ursky commented 3 years ago

Yes, it's better to use your own visualization from this point. The bin_abundance_table.txt is like an OTU table, but for MAGs.

paul-bio commented 3 years ago

Hi, Ursky. I solved the problem. In the quant module, I passed refined_bin to the -b option. However, in the bin_abundance_table there were unassigned bins, e.g. bin2 or bin100. So I removed those bins from the refined_bin directory and reran the quant module, and it worked.

Thanks for the advice, ursky 👍
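For anyone repeating this fix, the bin removal can be sketched as a small script that moves the offending bins aside instead of deleting them. This is a hedged illustration only: the helper `set_aside_bins`, the `_excluded` folder name, and the `.fa` file suffix are assumptions, not metaWRAP conventions confirmed in this thread.

```python
import shutil
from pathlib import Path


def set_aside_bins(bins_dir, bin_names, suffix=".fa"):
    """Move the named bin files into a sibling '<bins_dir>_excluded' folder.

    Returns the file names that were actually moved; names with no
    matching file are skipped silently.
    """
    bins_dir = Path(bins_dir)
    aside = bins_dir.parent / (bins_dir.name + "_excluded")
    aside.mkdir(exist_ok=True)
    moved = []
    for name in bin_names:
        src = bins_dir / (name + suffix)
        if src.is_file():
            shutil.move(str(src), str(aside / src.name))
            moved.append(src.name)
    return moved
```

A call such as `set_aside_bins("refined_bin", ["bin.2", "bin.100"])` (hypothetical bin names) would leave the remaining bins in place for a rerun of the quant module, and the moved files can be restored later if needed.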