PrincetonUniversity / DP_GP_cluster

Plotting failed for big data set #31

Closed stborowski closed 3 years ago

stborowski commented 3 years ago

Hi,

when processing a big data set with DP_GP_cluster (memory footprint of about 200 GB), we obtain the text output files as expected. However, plotting fails with the following error message:

WARNING: skipping heatmap plot generation, too many dendrogram recursions for scipy to handle
Traceback (most recent call last):
  File "~/.local/bin/DP_GP_cluster.py", line 684, in <module>
    core.save_posterior_similarity_matrix_key([gene_names[idx] for idx in sim_mat_key], args.output_path_prefix)

With the input filtered down to 10%, plotting succeeds, but the results are no longer significant. We have tried several measures to get plots from the full data set as well, without success:

Adding plt.switch_backend('agg') after import matplotlib.pyplot as plt, as recommended in some posts, has no effect.

Post-processing the text output files with the --post_process option hangs while transposing a matrix, according to the trace output.

Looking into the sources, the abort is probably caused by the scipy call Z = sch.dendrogram(Y, orientation='left', link_color_func=lambda x: 'black' ) in ~/.local/lib/python2.7/site-packages/DP_GP/plot.py. Adding the options truncate_mode='level', p=30 to limit the depth of the tree processed has no effect either.

Is there any way to work around this? We use Python 2.7.16 with scipy 1.2.1 as required. Thank you!

stborowski commented 3 years ago

Hi,

the abort in the scipy function scipy.cluster.hierarchy.dendrogram() was indeed caused by an insufficient recursion depth. Raising the recursion limit from the default of 1000 to 1000000 with sys.setrecursionlimit() solved the issue.
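
For anyone hitting the same problem, here is a minimal sketch of the workaround, not the actual change made in DP_GP_cluster. The helper raised_recursion_limit and the toy function depth are hypothetical names used only for illustration; depth stands in for the recursive tree traversal, and the real call to wrap would be the sch.dendrogram(...) line in plot.py. Note that a very high limit can let the interpreter overflow the C stack and crash instead of raising an error, so it may be safer to raise the limit only around the plotting call and restore it afterwards.

```python
import sys
from contextlib import contextmanager


@contextmanager
def raised_recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring the old value on exit.

    Hypothetical helper for illustration; not part of DP_GP_cluster or scipy.
    """
    old_limit = sys.getrecursionlimit()
    # Never lower the limit, only raise it.
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(old_limit)


def depth(node):
    """Recursively measure the depth of a nested chain (toy stand-in for a
    deep dendrogram traversal)."""
    return 0 if node is None else 1 + depth(node[0])


# Build a degenerate "tree" 5000 levels deep, well past the default limit.
chain = None
for _ in range(5000):
    chain = (chain,)

with raised_recursion_limit(20000):
    # In plot.py, the wrapped call would instead be something like:
    #   Z = sch.dendrogram(Y, orientation='left', link_color_func=lambda x: 'black')
    print(depth(chain))
```

Calling depth(chain) outside the with block would typically fail with a recursion error under the default limit of 1000, which mirrors the failure seen in the dendrogram call.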