dendrograms / astrodendro

Generate a dendrogram from a dataset
https://dendrograms.readthedocs.io/

Performance issues with hdf5 #130

Open keflavich opened 9 years ago

keflavich commented 9 years ago

I've been working on cubes with shape ~500x250x1100 containing ~500-4000 dendrogrammed sources. When I save the dendrograms, they end up in ~150-250 MB HDF5 files. Loading a dendrogram back from one of these files is exceedingly slow: a few minutes on a solid-state drive. This seems excessive to me; is my intuition incorrect, or might major performance improvements be possible?

keflavich commented 9 years ago

See https://github.com/keflavich/astrodendro/tree/faster_dendro_parser for a fix. @astrofrog or @ChrisBeaumont - could you turn this Issue into a PR? I don't believe I can without repo write access.

keflavich commented 9 years ago

(The performance improvement is to loop in Python over only the valid indices; the data selection is done with numpy.)
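
For anyone reading later, here is a minimal sketch of that idea. It is not the code on the branch. It assumes an integer index map in which `-1` marks pixels that belong to no structure. The function names `group_pixels_naive` and `group_pixels_valid_only` and the toy arrays are hypothetical and exist only for illustration.

```python
import numpy as np


def group_pixels_naive(index_map, data):
    """Visit every pixel in a Python loop, rejecting empty ones one by one."""
    groups = {}
    for coord in np.ndindex(index_map.shape):
        idx = int(index_map[coord])
        if idx > -1:
            pixels, values = groups.setdefault(idx, ([], []))
            pixels.append(coord)
            values.append(float(data[coord]))
    return groups


def group_pixels_valid_only(index_map, data):
    """Let numpy select the labelled pixels, then loop over just those."""
    # argwhere returns an (n_valid, ndim) array of coordinates in C order,
    # so the Python loop below never touches unlabelled pixels.
    valid = np.argwhere(index_map > -1)
    groups = {}
    for row in valid:
        coord = tuple(int(c) for c in row)
        idx = int(index_map[coord])
        pixels, values = groups.setdefault(idx, ([], []))
        pixels.append(coord)
        values.append(float(data[coord]))
    return groups


# Toy cube: 120 pixels, only 3 of which belong to a structure.
index_map = np.full((4, 5, 6), -1, dtype=int)
index_map[0, 1, 2] = 0
index_map[0, 1, 3] = 0
index_map[3, 4, 5] = 1
data = np.arange(index_map.size, dtype=float).reshape(index_map.shape)

naive = group_pixels_naive(index_map, data)
fast = group_pixels_valid_only(index_map, data)
assert naive == fast  # same grouping, far fewer Python-level iterations
```

In a cube where most pixels are unlabelled, the second version runs its Python loop only over the labelled pixels, while the full-cube scan happens inside numpy.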

ChrisBeaumont commented 9 years ago

Opened via #131