dendrograms / astrodendro

Generate a dendrogram from a dataset
https://dendrograms.readthedocs.io/
Other
37 stars 38 forks source link

Keep discareded structures in a special attribute #64

Closed astrofrog closed 11 years ago

astrofrog commented 11 years ago

@ChrisBeaumont - when making the tutorial movie, I found it useful to keep track of structures that are removed at the end of the tree computation. Do you think this would be ok to have in the main code?

ChrisBeaumont commented 11 years ago

I worry about the memory overhead of this. Can you benchark how this affects, say, a dendrogram of a noisy 256^3 dataset, with about 100 structures after pruning?

On Monday, July 29, 2013, Thomas Robitaille wrote:

@ChrisBeaumont https://github.com/ChrisBeaumont - when making the tutorial movie, I found it useful to keep track of structures that are removed at the end of the tree computation. Do you think this would be ok

to have in the main code?

You can merge this Pull Request by running

git pull https://github.com/astrofrog/dendro-core discarded-structures

Or view, comment on, or merge it at:

https://github.com/dendrograms/dendro-core/pull/64 Commit Summary

  • Keep discareded structures in a special attribute

File Changes

  • M astrodendro/dendrogram.pyhttps://github.com/dendrograms/dendro-core/pull/64/files#diff-0(6)

Patch Links:

astrofrog commented 11 years ago

Good point - I'll try and run the test you suggest, but I agree it's not something sensible to have by default (though of course the memory won't be greater than the peak memory required for the computation since we aren't creating new objects in memory, just preventing them from being discarded). What about making it an option to compute (off by default). Or is it too specialized?

ChrisBeaumont commented 11 years ago

I think the overhead might be larger than this (since structures are discarded throughout compute, and each one has a list of indices/values).

I haven't ever needed this -- can you think of good use cases beyond your needs for the demo video? If not, I vote for making this an off-by-default, un-emphasized part of the API

On Monday, July 29, 2013, Thomas Robitaille wrote:

Good point - I'll try and run the test you suggest, but I agree it's not something sensible to have by default (though of course the memory won't be greater than the peak memory required for the computation since we aren't creating new objects in memory, just preventing them from being discarded). What about making it an option to compute (off by default). Or is it too specialized?

— Reply to this email directly or view it on GitHubhttps://github.com/dendrograms/dendro-core/pull/64#issuecomment-21715324 .

astrofrog commented 11 years ago

@ChrisBeaumont - actually this currently only keeps the orphan structures that were discarded at the end. The idea is not to keep all structures that are e.g. merged in with other structures and discarded as independent structures, but only the structures that get discarded at the end. This ensures that every pixel in the data appears only once - in the trunk, or in this special list of discarded structures. This is why it doesn't add any extra memory - does this makes sense? Too specialized? Or should it have a better name?

ChrisBeaumont commented 11 years ago

Ah, you're right then -- memory overhead isn't a huge issue.

Can you elaborate what you want this for? Another approach to accomplish what you want without adding to the API: you could extract the final structure decimation step into its own private method, that compute calls. Then, make your own subclass and override this method to save the structures.

I think the compute method is too long right now anyways, so this refactoring would be useful on its own.

Unless you think saving (some of the) discarded structures is generally useful, I prefer this approach over another kwarg.

astrofrog commented 11 years ago

The use case is currently drawing the red lines on the movie I sent you :) Each frame shows an actual fully computed dendrogram, so the red lines are structures that don't appear by default unless I keep them in this way.

However, I like your idea of refactoring compute to split up the steps. This would be a prune_orphans method that I could override.

Anyway, this is not urgent, as my local fork is fine for making the movie, but I agree the refactoring makes sense.

astrofrog commented 11 years ago

I'm going to close this since we want a different approach.