Closed astrofrog closed 11 years ago
I worry about the memory overhead of this. Can you benchark how this affects, say, a dendrogram of a noisy 256^3 dataset, with about 100 structures after pruning?
On Monday, July 29, 2013, Thomas Robitaille wrote:
@ChrisBeaumont https://github.com/ChrisBeaumont - when making the tutorial movie, I found it useful to keep track of structures that are removed at the end of the tree computation. Do you think this would be ok
to have in the main code?
You can merge this Pull Request by running
git pull https://github.com/astrofrog/dendro-core discarded-structures
Or view, comment on, or merge it at:
https://github.com/dendrograms/dendro-core/pull/64 Commit Summary
- Keep discareded structures in a special attribute
File Changes
- M astrodendro/dendrogram.pyhttps://github.com/dendrograms/dendro-core/pull/64/files#diff-0(6)
Patch Links:
Good point - I'll try and run the test you suggest, but I agree it's not something sensible to have by default (though of course the memory won't be greater than the peak memory required for the computation since we aren't creating new objects in memory, just preventing them from being discarded). What about making it an option to compute
(off by default). Or is it too specialized?
I think the overhead might be larger than this (since structures are discarded throughout compute, and each one has a list of indices/values).
I haven't ever needed this -- can you think of good use cases beyond your needs for the demo video? If not, I vote for making this an off-by-default, un-emphasized part of the API
On Monday, July 29, 2013, Thomas Robitaille wrote:
Good point - I'll try and run the test you suggest, but I agree it's not something sensible to have by default (though of course the memory won't be greater than the peak memory required for the computation since we aren't creating new objects in memory, just preventing them from being discarded). What about making it an option to compute (off by default). Or is it too specialized?
— Reply to this email directly or view it on GitHubhttps://github.com/dendrograms/dendro-core/pull/64#issuecomment-21715324 .
@ChrisBeaumont - actually this currently only keeps the orphan structures that were discarded at the end. The idea is not to keep all structures that are e.g. merged in with other structures and discarded as independent structures, but only the structures that get discarded at the end. This ensures that every pixel in the data appears only once - in the trunk, or in this special list of discarded structures. This is why it doesn't add any extra memory - does this makes sense? Too specialized? Or should it have a better name?
Ah, you're right then -- memory overhead isn't a huge issue.
Can you elaborate what you want this for? Another approach to accomplish what you want without adding to the API: you could extract the final structure decimation step into its own private method, that compute calls. Then, make your own subclass and override this method to save the structures.
I think the compute method is too long right now anyways, so this refactoring would be useful on its own.
Unless you think saving (some of the) discarded structures is generally useful, I prefer this approach over another kwarg.
The use case is currently drawing the red lines on the movie I sent you :) Each frame shows an actual fully computed dendrogram, so the red lines are structures that don't appear by default unless I keep them in this way.
However, I like your idea of refactoring compute
to split up the steps. This would be a prune_orphans
method that I could override.
Anyway, this is not urgent, as my local fork is fine for making the movie, but I agree the refactoring makes sense.
I'm going to close this since we want a different approach.
@ChrisBeaumont - when making the tutorial movie, I found it useful to keep track of structures that are removed at the end of the tree computation. Do you think this would be ok to have in the main code?