dendrograms / astrodendro

Generate a dendrogram from a dataset
https://dendrograms.readthedocs.io/

Ensure that results from repeated runs are deterministic #80

Closed astrofrog closed 10 years ago

astrofrog commented 10 years ago

Here's the output of a simple script:

from astrodendro import Dendrogram
from astropy.io import fits
image = fits.getdata('L1448_13CO.fits')
d = Dendrogram.compute(image, min_value=1.2, min_delta=0.2, min_npix=10, verbose=True)
print(list(d.all_structures)[0].indices())

The output is not always the same, which is going to cause issues if users try to reproduce bugs. It also means that structure IDs may not always refer to the same structures. There are several places where dictionaries are used, so this must be what is causing the differences:

mac-robitaille2:dendro-core tom$ python test.py 
Generating dendrogram using 78,465 of 6,637,050 pixels (1.1822270436413769% of data)
[========================================>] 100%
(array([223, 222, 224, 217, 221, 217, 217, 221, 222, 222, 221, 219, 223,
       220, 223, 219, 222, 220, 220, 220, 223, 221, 222, 218, 217, 222,
       222, 220, 222, 220, 221, 219, 221, 221, 221, 221, 220, 221, 220,
       224, 219, 221, 221, 221, 222, 222, 220, 222, 220, 218, 218, 219,
       219, 218, 221, 221, 223, 223, 221, 219, 220, 224, 224, 225, 221,
       223, 223, 219, 221, 220, 221, 218, 220, 223, 224, 216, 219, 220, 216]), array([44, 44, 45, 44, 42, 44, 44, 42, 46, 43, 42, 43, 44, 45, 44, 43, 45,
       43, 43, 43, 44, 45, 44, 43, 43, 45, 45, 44, 45, 44, 43, 43, 43, 43,
       43, 43, 44, 43, 44, 45, 43, 45, 45, 45, 44, 44, 45, 45, 43, 44, 44,
       44, 44, 44, 44, 46, 45, 45, 44, 44, 42, 44, 44, 46, 44, 45, 43, 44,
       44, 42, 44, 44, 43, 46, 46, 43, 45, 43, 43]), array([81, 79, 80, 78, 85, 79, 80, 84, 80, 81, 83, 81, 80, 79, 79, 80, 78,
       84, 82, 81, 82, 82, 81, 82, 79, 80, 81, 81, 82, 80, 84, 83, 83, 82,
       81, 80, 79, 79, 78, 79, 82, 78, 79, 81, 82, 80, 81, 79, 80, 81, 80,
       81, 82, 79, 80, 79, 79, 80, 79, 79, 82, 79, 81, 79, 81, 81, 82, 80,
       82, 84, 78, 78, 79, 80, 79, 78, 79, 78, 79]))
mac-robitaille2:dendro-core tom$ python test.py 
Generating dendrogram using 78,465 of 6,637,050 pixels (1.1822270436413769% of data)
[========================================>] 100%
(array([156, 156, 154, 156, 156, 154, 156, 154, 154, 156, 155, 153, 154,
       155, 154, 156, 155, 154, 153, 157, 158, 158, 154, 155, 153, 154,
       156, 155, 155, 154, 156, 154, 153, 156, 153, 154, 153, 154, 155,
       155, 155, 155, 153, 155, 153, 155, 155, 155, 156, 153, 158, 158,
       155, 155, 155, 155, 153, 155, 156, 156, 157, 154, 157, 154, 154,
       154, 155, 155, 155, 155, 155, 154, 154, 154, 153, 152, 155, 154,
       155, 154, 155, 158, 158, 156, 156, 155, 157, 157, 156, 155, 157,
       157, 156, 155, 154, 155, 155, 154, 154, 155, 158, 154]), array([75, 78, 78, 78, 73, 77, 75, 80, 77, 75, 78, 73, 78, 76, 73, 73, 81,
       77, 79, 74, 75, 76, 77, 76, 76, 78, 74, 78, 81, 76, 80, 78, 78, 74,
       79, 78, 78, 77, 78, 77, 77, 72, 77, 72, 76, 77, 78, 78, 81, 76, 76,
       76, 76, 75, 74, 79, 77, 79, 76, 79, 75, 76, 73, 79, 79, 79, 75, 75,
       75, 79, 75, 79, 79, 80, 77, 78, 80, 74, 73, 79, 80, 76, 76, 76, 77,
       73, 76, 76, 77, 73, 78, 76, 77, 74, 75, 74, 76, 75, 75, 76, 75, 78]), array([37, 34, 37, 33, 37, 36, 36, 34, 35, 34, 32, 37, 33, 34, 37, 38, 35,
       33, 35, 37, 37, 37, 34, 39, 35, 39, 37, 35, 34, 37, 34, 35, 36, 39,
       33, 36, 35, 39, 36, 34, 35, 39, 35, 40, 37, 39, 34, 33, 34, 36, 38,
       39, 38, 34, 36, 36, 34, 37, 37, 33, 37, 35, 37, 34, 35, 36, 38, 37,
       36, 38, 35, 37, 38, 35, 36, 36, 34, 37, 39, 33, 35, 40, 41, 36, 33,
       36, 37, 38, 34, 38, 33, 41, 35, 39, 38, 37, 36, 37, 36, 33, 41, 34]))
astrofrog commented 10 years ago

One way to achieve this is to use ordered dictionaries - unfortunately, collections.OrderedDict was only added in Python 2.7, so it is not available in Python 2.6.
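For illustration, here is a minimal sketch of how an ordered dictionary keeps iteration order fixed. The keys and labels are made up for the example and are not taken from astrodendro; on Python 3.7 and later a plain dict also preserves insertion order.

```python
from collections import OrderedDict

# Hypothetical structure indices mapped to placeholder labels
structures = OrderedDict()
for idx in (3, 1, 2):
    structures[idx] = "structure-%d" % idx

# Iteration follows insertion order, so it is the same on every run
insertion_order = list(structures)

# Sorting the keys gives an order that does not depend on insertion at all
sorted_order = sorted(structures)
```

Either approach gives a repeatable order; sorting has the advantage of not relying on the order in which the structures were created.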

astrofrog commented 10 years ago

The attached code does the trick.

@ChrisBeaumont - what do you think?

ChrisBeaumont commented 10 years ago

Maybe rename sorted_by_idx to _sorted_by_idx? Otherwise, looks good to me.
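The attached code is not reproduced in this thread, so the following is only a guess at what such a helper might look like: it assumes each structure carries an integer idx attribute and simply sorts on it. The Structure class here is a stand-in, not the astrodendro one.

```python
class Structure(object):
    """Stand-in for a dendrogram structure; only `idx` matters here."""

    def __init__(self, idx):
        self.idx = idx


def _sorted_by_idx(structures):
    # Return the structures as a list ordered by their integer index,
    # regardless of the iteration order of the input container.
    return sorted(structures, key=lambda s: s.idx)


# A set has no guaranteed iteration order for these objects
unordered = {Structure(7), Structure(2), Structure(5)}
ordered_ids = [s.idx for s in _sorted_by_idx(unordered)]
```

Looping over the sorted list instead of the raw container makes the traversal order repeatable.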

astrofrog commented 10 years ago

Done - I need to figure out whether and how to add a test for this.

astrofrog commented 10 years ago

I think a test is going to be hard, precisely because it's non-deterministic, so let's forget about a test for now.
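One possible approach, sketched here under assumptions, is to run the same code in several fresh interpreter processes with different PYTHONHASHSEED values and check that the output is identical. This only exercises hash-based ordering of strings and similar types; it would not necessarily catch ordering that depends on object memory addresses. The helper name run_with_hash_seed and the snippet are hypothetical.

```python
import os
import subprocess
import sys


def run_with_hash_seed(code, seed):
    # Run `code` in a new interpreter with a fixed hash seed and
    # return whatever it prints to standard output.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", code], env=env)
    return out.decode()


# Placeholder for the real computation: sorting makes the output
# independent of the set's iteration order.
SNIPPET = "print(sorted({'leaf', 'branch', 'trunk'}))"

# Collect the distinct outputs seen across several hash seeds
outputs = {run_with_hash_seed(SNIPPET, seed) for seed in range(5)}
is_deterministic = len(outputs) == 1
```

In a real test the snippet would be replaced by the dendrogram computation and the printed indices compared across runs.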

keflavich commented 10 years ago

As an aside - ordered dictionaries are included in astropy.extern. Since astrodendro requires astropy anyway, you could use their ordereddict structure.

astrofrog commented 10 years ago

I tried that, and there was a reason why it didn't work, but I can't remember what it was...

astrofrog commented 10 years ago

@ChrisBeaumont - fixed the unnecessary list()

ChrisBeaumont commented 10 years ago

Ready to merge