Improved analysis API to make it easier to use programmatically

dendrograms / astrodendro

Generate a dendrogram from a dataset

https://dendrograms.readthedocs.io/

Other

37 stars 38 forks source link

Improved analysis API to make it easier to use programmatically #39

Closed astrofrog closed 11 years ago

astrofrog commented 11 years ago

The current analysis API produces a catalog, but this is not always convenient to use in a programmatic way. With minimal changes to the current API, I've make PPStatistic and PPVStatistic public, and have added documentation on using them - in particular, see the example for why this kind of API is useful. Here's a preview of the documentation page:

https://dendrograms-astrofrog-v2.readthedocs.org/en/latest/catalog.html

Note that there was a bug in the definition of the major/minor axes - the second moment needs to be multiplied by two to give the actual major/minor axes. (EDIT: should be a factor of 2.35 for sigma -> fwhm))

I'd like to make the statistics properties rather than methods of the Statistic classes. Is there any reason not to do this?

astrofrog commented 11 years ago

cc @ChrisBeaumont

ChrisBeaumont commented 11 years ago

I wouldn't call the maj/min properties "bugs", just "sigmas". It's not clear to me what is most standard convention for size is (semi-major? major? 2.35 * semimajor? custom FWHM based on intensity distribution? root area?). Erik uses 1.91 to account for central concentration, referring to a paper by McKee et al (see the original dendrogram paper). I kind of like the current definition, since it is explicit -- in fact, I'd rather change the name to semimajor. If we use the factor of 2 and someone tries to apply Erik's correction, they have to remember to divide his correction by 2. That's not obvious. And the 2.35 is only really a FWHM if the blob is gaussian.

The PPV/PPP Statistic methods could be made into properties, but the lower level Scalar / Vector statistics classes should not -- some of them take additional arguments

astrofrog commented 11 years ago

In terms of convention, in FITS, BMAJ and BMIN are FWHM, not sigma. Currently, the implementation assumes these are sigma which is incorrect when doing the deconvolution (I don't think we can expect users to do something like bmaj=header['BMAJ'] / 2.35. The reason I would find it confusing for it to not be FWHM for sky_maj/min currently is because bmaj/bmin should be FWHM, and the names sky_maj and sky_min are not too different, so it would be confusing if they were defined differently.

Now if we change to semimajor and semiminor, the issue is, is this sigma, or (as one would expect from a name like semi-major), the HWHM?

One option would be sky_major_sigma and sky_minor_sigma, or sky_major_mom2 and sky_minor_mom2 - what do you think?

For bmaj/bmin, let's discuss it in #40 and deal with it separately.

astrofrog commented 11 years ago

@ChrisBeaumont - I've changed sky_min to sky_minor_sigma and sky_maj to sky_major_sigma, and have also changed the PPStatistic and PPVStatistic classes to use properties. I think all the tests should pass - is this good to go, or do you have a better suggestion for renaming sky_minor/major_sigma?

ChrisBeaumont commented 11 years ago

This looks good to me -- I realize that sky_major_sigma is a little verbose. In IPython, does obj.property? show the docstring of a property getter? If so, then I'd be fine using another name or definition, as long as it's spelled out clearly in the docstring.

astrofrog commented 11 years ago

@ChrisBeaumont - it does show the docstring, though it also shows the class docstring for property, so it's a bit messy. I think for now it's better to be explicit even if a little verbose (I actually don't mind verbose). Maybe I can try and see if there is a better name when I also consider renaming dx and dv in #41.

astrofrog commented 11 years ago

I'm going to merge this and add a note to #41 about potentially renaming the major/minor sigma properties.