diazrenata / isds

Analysis of individual size distributions (of mammals)
MIT License

Perfect vs good #7

Closed diazrenata closed 4 years ago

diazrenata commented 4 years ago

After a lot of messing around, I think I have a method I like? Very cautiously.

I'm basically trying to come up with an analytical method that achieves mathematically what I, a human, am doing intuitively when I look at an ISD.

See https://github.com/diazrenata/isds/blob/empty-id/analysis/turns_etc.md

So what I'm doing is:

  1. Apply some kind of smoother to the ISD. I don't think it matters too much whether you use a GMM or a kernel density estimate. At the moment I prefer the kernel, because with a little bit of bandwidth tinkering you can cut down on the ridiculous squiggles. GMMs tend to have a lot of squiggles. You can also cut down on this by reducing the allowed number of component Gaussians (but then the fit just goes to the max allowed, for uniform ISDs at least). Reducing the number of squiggles isn't too p-hacky because we're no longer focused on the number of turns as our metric.
  2. Integrate the smoothed density over bins so you have a probability mass for each possible weight value. In the data I'm looking at, the empirical weights seem to come in units of 1, so I'm integrating over bins of width 1. You could use a different bin width.
  3. Starting with the highest-density weight values, collect weight values until you have collected some percentage (say 95-99%) of the total probability. Categorize each weight value as part of that wax ball (the collected set) or not.
  4. How many discontinuous chunks of the size spectrum do you cover with 95% (or whatever) of the probability density? If there are multiple important chunks with a region of low probability between them, you'll need both chunks but you won't scrape all the way down to the valley between. If it's all one chunk, with some squiggles at the top (see the uniform ones), you'll only ever be adding to the same chunk despite any number of squiggles. Your qualitative results will be slightly sensitive to the cutoff % you choose, and the choice of smoother. But, looking at the plots I threw together just now, this generally does not offend my intuition.
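
The four steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in the linked analysis: it assumes a Gaussian kernel smoother with a hand-picked bandwidth, integer weight values, and unit-wide bins, and the function names (`kernel_bin_mass`, `high_density_chunks`) and the toy weights are made up for the example.

```python
import math

def kernel_bin_mass(weights, bandwidth, lo, hi):
    """Steps 1-2: Gaussian kernel smoother, integrated over unit-wide bins.

    Returns {w: probability mass in [w - 0.5, w + 0.5]} for each integer w
    in lo..hi. The bandwidth is an assumed, hand-tuned value.
    """
    def cdf(x, mu):
        # CDF of a Gaussian kernel centered on one observed weight
        return 0.5 * (1.0 + math.erf((x - mu) / (bandwidth * math.sqrt(2.0))))

    n = len(weights)
    return {
        w: sum(cdf(w + 0.5, mu) - cdf(w - 0.5, mu) for mu in weights) / n
        for w in range(lo, hi + 1)
    }

def high_density_chunks(mass, cutoff=0.95):
    """Steps 3-4: collect weight values from highest to lowest mass until
    `cutoff` of the total mass is covered, then return the contiguous runs
    of collected values as (start, end) tuples."""
    total = sum(mass.values())
    kept, covered = set(), 0.0
    for w in sorted(mass, key=mass.get, reverse=True):
        if covered >= cutoff * total:
            break
        kept.add(w)
        covered += mass[w]

    chunks = []
    for w in sorted(kept):
        if chunks and w == chunks[-1][1] + 1:
            chunks[-1][1] = w          # extend the current chunk
        else:
            chunks.append([w, w])      # gap found: start a new chunk
    return [tuple(c) for c in chunks]

# Toy data (invented): two well-separated clusters of body weights
bimodal = [18, 19, 20, 20, 21, 22, 78, 79, 80, 80, 81, 82]
bimodal_chunks = high_density_chunks(
    kernel_bin_mass(bimodal, bandwidth=2.0, lo=0, hi=100), cutoff=0.95
)
print(len(bimodal_chunks), bimodal_chunks)
```

With clusters this far apart, the collected set should split into two chunks, one around each cluster, without reaching the low-probability valley between them. Changing `cutoff` or `bandwidth` is the place to check how sensitive the chunk count is to those choices.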