RossBoylan / mccli

Expand correlation options #8

Open RossBoylan opened 4 years ago

RossBoylan commented 4 years ago

The current options for handling correlation are limited.

1) Correlation is currently either 0 or the maximum possible (which is perfect correlation for normal distributions). Partial correlations should be allowed.

2) Some correlations that were thought to be present are absent. Specifically, in row correlation, different columns in the same line are correlated. But if men and women are on 2 adjacent lines, there is no correlation between the men and the women. Such correlation is desired. Note that this is not a problem for block correlation, in which each age group is on its own line.

3) It would be good to have, as earlier versions of this system did, the ability to correlate larger groups of variables, e.g., all cost variables. Mechanically, the same machinery that fixes 2) will likely work for this as well.

Since 2) and 3) represent regressions (at least relative to expectations), they could be considered bugs.

Handling partial correlations is likely to require hard thought, both about how to accomplish it mechanically (hint: use copulas) and about how to set a reasonable prior. My understanding is that the difficulty of defending any particular intermediate correlation assumption has been a major inhibitor to using such assumptions. But "I don't think I can defend a particular intermediate assumption, and so I'm going to make a totally indefensible extreme assumption" just doesn't seem like a compelling argument. And the Bayesians have models available for unknown, uncertain correlations.
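
As a rough illustration of the copula hint above, here is a minimal sketch of a two-variable Gaussian copula that allows any correlation parameter between -1 and 1. It is not mccli code: the names `gaussian_copula_pair`, `sample_pair`, `exponential_inv_cdf`, and `pearson`, and the choice of marginals, are hypothetical and chosen only for illustration. Note that `rho` is the correlation of the underlying normal draws; for non-normal marginals the Pearson correlation of the outputs will generally differ from `rho`.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_EPS = 1e-12  # keeps uniforms strictly inside (0, 1) so inverse CDFs stay finite


def gaussian_copula_pair(rho, rng):
    """Return (u1, u2): uniforms on (0, 1) linked by a Gaussian copula.

    rho is the correlation of the latent standard normals (an assumption of
    this sketch, not an mccli parameter).
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    z1 = rng.gauss(0.0, 1.0)
    # Mix z1 with independent noise so that corr(z1, z2) == rho.
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    u1 = min(max(_STD_NORMAL.cdf(z1), _EPS), 1.0 - _EPS)
    u2 = min(max(_STD_NORMAL.cdf(z2), _EPS), 1.0 - _EPS)
    return u1, u2


def sample_pair(rho, inv_cdf_a, inv_cdf_b, rng):
    """Draw one correlated pair with arbitrary marginals given as inverse CDFs."""
    u1, u2 = gaussian_copula_pair(rho, rng)
    return inv_cdf_a(u1), inv_cdf_b(u2)


def exponential_inv_cdf(rate):
    """Inverse CDF of an exponential distribution (hypothetical marginal)."""
    return lambda u: -math.log1p(-u) / rate


def pearson(xs, ys):
    """Sample Pearson correlation, used here only to check the result."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(2020)
    men = NormalDist(10.0, 2.0).inv_cdf   # e.g. a normal marginal
    cost = exponential_inv_cdf(0.5)       # e.g. a skewed marginal
    draws = [sample_pair(0.6, men, cost, rng) for _ in range(10000)]
    xs, ys = zip(*draws)
    print("sample correlation:", pearson(xs, ys))
```

The same idea extends to larger groups (points 2 and 3) by drawing the latent normals from a multivariate normal with a full correlation matrix, though that needs a Cholesky factor or similar and is left out here.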

RossBoylan commented 4 years ago

Recent experiments suggest men and women on adjacent lines are correlated, contra my code analysis expressed in point 2 above. See https://github.com/RossBoylan/mccli/issues/4#issuecomment-647787448 for details.