Statgnome closed this issue 10 months ago
Interesting question, happy to discuss it here.
To use weighted data correctly, there are a few places where the library would need to be updated: the calculation of phik, the significance evaluation, and the outlier significances would all be affected.
For (only) the calculation of phik there are three important things I can think of:
I'm quite sure the phi_k calculation can be made to work, but it needs a bit of effort/study to get it right (e.g. deriving the right max chi^2 formula). If you're interested in this, then I'm happy to pick it up together. Let me know!
It would be great to work on this, but at the moment I may have too much going on. I'll try to get back to you about it if I become more available. I'm a big fan of the work you've already done, and we are using phik now for unweighted analyses to inform model selection. Thanks!
Glad to read that phik is useful for you. If you have time/interest later on then don't hesitate to reach out, and let's have a go at it.
If I understand things correctly, because of how phik solves for rho, using weighted data with phik requires being able to supply weighted contingency tables to key functions. If continuous data merely carries a weight column, each record will still count as 1 within its uniform bin. Has there been any consideration of supporting weighted data? Can you provide any guidance on how to use weighted data with phik, for any data case?
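To make the point about weighted contingency tables concrete, here is a minimal sketch of how one could be built with pandas and NumPy, next to the unweighted count where every record contributes 1. The column names (`colour`, `size`, `weight`) and the toy values are made up for illustration. The sketch only constructs the tables; it does not call phik and makes no claim about which phik functions would accept such a table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two categorical variables plus a per-record weight.
df = pd.DataFrame({
    "colour": ["red", "red", "blue", "blue", "red"],
    "size": ["S", "L", "S", "L", "S"],
    "weight": [0.5, 2.0, 1.5, 1.0, 3.0],
})

# Unweighted contingency table: each record counts as 1.
unweighted = pd.crosstab(df["colour"], df["size"])

# Weighted contingency table: each cell holds the sum of the record weights.
weighted = pd.crosstab(
    df["colour"], df["size"], values=df["weight"], aggfunc="sum"
).fillna(0.0)

# Continuous case: bin both variables and sum the weights per bin
# instead of counting records.
x = np.array([0.1, 0.4, 0.6, 0.9, 0.2])
y = np.array([1.0, 3.0, 2.0, 4.0, 1.5])
w = df["weight"].to_numpy()
hist, x_edges, y_edges = np.histogram2d(x, y, bins=[2, 2], weights=w)

print(unweighted)
print(weighted)
print(hist)
```

In both cases the table total equals the sum of the weights rather than the number of records, which is one reason the significance evaluation would need separate treatment.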
Since this is not really an issue with the code or implementation, if other channels of communication are preferred, please let me know. I am trying to integrate phik into some analytics work, but weighting is very important to how the data is understood where I work.