Issue with statistical test

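The label analysis at https://github.com/algorithmwatch/monitoringinstagram/tree/master/analysis#question-2-label-analysis reports a very significant effect. I think this comes from treating the created and encountered posts as independent and identically distributed (IID). As the analysis itself shows later, encounters differ per donor, so posts encountered by the same donor are not independent of each other and the IID assumption is violated. This makes the statistical test unreliable: when observations are clustered, the effective sample size is smaller than the number of posts, which typically makes p-values look more significant than they should. I think it would be more meaningful to perform the test on data aggregated at the donor level, with one value per donor.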
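To make the suggestion concrete, here is a minimal sketch of what a donor-level test could look like. It is not the repository's code: the row layout `(donor_id, encountered, has_label)`, the function names, and the choice of a sign-flip permutation test are all my own assumptions for illustration. The idea is to reduce each donor to a single number (share of labelled posts among encountered posts minus the share among created posts) and test whether those per-donor differences are centred on zero.

```python
import random
from collections import defaultdict
from statistics import mean


def donor_level_differences(rows):
    """Aggregate post-level rows into one difference per donor.

    rows: iterable of (donor_id, encountered, has_label) tuples.
    This layout is hypothetical, not the repository's actual schema.
    Returns {donor_id: share_labelled_encountered - share_labelled_created}.
    """
    # counts[donor][encountered] = [number with label, total number]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for donor_id, encountered, has_label in rows:
        cell = counts[donor_id][bool(encountered)]
        cell[0] += int(bool(has_label))
        cell[1] += 1

    diffs = {}
    for donor_id, by_group in counts.items():
        labelled_enc, n_enc = by_group[True]
        labelled_cre, n_cre = by_group[False]
        # Donors missing either group cannot contribute a difference.
        if n_enc and n_cre:
            diffs[donor_id] = labelled_enc / n_enc - labelled_cre / n_cre
    return diffs


def sign_flip_test(diffs, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on per-donor differences.

    Under the null hypothesis each donor's difference is equally likely
    to be positive or negative, so the signs are flipped at random and
    the observed mean is compared with the resulting distribution.
    """
    values = list(diffs)
    if not values:
        raise ValueError("need at least one donor-level difference")
    rng = random.Random(seed)
    observed = abs(mean(values))
    hits = 0
    for _ in range(n_perm):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Usage would be something like `p = sign_flip_test(donor_level_differences(rows).values())`. With this approach the sample size is the number of donors rather than the number of posts, so the p-value reflects how consistent the effect is across donors. A Wilcoxon signed-rank test or a mixed-effects model with a per-donor random effect would be reasonable alternatives.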

PS: Thank you for making this analysis transparent!
