algorithmwatch / 2020-monitoring-instagram-analysis

Data analysis for the Monitoring Instagram project
https://algorithmwatch.org/en/story/instagram-algorithm-nudity/
GNU General Public License v3.0

Issue with statistical test #1

Open breuderink opened 4 years ago

breuderink commented 4 years ago

In the label analysis (https://github.com/algorithmwatch/monitoringinstagram/tree/master/analysis#question-2-label-analysis), a highly significant effect is reported. I think this is because the test assumes that the created and encountered posts are independent and identically distributed (IID). As shown later in the same analysis, the encounters differ per donor, so the IID assumption is violated. Posts seen by the same donor are correlated, so a test that counts every post as an independent observation overstates the effective sample size, and its p-value becomes unreliable. I think it would be more meaningful to perform the test on data aggregated at the donor level.
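To make the point concrete, here is a minimal Python sketch with made-up counts. It assumes the pooled analysis behaves like a two-proportion test on post counts (the exact test in the linked README is not reproduced here) and contrasts it with a donor-level test in which each donor contributes one number. All names (`CREATED`, `ENCOUNTERED`, `pooled_z_test`, `donor_level_sign_flip_test`) and all numbers are hypothetical. The exact sign-flip test is just one possible donor-level test, not the project's method, and for simplicity it treats the share of labeled posts among created posts as fixed.

```python
import itertools
import math

# Hypothetical counts for illustration only; NOT the project's data.
# CREATED: (posts with the label, all posts) across every monitored account.
CREATED = (150, 500)
# ENCOUNTERED: per donor, (labeled posts seen in the feed, all posts seen).
ENCOUNTERED = {
    "donor_a": (300, 500),  # one heavy donor dominates the pooled counts
    "donor_b": (18, 60),
    "donor_c": (15, 60),
    "donor_d": (12, 60),
    "donor_e": (21, 60),
}


def pooled_z_test(created, encountered):
    """Two-proportion z-test on post counts: every post is treated as an IID draw."""
    x1, n1 = created
    x2 = sum(k for k, _ in encountered.values())
    n2 = sum(n for _, n in encountered.values())
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


def donor_level_sign_flip_test(created, encountered):
    """Exact one-sample sign-flip test on one difference per donor.

    Each donor contributes a single number, (encountered share - created share),
    so donors, not posts, are the independent units. Under H0 the differences are
    assumed symmetric around 0, so every sign pattern is equally likely.
    Enumerating all 2**n patterns is only feasible for a small number of donors;
    with many donors one would sample random sign patterns instead.
    """
    base = created[0] / created[1]
    diffs = [k / n - base for k, n in encountered.values()]
    observed = abs(math.fsum(diffs))
    patterns = list(itertools.product((1, -1), repeat=len(diffs)))
    extreme = 0
    for signs in patterns:
        stat = abs(math.fsum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # tolerance so floating-point ties count
            extreme += 1
    return math.fsum(diffs) / len(diffs), extreme / len(patterns)


if __name__ == "__main__":
    z, p_pooled = pooled_z_test(CREATED, ENCOUNTERED)
    mean_diff, p_donor = donor_level_sign_flip_test(CREATED, ENCOUNTERED)
    print(f"pooled over posts: z = {z:.2f}, p = {p_pooled:.2g}")
    print(f"donor level:       mean diff = {mean_diff:.3f}, p = {p_donor:.3f}")
```

With these invented numbers the pooled test gives z of roughly 6.8 and a p-value far below 0.001, while the donor-level test gives p = 0.875: only one of the five donors sees clearly more labeled posts, and pooling lets that donor's 500 encounters drive the result.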

PS: Thank you for making this analysis transparent!

n-kb commented 4 years ago

Thanks a lot for this. I'll pass it to our statistician and hopefully we can rework the tests.