JTFouquier opened 8 years ago
Wrote a script so I can run the Gini analysis after downloading the .xml files. It's far from user-friendly, but it does the job.
Results here: https://docs.google.com/spreadsheets/d/1hYDU_oIfLWqunj_LUTa6SKJO0jqVdEMpPNKaoGGSiQc/edit#gid=0
Original comment by: gtsueng
Other potentially useful metrics:
For progress:
Date, Total Docs submitted (by group), Total Docs available, Percent complete (by group)
For daily activity:
Date, Total Docs submitted on that date
Date, Talk posts submitted on that date
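The metrics above are simple aggregations over the downloaded records. Here is a minimal sketch of how they could be computed. It assumes the data have already been parsed into one (date, group) pair per submitted document and one date per Talk post. The function names and input shape are hypothetical, and so is reading "percent complete" as a group's cumulative submissions divided by a single shared total of available docs.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_counts(dates: list[date]) -> dict[date, int]:
    """Number of events (doc submissions or Talk posts) on each date, sorted by date."""
    return dict(sorted(Counter(dates).items()))


def progress_by_group(
    submissions: list[tuple[date, str]], total_available: int
) -> dict[date, dict[str, tuple[int, float]]]:
    """For each date, cumulative docs submitted per group and the percent of total_available.

    Assumption: every group draws from one shared pool of `total_available` docs.
    """
    if total_available <= 0:
        raise ValueError("total_available must be positive")
    # Count submissions per group on each individual date.
    per_day: dict[date, Counter] = defaultdict(Counter)
    for day, group in submissions:
        per_day[day][group] += 1
    # Walk the dates in order, accumulating totals so each row is "progress so far".
    running: Counter = Counter()
    report: dict[date, dict[str, tuple[int, float]]] = {}
    for day in sorted(per_day):
        running.update(per_day[day])
        report[day] = {
            g: (n, 100.0 * n / total_available) for g, n in sorted(running.items())
        }
    return report
```

Each report row is cumulative, so plotting one group's percentage against date gives its progress curve. `daily_counts` can be reused unchanged for both the submission dates and the Talk post dates.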
Original comment by: gtsueng
Is this an issue, or should it be flagged as a discussion?
Lorenz curve/Gini coefficient: in this application, they quantify how the work effort is distributed among the volunteers. Here's the curve for one particular survey's data in the latest Galaxy Zoo (we have 5 different data sets at this point, and this is the one I'm reducing right now).

To plot it, sort your volunteers from the fewest classifications to the most, then plot a cumulative curve of the number of classifications as you add each user. If everyone has done the same number of classifications, the curve is a straight line (the dashed line); when a smaller number of people each do more classifications, the curve pulls away from that line.

To compute the Gini coefficient, calculate 1 - (area_under_curve / area_under_dashed_line). Higher numbers mean the curve is pulled farther from the line, i.e. a smaller group is doing more of the work.
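Here is a minimal sketch of the procedure described above. It assumes the input is a plain list of per-volunteer classification counts; parsing the .xml exports into that list is not shown. Both axes are normalized to fractions, so the dashed equality line runs from (0, 0) to (1, 1) and has an area of 0.5 beneath it. The area under the Lorenz curve uses the trapezoid rule, which is one reasonable choice among several. The function names are hypothetical.

```python
from typing import Sequence


def lorenz_curve(counts: Sequence[int]) -> list[tuple[float, float]]:
    """Points (cumulative fraction of volunteers, cumulative fraction of classifications).

    Volunteers are sorted from fewest to most classifications; the curve starts at (0, 0).
    """
    if not counts:
        raise ValueError("need at least one volunteer")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total number of classifications must be positive")
    ordered = sorted(counts)  # fewest classifications first
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c  # add each user's classifications to the cumulative total
        points.append((i / n, running / total))
    return points


def gini(counts: Sequence[int]) -> float:
    """Gini coefficient = 1 - (area under Lorenz curve / area under equality line)."""
    points = lorenz_curve(counts)
    # Trapezoid rule between consecutive Lorenz points.
    area_under_curve = sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    area_under_dashed_line = 0.5  # the line y = x on the unit square
    return 1 - area_under_curve / area_under_dashed_line


print(round(gini([5, 5, 5, 5]), 4))   # equal effort -> 0.0
print(round(gini([1, 2, 3, 4]), 4))   # -> 0.25
print(round(gini([0, 0, 0, 10]), 4))  # one volunteer does everything -> 0.75
```

With a finite number of volunteers n, this discrete version tops out at (n - 1)/n rather than exactly 1. The last example reaches 0.75 for n = 4 for that reason.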