gda-score / code

Tools for generating General Data Anonymity Scores (www.gda-score.org)
MIT License

Derive simple plot from utility data structure #8

Closed (yoid2000 closed this issue 5 years ago)

yoid2000 commented 5 years ago

We need to add the utility measure to the diagram.

At https://gist.github.com/yoid2000/e7fb720d38552572307cfc029a58009f you can find an example of the JSON file for a single utility measure.

This is just one utility measure. In fact, a given diagram will have multiple utility measures, all of which need to be reduced down to two values, an accuracy value and a coverage value. (Note that we'll later have plotting methods that provide much more detail, but for now I want to produce this reduced score.)

For this issue, I'd like you to build a method that takes as input a list of file locations (paths), each pointing to a JSON file like the one above. As output, it produces a Python data structure containing an accuracy score and a coverage score.

The accuracy score should be the average meanSquareError value within all simpleRelativeErrorMetrics items in all files.

The coverage score should be the average of all coveragePerCol values.

Because different files may have different structures, for now please find these values by simply doing a full scan of the data structure. In other words, don't presuppose any specific structure in these files. Rather, just walk through all lists and dicts looking for the above values.
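A minimal sketch of what such a method could look like, not a definitive implementation. The names `find_values` and `utility_scores` and the `accuracy` / `coverage` keys of the returned dict are hypothetical. It assumes that `meanSquareError` and `coveragePerCol` hold plain numbers, and that a `meanSquareError` only counts when it sits somewhere under a `simpleRelativeErrorMetrics` key.

```python
import json
from numbers import Real
from statistics import mean


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def find_values(node, key):
    """Yield every value stored under `key` in nested dicts and lists.

    No file layout is assumed: the whole structure is scanned. A matched
    value is yielded as-is and not searched again for the same key.
    """
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def utility_scores(paths):
    """Reduce the utility JSON files at `paths` to two averaged scores."""
    mse_values, coverage_values = [], []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Accuracy: meanSquareError values found under simpleRelativeErrorMetrics.
        for metrics in find_values(data, "simpleRelativeErrorMetrics"):
            mse_values.extend(
                v for v in find_values(metrics, "meanSquareError") if _is_number(v)
            )
        # Coverage: coveragePerCol values found anywhere in the file.
        coverage_values.extend(
            v for v in find_values(data, "coveragePerCol") if _is_number(v)
        )
    # None signals that no matching value was found (an assumption of this sketch).
    return {
        "accuracy": mean(mse_values) if mse_values else None,
        "coverage": mean(coverage_values) if coverage_values else None,
    }
```

Calling `utility_scores(["run1.json", "run2.json"])` (placeholder file names) would return a dict with the two averages. Both averages are taken over all matching values across all files, so a file with more entries carries more weight than a file with fewer.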

I'm assigning this to @anirbanGhosh1512. However, @srnb please note that this is an example of the sort of thing we'll be wanting to do with the utility scores, and that we clearly will need better organization of this information.

@anirbanGhosh1512 as always let me know if you have questions.

AnirbanGhosh1512 commented 5 years ago

Starting work on it.

AnirbanGhosh1512 commented 5 years ago

Hello Prof. Paul, one quick question: are the accuracy and coverage scores going to be generated based on the number of columns used? I ask because the sample .json file contains both doublecolumnscores and single-column scores.

Regards, Anirban

yoid2000 commented 5 years ago

No, the number of columns won't matter.

PF
