Open purbon opened 6 years ago
Code to load the XING dataset has been pushed to master; see https://github.com/fair-search/fairsearch-elasticsearch-plugin/blob/master/demo/load-xing-dataset.rb for details.
A dummy query for manually checking operations against this dataset has also been pushed to master; see https://github.com/fair-search/fairsearch-elasticsearch-plugin/blob/master/demo/xing.query
@chatox @tsuehr it would be nice to have a list of queries (term, precision, significance, and k) together with the number of results the algorithm is expected to return for each. I am not sure I can find this kind of information in the paper.
This needs to be constructed synthetically. Example:
query = hello
doc1 = hello hello hello hello
doc2 = hello hello hello bye
doc3 = hello hello bye bye
doc4 = hello bye bye bye
doc5 = bye bye bye bye
Now, by assigning different genres to genre1 ... genre5, one can generate expected result lists in different orderings. This depends on table p.
I suggest not tying this to the German credit score dataset, but instead doing it generically with synthetic examples such as the one I've shown.
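A minimal sketch of how such a synthetic case could be turned into expected result lists, in Python for illustration only (it is not the plugin's code). It assumes documents are scored by how often the query term occurs, that each document is simply flagged as protected or not, and that the mtable gives the minimum number of protected documents required in each top-k prefix. The names `term_count`, `fair_rerank`, `protected_in_prefixes`, and the mtable values `[0, 0, 1, 1, 2]` are all hypothetical; a real test would use the mtable prepared for the chosen p and significance.

```python
DOCS = {
    "doc1": "hello hello hello hello",
    "doc2": "hello hello hello bye",
    "doc3": "hello hello bye bye",
    "doc4": "hello bye bye bye",
    "doc5": "bye bye bye bye",
}


def term_count(text, term):
    """Score a document by the number of occurrences of the query term."""
    return text.split().count(term)


def fair_rerank(docs, protected, mtable, term):
    """Greedy re-ranking: at position k, take a protected document if fewer
    than mtable[k-1] protected documents have been placed so far; otherwise
    take whichever remaining document ranks highest by score."""
    ranked = sorted(docs, key=lambda d: term_count(docs[d], term), reverse=True)
    position = {d: i for i, d in enumerate(ranked)}
    prot = [d for d in ranked if d in protected]
    nonprot = [d for d in ranked if d not in protected]
    result, n_prot = [], 0
    for required in mtable:
        if not prot and not nonprot:
            break
        take_prot = bool(prot) and (
            n_prot < required
            or not nonprot
            or position[prot[0]] < position[nonprot[0]]
        )
        if take_prot:
            result.append(prot.pop(0))
            n_prot += 1
        else:
            result.append(nonprot.pop(0))
    return result


def protected_in_prefixes(ranking, protected):
    """Number of protected documents in each top-k prefix of a ranking."""
    counts, seen = [], 0
    for d in ranking:
        seen += d in protected
        counts.append(seen)
    return counts


MTABLE = [0, 0, 1, 1, 2]  # hypothetical values, for illustration only

low_scoring_protected = fair_rerank(DOCS, {"doc4", "doc5"}, MTABLE, "hello")
print(low_scoring_protected)  # ['doc1', 'doc2', 'doc4', 'doc3', 'doc5']
print(protected_in_prefixes(low_scoring_protected, {"doc4", "doc5"}))  # [0, 0, 1, 1, 2]
```

With the protected documents at the bottom of the score order, doc4 is promoted to position 3 to satisfy the table. If doc1 and doc2 are protected instead, the score order already satisfies the table and the ranking is unchanged. Varying the protected set in this way yields the different expected orderings an integration test could assert on.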
That works for me, we can also do that. Would you be so kind as to prepare a dummy test set, including the expected number of answers (per protected category), that we can translate into an integration test in the plugin? Just to make sure we do the right verifications.
Sure, that would be based on an mtable prepared by @tsuehr
@chatox a test based on what we discussed and what you showed us in this issue has been created and pushed at
I will also add more edge cases over the next few days, e.g. few protected elements vs. many protected elements.
We should create valid test cases based on the XING dataset (link)
This test should cover: