dilyabareeva / quanda

A toolkit for quantitative evaluation of data attribution methods.
https://quanda.readthedocs.io
MIT License

WiP: Datasets reworked #61

Closed gumityolcu closed 5 months ago

gumityolcu commented 5 months ago

The utils.datasets.toy_datasets module has been created; it currently includes base, label_poisoning, sample_perturbation, and label_grouping.

Base Class: the most general class. One thing needs explaining: the parameters p and subset_idx together determine which datapoints will be affected by sample_fn and label_fn.

subset_idx: restricts the candidate datapoints. If it is an integer, only datapoints of that class are candidates; if it is None, all datapoints are candidates.

p: determines the probability with which each datapoint selected by subset_idx will be affected. This is computed during initialization, so if a datapoint is affected, it is always affected (see the sketch below).
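To make the interaction of the two parameters concrete, here is a minimal sketch of how the affected indices could be drawn once at initialization. The function and parameter names are illustrative, not the actual quanda API.

```python
# Hypothetical sketch (not the actual quanda API): the affected indices are
# drawn once, at initialization, so the same datapoints are transformed
# every time the dataset is indexed.
import torch


def select_affected_indices(labels, subset_idx=None, p=1.0, seed=42):
    """Return the indices that sample_fn / label_fn will be applied to.

    labels: list of dataset labels
    subset_idx: if an int, only datapoints of that class are candidates;
                if None, every datapoint is a candidate
    p: probability that each candidate datapoint is affected
    """
    generator = torch.Generator().manual_seed(seed)
    candidates = [
        i for i, y in enumerate(labels)
        if subset_idx is None or y == subset_idx
    ]
    # One draw per candidate at init: an affected datapoint stays affected.
    mask = torch.rand(len(candidates), generator=generator) < p
    return [i for i, keep in zip(candidates, mask) if keep]
```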

So, for example, for grouping labels we give subset_idx=None and p=1.0 to the base class (affects all datapoints with certainty).

For label poisoning we give subset_idx=None and p=some value. This affects a randomly selected subset of the training data.

For sample perturbation (changing x in whatever way you want), I left both parameters open to user choice. For CleverHans, backdoor, or shortcut detection, we will give subset_idx = integer (a certain class) and p = some value. This affects a random subset of the images from a single class.
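Reusing the helper sketched above, the three configurations described here would look roughly like this (the values and class index are only illustrative):

```python
# Hypothetical usage of the helper sketched above, with a toy label list.
labels = [0, 1, 2, 3, 4, 5] * 100  # 600 datapoints, 6 classes

# Label grouping: affect every datapoint with certainty.
grouping_idx = select_affected_indices(labels, subset_idx=None, p=1.0)

# Label poisoning: affect a random subset of the whole training set.
poisoning_idx = select_affected_indices(labels, subset_idx=None, p=0.1)

# CleverHans / backdoor / shortcut detection: affect a random subset
# of a single class (here, class 5).
shortcut_idx = select_affected_indices(labels, subset_idx=5, p=0.3)
```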

No tests, no guarantees, work in progress

gumityolcu commented 5 months ago

This was a great read. Learned a lot about python. When are you publishing this? šŸ¤“

dilyabareeva commented 5 months ago

This was a great read. Learned a lot about python. When are you publishing this? šŸ¤“

I don't know what you mean @gumityolcu exactly, but I will just assume that you are being sarcastic šŸ˜„

I wasn't quite done with this yet: we now have Grouped Datasets twice and Subclass Detection twice in main, and there were some conflicts between the new and the old versions. Will push an update soon.