WiP: Datasets reworked - Githubissues

gumityolcu commented 5 months ago

utils.datasets.toy_datasets has been created; it currently includes base, label_poisoning, sample_perturbation, and label_grouping.

Base class: the most general class. One thing needs explaining: the parameters p and subset_idx together determine which datapoints will be affected by sample_fn and label_fn.

subset_idx:

Can be an int: then it is the id of the class that is to be affected.
Can be a list or tensor: then it is treated as the ids of the samples to be affected.
Can be None, which means "affect the whole dataset".

p: determines the probability with which each datapoint filtered by subset_idx will be affected. This is computed during initialization, so if a datapoint is affected, it is always affected.
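A minimal sketch of how subset_idx and p could combine to pick the affected datapoints. The function name select_affected_indices, the seed argument, and the use of Python's random module are assumptions made for illustration; this is not the code from the PR.

```python
import random


def select_affected_indices(labels, subset_idx=None, p=1.0, seed=0):
    """Return the sorted indices of the datapoints that will be affected.

    Hypothetical helper for illustration, not the actual quanda code.
    """
    n = len(labels)
    if subset_idx is None:
        # None: every datapoint is a candidate.
        candidates = range(n)
    elif isinstance(subset_idx, int):
        # int: candidates are all samples of that class.
        candidates = [i for i in range(n) if labels[i] == subset_idx]
    else:
        # list or tensor: explicit sample ids (tensors expose .tolist()).
        ids = subset_idx.tolist() if hasattr(subset_idx, "tolist") else subset_idx
        candidates = [int(i) for i in ids]

    # Draw once per candidate. Doing this a single time (at initialization)
    # is what makes an affected datapoint stay affected afterwards.
    rng = random.Random(seed)
    return sorted(i for i in candidates if rng.random() < p)
```

With p=1.0 every candidate is selected, because random() returns values in [0, 1); with p=0.0 none is.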

So, for example, for grouping labels we give subset_idx=None and p=1.0 to the base class (affects all datapoints with certainty).

For label poisoning we give subset_idx=None and p=some value. This affects a randomly selected subset of the training data.

For sample perturbation (changing x in whatever way you want), I left these two open to user choice. For CleverHans, Backdoor, or Shortcut detection, we will give subset_idx=integer (a certain class) and p=some value. This affects a random subset of the images from a single class.
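The three configurations above can be sketched with a small stand-in for the base class. Everything here is assumed for illustration: the class name TransformedToyDataset, the seed argument, the toy data, and the particular label_fn and sample_fn lambdas are hypothetical and not taken from the PR.

```python
import random


class TransformedToyDataset:
    """Hypothetical stand-in for the base class described above."""

    def __init__(self, data, subset_idx=None, p=1.0,
                 sample_fn=None, label_fn=None, seed=0):
        self.data = list(data)  # list of (x, y) pairs
        self.sample_fn = sample_fn or (lambda x: x)
        self.label_fn = label_fn or (lambda y: y)
        if subset_idx is None:
            candidates = range(len(self.data))
        elif isinstance(subset_idx, int):
            candidates = [i for i, (_, y) in enumerate(self.data) if y == subset_idx]
        else:
            candidates = [int(i) for i in subset_idx]
        rng = random.Random(seed)
        # Decided once here, so an affected datapoint stays affected.
        self.affected = {i for i in candidates if rng.random() < p}

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        x, y = self.data[i]
        if i in self.affected:
            return self.sample_fn(x), self.label_fn(y)
        return x, y


data = [([float(i)], i % 4) for i in range(20)]  # 20 toy points, 4 classes

# Label grouping: all datapoints, with certainty (4 classes merged into 2).
grouped = TransformedToyDataset(data, subset_idx=None, p=1.0,
                                label_fn=lambda y: y // 2)

# Label poisoning: a random subset of the whole dataset gets a wrong label.
poisoned = TransformedToyDataset(data, subset_idx=None, p=0.3,
                                 label_fn=lambda y: (y + 1) % 4)

# Shortcut/backdoor-style perturbation: a random subset of class 0 only.
perturbed = TransformedToyDataset(data, subset_idx=0, p=0.5,
                                  sample_fn=lambda x: [v + 100.0 for v in x])
```

Keeping the selection in the constructor means that repeated reads of the same index return the same (possibly modified) datapoint, which matches the "if a datapoint is affected, it is always affected" behaviour described for p.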

No tests, no guarantees, work in progress

gumityolcu commented 5 months ago

This was a great read. Learned a lot about python. When are you publishing this? 🤓

dilyabareeva commented 5 months ago

> This was a great read. Learned a lot about python. When are you publishing this? 🤓

I don't know what you mean @gumityolcu exactly, but I will just assume that you are being sarcastic 😄

I wasn't quite done with this yet: we now have Grouped Datasets twice and Subclass Detection twice in main, and there were some conflicts between the new and the old versions. Will push an update soon.

dilyabareeva / quanda

WiP: Datasets reworked #61