Linkerbrain / fact-ai-2021-project


Research directions. #1

Closed · alvitawa closed this 2 years ago

alvitawa commented 2 years ago

Current ideas for going beyond the reproduction

  1. **Different search algorithm.** Attempt to find augmentation sets with better performance by, for example, using a smarter search algorithm (the paper uses random search) or simply increasing the allowed number of consecutive augmentations. A sketch of the search loop follows this list.

  2. **Different attack method.** Incorporate information about the transformations used into the attack. Attempt to break the current augmentations with a smarter attack model (though the paper claims to have tested all existing reconstruction attacks). We can also test whether the method protects against a different threat, such as membership inference. Also, is the method still secure if the augmentation set is known to the attacker? There is also the adaptive attack from the paper's discussion, which starts the reconstruction from a non-random image.

  3. **Different datasets.** Test the method on different datasets.

  4. **Different evaluation.** The paper uses unaugmented data as the baseline. Instead, the data could be augmented in a way that optimizes for accuracy. This would make the comparison fairer, since the method itself partially does this.

  5. **Different policies.** The paper only shows results for policies consisting of 3 augmentations. Can better results be achieved when limiting the search to simpler sets?
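For idea 1 (and, with `policy_len=1`, the simpler policies of idea 5), here is a minimal sketch of what the random search loop could look like, assuming a small hypothetical pool of torchvision transforms and a placeholder `score()`. In the real experiment the score would come from training a model under the policy and running a reconstruction attack; none of the names below are from the paper's code.

```python
import random
import torchvision.transforms as T

# Hypothetical candidate pool; the paper searches a much larger transform space.
CANDIDATES = [
    T.RandomHorizontalFlip(p=1.0),
    T.RandomRotation(degrees=15),
    T.ColorJitter(brightness=0.5),
    T.RandomAffine(degrees=0, translate=(0.2, 0.2)),
    T.GaussianBlur(kernel_size=3),
]

def score(policy):
    # Placeholder: in the real experiment this would train a model under
    # `policy` and combine accuracy with a privacy metric from an attack.
    # Random here so the sketch runs standalone.
    return random.random()

def random_search(num_trials=100, policy_len=3):
    best_policy, best_score = None, float("-inf")
    for _ in range(num_trials):
        # Sample with replacement, so the same transform may appear twice.
        policy = random.choices(CANDIDATES, k=policy_len)
        s = score(T.Compose(policy))
        if s > best_score:
            best_policy, best_score = policy, s
    return best_policy, best_score

best, s = random_search()
print([type(t).__name__ for t in best], s)
```

Swapping `score()` between an accuracy-only and a privacy-only objective would also give exactly the comparison proposed later in this thread.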

alvitawa commented 2 years ago

Also a potentially interesting research direction: https://www.youtube.com/watch?v=86ib0sfdFtw&t=5262s Maybe the interpretation presented there can give some insight into how and why these augmentations protect privacy.

ole2252 commented 2 years ago

> 2. Attempt to break the current augmentations with a smarter attack model (though the paper claims to have tested all existing reconstruction attacks).

In the discussion it is said that a new attack strategy could be developed by guessing the content property or class representatives of the target sample, instead of randomly initializing an image. We may look further into this attack strategy.
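To make that concrete, here is a minimal sketch of such an adaptive attack, assuming a gradient-matching reconstruction in the spirit of Geiping et al.'s "Inverting Gradients", with a plain L2 matching loss (the published attacks use more elaborate objectives and regularizers). The only change versus the standard attack is `x_init`: a guessed class representative instead of random noise. Function and variable names are hypothetical.

```python
import torch
import torch.nn.functional as F

def reconstruct(model, target_grads, label, x_init, steps=1000, lr=0.1):
    # Start the optimization from the attacker's guess (e.g. a class
    # representative) instead of random noise.
    x = x_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Gradients the current candidate image would produce on the model.
        loss = F.cross_entropy(model(x), label)
        dummy_grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # L2 gradient-matching objective against the victim's observed gradients.
        rec_loss = sum(((g - t) ** 2).sum() for g, t in zip(dummy_grads, target_grads))
        rec_loss.backward()
        opt.step()
    return x.detach()

# Usage would look something like (all hypothetical):
#   x_init = class_mean_image.unsqueeze(0)                  # the guess
#   x_rec = reconstruct(model, victim_grads, torch.tensor([y]), x_init)
```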

Linkerbrain commented 2 years ago

Something I found interesting is that in some of the testing scenarios the accuracy increased when using the transformed data instead of the actual data. For instance, the no-policy accuracy for CIFAR100 with ResNet20 was 76.88, while the hybrid transformed data scored 77.92.

The paper explains this by stating that the model might generalize better when trained on the transformed data.

But this might make the evaluation unfair. Instead of comparing the accuracy of the transformed data to that of the base data, it might be better to compare it to the accuracy of a concatenation of the base and transformed data.
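As a minimal sketch of that baseline, assuming CIFAR-100 and a stand-in policy transform (not the paper's actual searched policy), the concatenation could be built with `ConcatDataset`:

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import ConcatDataset, DataLoader

base_tf = T.ToTensor()
# Stand-in policy; the real one would come from the search.
policy_tf = T.Compose([T.RandomHorizontalFlip(p=1.0), T.RandomRotation(15), T.ToTensor()])

base = torchvision.datasets.CIFAR100("data", train=True, download=True, transform=base_tf)
augmented = torchvision.datasets.CIFAR100("data", train=True, download=True, transform=policy_tf)

# Baseline: the model sees each image both untransformed and transformed.
loader = DataLoader(ConcatDataset([base, augmented]), batch_size=128, shuffle=True)
```

Note that this baseline would only make sense for the accuracy comparison: actually training on the raw images would forfeit the privacy protection, so it serves as a reference point rather than an alternative method.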

alvitawa commented 2 years ago

> Something I found interesting is that in some of the testing scenarios the accuracy increased when using the transformed data instead of the actual data. For instance, the no-policy accuracy for CIFAR100 with ResNet20 was 76.88, while the hybrid transformed data scored 77.92.
>
> The paper explains this by stating that the model might generalize better when trained on the transformed data.
>
> But this might make the evaluation unfair. Instead of comparing the accuracy of the transformed data to that of the base data, it might be better to compare it to the accuracy of a concatenation of the base and transformed data.

So compare optimizing the transforms for accuracy vs. optimizing them for privacy.

alvitawa commented 2 years ago

  1. Test whether the method can be simplified: for example, is one augmentation enough?

Linkerbrain commented 2 years ago

I edited the original post to incorporate the new ideas you all proposed.