Project: Better Concept Pointers

Elevator Pitch

There's an argument that the pointers problem, Goodhart's law, and adversarial examples correspond to one another.[1] If so, it stands to reason that progress on adversarial examples can be used to develop techniques for dealing with the pointers problem.

Goal Outputs

The goal is to develop a technique for robustly generating "real" adversarial examples: image perturbations that change both a classifier's output and how a human would label the image.
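
For reference, an ordinary (not necessarily "real") adversarial perturbation can be sketched as follows. This is a minimal illustration, assuming a linear softmax classifier and a single fast-gradient-sign step; the proposal does not specify the model or the attack, and `fgsm_perturb`, `W`, `b`, and `eps` are hypothetical names chosen for the example. Whether a human would also relabel the perturbed image is exactly what this code cannot tell us.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def fgsm_perturb(W, b, x, y, eps):
    """One fast-gradient-sign step against a linear softmax classifier.

    Assumed shapes (illustrative only):
      W: (classes, features), b: (classes,),
      x: (features,) with pixel values in [0, 1], y: integer true label.
    """
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    # Gradient of the cross-entropy loss with respect to the input x.
    grad_x = W.T @ (p - onehot)
    # Step in the direction that increases the loss, then stay in pixel range.
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)


# Toy usage: a 2-class, 2-"pixel" classifier.
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
x = np.array([0.6, 0.4])
x_adv = fgsm_perturb(W, b, x, y=0, eps=0.3)
label_before = int(np.argmax(W @ x + b))
label_after = int(np.argmax(W @ x_adv + b))
```

A perturbation like this only counts toward the goal if a human's label for the image changes as well, which has to be checked separately.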

Milestones

- Train a fully Bayesian classifier.
- Analyze the properties of adversarial examples that fool this classifier and see whether they reflect the hypothesis above.
- Collect human feedback (classifications) on adversarial perturbations.
- Measure the effect of human feedback on the quality of adversarial examples.
- If all of the above fails, explore methods of regularizing the optimizer to make it better at finding the kind of perturbations we're looking for.
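
As a rough illustration of the first two milestones, a Bayesian classifier's prediction can be approximated by averaging class probabilities over samples from the weight posterior. The sketch below assumes a linear softmax model and that posterior samples are already available. How they would be obtained (MCMC, variational inference, or something else) is not specified in this proposal, and `posterior_predictive` and `predictive_entropy` are hypothetical helper names.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax along the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def posterior_predictive(weight_samples, x):
    """Average class probabilities over posterior weight samples.

    weight_samples: (samples, classes, features), assumed to be draws
    from the posterior over a linear classifier's weights.
    x: (features,)
    """
    logits = weight_samples @ x          # shape (samples, classes)
    return softmax(logits).mean(axis=0)  # shape (classes,)


def predictive_entropy(probs):
    """Entropy (in nats) of the averaged predictive distribution."""
    probs = np.clip(probs, 1e-12, 1.0)
    return float(-(probs * np.log(probs)).sum())


# Toy usage: two posterior samples that disagree completely on x.
samples = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
])
x = np.array([1.0, 0.0])
probs = posterior_predictive(samples, x)
entropy = predictive_entropy(probs)
```

One property worth examining in the analysis milestone is whether adversarial inputs that fool the averaged prediction also show high predictive entropy, since disagreement between posterior samples is what the entropy picks up here.
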
How to Help: There are a few things that can be done in parallel.
- Train the classifier.
- Implement a method for discovering adversarial examples.
- Build a UI for collecting the human feedback, or figure out how to use Mechanical Turk to do it.
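
For the feedback-collection task, it may help to agree early on what a single piece of feedback looks like and how it would be summarized. The record layout and the metric below are assumptions made for illustration, not an agreed design: `FeedbackRecord` and `real_adversarial_rate` are hypothetical names, and the metric is just the fraction of classifier-fooling perturbations that also changed the human's label.

```python
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    """One perturbation, labeled by the classifier and by a human (assumed schema)."""
    clf_label_before: int
    clf_label_after: int
    human_label_before: int
    human_label_after: int


def real_adversarial_rate(records):
    """Fraction of classifier-fooling perturbations that also changed the human label."""
    fooled = [r for r in records if r.clf_label_before != r.clf_label_after]
    if not fooled:
        return 0.0
    real = [r for r in fooled if r.human_label_before != r.human_label_after]
    return len(real) / len(fooled)


# Toy usage with three hand-written records.
records = [
    FeedbackRecord(0, 1, 0, 1),  # classifier and human both changed: "real"
    FeedbackRecord(0, 1, 0, 0),  # only the classifier changed
    FeedbackRecord(0, 0, 0, 0),  # perturbation had no effect
]
rate = real_adversarial_rate(records)
```

Comparing this rate before and after human feedback is incorporated would be one simple way to approach the fourth milestone.
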
Desired Support: Desired resources are:
- TPU compute to train a proper Bayesian classifier
- Money for Mechanical Turk expenses (if necessary)
- Hosting for the web UI (if necessary)

[1] Waifu et al., when he gets around to doing a write-up.

EleutherAI / project-menu

[Project] Better Concept Pointers #18