EleutherAI / project-menu

See the issue board for the current status of active and prospective projects!

[Project] Better Concept Pointers #18

Closed AI-WAIFU closed 1 year ago

AI-WAIFU commented 3 years ago

Project: Better Concept Pointers

Elevator Pitch

There's an argument that the pointers problem, Goodhart's law, and adversarial examples correspond to one another.[1] If so, it stands to reason that progress on adversarial examples can be used to develop techniques for dealing with the pointers problem.

Goal Outputs

The goal is to develop a technique for robustly generating "real" adversarial examples: image perturbations that change both a classifier's output and the label a human would assign to the image.
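For contrast, here is a minimal sketch of an ordinary adversarial example, the kind that changes only the classifier's output. It applies one fast-gradient-sign step to a toy logistic classifier. The weights, input, and step size are made up for illustration, and nothing here attempts the "real" variant, which would also need the human label to change.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One fast-gradient-sign step that increases the loss for true label y."""
    p = predict(w, b, x)
    # For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # Move each input coordinate by eps in the direction that raises the loss.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical weights and input, chosen only so the example is easy to follow.
w = [2.0, -3.0, 1.0]
b = 0.1
x = [0.5, 0.1, 0.2]
y = 1  # the label we treat as correct for x

x_adv = fgsm(w, b, x, y, eps=0.3)
print(predict(w, b, x) > 0.5)      # classifier's decision on the clean input
print(predict(w, b, x_adv) > 0.5)  # classifier's decision on the perturbed input
```

The perturbation is bounded by `eps` in every coordinate, which is what makes the standard case "not real": the input barely moves, so a human would be expected to label it the same way.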

Milestones

[1] Waifu et al., when he gets around to doing a write-up.

cfoster0 commented 2 years ago

This new work may have some relevance: https://arxiv.org/abs/2204.12301

We design new visual illusions by finding "adversarial examples" for principled models of human perception -- specifically, for probabilistic models, which treat vision as Bayesian inference. To perform this search efficiently, we design a differentiable probabilistic programming language, whose API exposes MCMC inference as a first-class differentiable function. We demonstrate our method by automatically creating illusions for three features of human vision: color constancy, size constancy, and face perception.