DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding
MIT License

bleach.clean() is sanitizing too aggressively #64

Closed. DavidUdell closed this issue 6 months ago.

DavidUdell commented 6 months ago

Tokens like `new`, `for`, `and`, and `,` are all being stripped out. This is my new theory for the weird short-input-context features.
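One way to test this theory is to run each token string through the sanitizer and flag any token that does not survive the round trip. The sketch below is a minimal, hypothetical diagnostic, not code from this repo: `find_dropped_tokens` and `lossy_sanitize` are illustrative names, and `lossy_sanitize` is a deliberately lossy stand-in for an over-aggressive cleaner, not a model of what `bleach.clean()` actually does internally. It assumes the sanitizer's output is HTML-escaped text, so it compares after `html.unescape()`.

```python
import html
from typing import Callable, Iterable, List


def find_dropped_tokens(tokens: Iterable[str], sanitize: Callable[[str], str]) -> List[str]:
    """Return tokens that sanitize() removes or alters irreversibly.

    Assumes sanitize() returns HTML-escaped text, so a token survives if
    unescaping the sanitized output gives back the original token exactly.
    """
    dropped = []
    for token in tokens:
        if html.unescape(sanitize(token)) != token:
            dropped.append(token)
    return dropped


# Hypothetical stand-in for an over-aggressive sanitizer: it blanks out a
# fixed set of common tokens, mimicking the symptom reported above.
def lossy_sanitize(text: str) -> str:
    return "" if text.strip() in {"new", "for", "and", ","} else text


# GPT-2-style token strings, several with leading spaces.
tokens = [" new", " for", " and", ",", " the", " <b>"]

print(find_dropped_tokens(tokens, lossy_sanitize))  # flags the four blanked tokens
print(find_dropped_tokens(tokens, html.escape))     # escaping loses nothing
```

If a real sanitizer flags common tokens this way, any downstream feature interpretation built on the sanitized text would see artificially short contexts, which is consistent with the theory above.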

DavidUdell commented 6 months ago

Tried switching back to `html.escape()` for now. Let's see whether this works reliably.
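For reference, `html.escape()` from the standard library only substitutes the HTML-special characters (`&`, `<`, `>`, and, with the default `quote=True`, `"` and `'`) and never deletes text, so it cannot strip tokens the way a tag-stripping sanitizer can. A minimal sketch with example token strings (chosen for illustration, not taken from this repo's data):

```python
import html

# Example token strings, including some with HTML-special characters.
tokens = [" new", " for", " and", ",", " <div>", " &", ' "quoted"']

for token in tokens:
    escaped = html.escape(token)  # quote=True by default: also escapes " and '
    # Escaping is lossless: unescaping recovers the original token exactly.
    assert html.unescape(escaped) == token
    print(repr(token), "->", repr(escaped))
```

Plain-word tokens pass through unchanged, while ` <div>` becomes ` &lt;div&gt;` and renders as the literal text `<div>` rather than being interpreted as markup.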