bleach.clean() is sanitizing too aggressively

DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding

MIT License


Closed: DavidUdell closed this 6 months ago

DavidUdell commented 6 months ago

Tokens like "new", "for", "and", and "," are all being stripped out. My new theory is that this explains the weird short-input-context features.

DavidUdell commented 6 months ago

Moved back to html.escape() for now. Let's see whether this works reliably.
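A minimal sketch of why html.escape() is a safer fit here, assuming the tokens are escaped before being written into an HTML view of feature contexts (an assumption; the issue does not show the rendering code). `escape_tokens` is a hypothetical helper, and the example tokens are illustrative, written with GPT-2-style leading spaces. The standard-library call only replaces `&`, `<`, `>` and quote characters with character references, so ordinary tokens pass through unchanged and no characters are deleted:

```python
import html


def escape_tokens(tokens: list[str]) -> list[str]:
    """Hypothetical helper: HTML-escape each token for display.

    html.escape() replaces &, <, > (and quotes, since quote=True by default)
    with character references; it never deletes characters.
    """
    return [html.escape(tok) for tok in tokens]


# Illustrative tokens, not taken from the project's data.
tokens = [" new", " for", " and", ",", " <b>", " &"]
escaped = escape_tokens(tokens)
for raw, safe in zip(tokens, escaped):
    print(repr(raw), "->", repr(safe))

# Round trip: unescaping recovers every original token, so nothing is lost.
assert [html.unescape(s) for s in escaped] == tokens
```

Plain word and punctuation tokens come back identical, while a token that looks like markup (" <b>") is rendered as " &lt;b&gt;" instead of being dropped, which keeps short contexts intact.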