DavidUdell / sparse_circuit_discovery

Circuit discovery in GPT-2 small, using sparse autoencoding
MIT License

Refactor `modal_tensor_acceleration` #4

Closed DavidUdell closed 9 months ago

DavidUdell commented 9 months ago

I currently use a YAML value called LARGE_MODEL_MODE to work around accelerate's shortcomings with small-model device placement. I am unsure why accelerate falls short here, but I know that manually changing devices on the (prepared) tensor fixes the issue.

It's annoying for an end user to have to toggle this boolean just to run the interpretability pipeline, though. So I am currently thinking of adding a try/except to `modal_tensor_acceleration` that defaults to ordinary tensor preparation and falls back on the additional manual device movement. My only worry is that this strategy might be slow, but I suspect that inference time is overwhelmingly the bottleneck, and that (for the few small models we'd interpret) the slowdown will be acceptable.
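A minimal sketch of that try/except shape, in pure Python. The `accelerator` argument is assumed to expose `prepare()` and a `device` attribute, as accelerate's `Accelerator` does; the probe operation is a hypothetical stand-in for whatever op first hits the device mismatch, and none of this is the repo's actual code:

```python
def modal_tensor_acceleration(tensor, accelerator):
    """Default to ordinary preparation; fall back to manual device movement.

    Sketch only: `accelerator` stands in for accelerate's Accelerator,
    and the probe op below is a placeholder for the first on-device use
    of the tensor.
    """
    prepared = accelerator.prepare(tensor)
    try:
        _ = prepared + prepared  # probe: raises RuntimeError on a device mismatch
        return prepared
    except RuntimeError:
        # accelerate fell short here: manually move the prepared tensor
        # onto the accelerator's device.
        return prepared.to(accelerator.device)
```

The extra probe op per call is the suspected slowdown; it is tiny next to inference, which is the bet made above.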

If the slowdown proves too bad for that approach to be viable, I'd want to try device placement on the model in question once, then cache the resulting boolean. Later acceleration calls could then look up the boolean and respond appropriately. I'd like to avoid this complexity if possible, though.
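The cached-boolean variant could look something like this sketch, where the `probe` callable and the module-level cache are assumptions for illustration, not existing code:

```python
# Hypothetical per-model cache: probe device placement once, reuse the answer.
_manual_devicing_cache: dict = {}

def needs_manual_devicing(model_name, probe):
    """Run `probe` once per model; later calls look up the cached boolean.

    `probe` stands in for a small on-device operation (e.g. a tiny forward
    pass on the prepared model) that raises RuntimeError on a device mismatch.
    """
    if model_name not in _manual_devicing_cache:
        try:
            probe()
            _manual_devicing_cache[model_name] = False
        except RuntimeError:
            _manual_devicing_cache[model_name] = True
    return _manual_devicing_cache[model_name]
```

This pays the probe cost once per model instead of once per tensor, at the price of the extra cache state the comment above hopes to avoid.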