Gradient Checkpointing for OpenCLIP should be optional

I know the hardcoding came from me. Gradient checkpointing makes things faster and uses less VRAM, so it is very useful for some use cases, but it can break things on A100 and it also breaks cutn_batches in most text-to-image implementations. Ideally it should be optional for the user.
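A minimal sketch of what an opt-out could look like. `OpenClipLoaderSketch`, `FakeOpenClipModel`, and the `use_grad_checkpointing` argument are hypothetical names made up for illustration, not the current loader code. The sketch assumes the loaded model exposes a `set_grad_checkpointing(enable)` method, as OpenCLIP models do to my knowledge.

```python
class FakeOpenClipModel:
    """Stand-in for an OpenCLIP model; it only mimics the toggle used below."""

    def __init__(self):
        self.grad_checkpointing = False

    def set_grad_checkpointing(self, enable=True):
        self.grad_checkpointing = enable


class OpenClipLoaderSketch:
    """Hypothetical loader that exposes gradient checkpointing as an option."""

    def __init__(self, model_factory, use_grad_checkpointing=True):
        # model_factory: any zero-argument callable that returns a model.
        self.model_factory = model_factory
        # Defaulting to True keeps today's hardcoded behavior (an assumption
        # about what the default should be, open to discussion).
        self.use_grad_checkpointing = use_grad_checkpointing

    def load(self):
        model = self.model_factory()
        if self.use_grad_checkpointing:
            model.set_grad_checkpointing(True)
        return model


# Users on A100 or relying on cutn_batches could opt out explicitly.
model = OpenClipLoaderSketch(FakeOpenClipModel, use_grad_checkpointing=False).load()
```

Keeping the default at `True` would mean existing users see no change, while anyone hitting the breakage can turn it off.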

More broadly, we should think about how to load options that pertain to particular loaders/modules/perceptors without breaking the overall mocking logic.
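One possible direction, sketched under assumptions: keep the mocked entry point's signature fixed and pass loader-specific settings through a single optional container keyed by loader id. `LoaderOptions`, `mocked_load`, and the loader ids below are all hypothetical, and the returned dict is only a placeholder for a real model object.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LoaderOptions:
    """Hypothetical container mapping a loader id to its private options."""

    per_loader: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def for_loader(self, loader_id: str) -> Dict[str, Any]:
        # Return a copy so a loader cannot mutate the shared configuration.
        return dict(self.per_loader.get(loader_id, {}))


def mocked_load(name: str, device: str = "cpu",
                options: Optional[LoaderOptions] = None) -> Dict[str, Any]:
    """Hypothetical mocked entry point with one extra optional argument."""
    opts = (options or LoaderOptions()).for_loader(name)
    # A real implementation would dispatch to the matching loader here and
    # hand it `opts`; this sketch just reports what would be passed along.
    return {"name": name, "device": device, "opts": opts}


options = LoaderOptions({"openclip-example": {"use_grad_checkpointing": False}})
with_opts = mocked_load("openclip-example", options=options)
without_opts = mocked_load("other-example")
```

Callers that never pass `options` keep working unchanged, and each loader only ever sees the settings registered under its own id.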

dmarx / Multi-Modal-Comparators

Gradient Checkpointing for OpenCLIP should be optional #36