I know the hardcoding came from me, but while gradient checkpointing can make things faster and use less VRAM, which is very useful in some use cases, it can break things on A100 and also breaks `cutn_batches` in most text-to-image implementations. Ideally it should be optional for the user.
More broadly, we should think about how to load options that pertain to particular loaders/modules/perceptors without breaking the overall mocking logic.
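
As a rough starting point, something like the sketch below might work. It is only an illustration: `LoaderOptions`, `resolve_options`, `MockLoader`, and the `gradient_checkpointing` config key are made-up names, not anything that exists in the codebase. The idea is that each loader gets its own options object with safe defaults, and a mock accepts the same object so the mocking path does not need to know which options exist.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LoaderOptions:
    """Hypothetical per-loader settings; all names here are illustrative."""
    gradient_checkpointing: bool = False  # opt-in instead of hardcoded
    extra: Dict[str, Any] = field(default_factory=dict)  # loader-specific keys


def resolve_options(loader_name: str,
                    user_config: Dict[str, Dict[str, Any]]) -> LoaderOptions:
    """Pick out the options for one loader, falling back to defaults."""
    # Copy so the caller's config dict is not mutated by pop().
    raw = dict(user_config.get(loader_name, {}))
    checkpointing = bool(raw.pop("gradient_checkpointing", False))
    return LoaderOptions(gradient_checkpointing=checkpointing, extra=raw)


class MockLoader:
    """Stand-in loader: takes the same options object and simply stores it."""

    def __init__(self, options: LoaderOptions):
        self.options = options


# Example: the user enables checkpointing for one (hypothetical) loader only.
config = {"some_perceptor": {"gradient_checkpointing": True}}
real_opts = resolve_options("some_perceptor", config)
mock = MockLoader(resolve_options("unconfigured_loader", config))
```

With this shape, a loader that is not mentioned in the config just gets the defaults, so checkpointing stays off unless the user asks for it.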