This is an initial addition of the FusedOps and Kernels Plugin.

Implemented `ModelPatcher`, our novel solution for introducing fused-ops and kernels without explicitly rewriting modeling functions; this diverges from unsloth, which explicitly rewrites the model files.
`ModelPatcher`'s design is based on rule-based patching, which makes it easy to handle different models. All model forwards that are patched are tracked.
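As a rough illustration of the rule idea, a rule can pair a target module class with a fused replacement forward, and each application is recorded so patched forwards stay tracked. This is a minimal sketch only; `ModelPatcherRule`, `register_rule`, and `patch_model` below are illustrative names, not necessarily the exact API this PR introduces.

```python
# Minimal sketch of rule-based forward patching. The names here
# (ModelPatcherRule, register_rule, patch_model) are illustrative,
# not necessarily the exact API added in this PR.
from dataclasses import dataclass
from typing import Callable, List, Type

import torch.nn as nn


@dataclass
class ModelPatcherRule:
    rule_id: str                 # unique id used to track applied patches
    target_cls: Type[nn.Module]  # module class whose forward gets replaced
    new_forward: Callable        # fused replacement forward


_RULES: List[ModelPatcherRule] = []
_PATCH_HISTORY: List[str] = []   # every patched forward is recorded here


def register_rule(rule: ModelPatcherRule) -> None:
    _RULES.append(rule)


def patch_model(model: nn.Module) -> nn.Module:
    # Walk the model once; any submodule matching a rule has its forward
    # rebound to the fused implementation, and the patch is tracked.
    for module in model.modules():
        for rule in _RULES:
            if isinstance(module, rule.target_cls):
                module.forward = rule.new_forward.__get__(module)
                _PATCH_HISTORY.append(
                    f"{rule.rule_id} -> {module.__class__.__name__}"
                )
    return model
```

Because the rules are declarative, supporting a new architecture is mostly a matter of registering rules for its module classes rather than rewriting the modeling file.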
`ModelPatcher` also performs rule-based patching of modeling code, which is sometimes more efficient than patching in a whole new forward. For example, to replace the `CrossEntropyLoss`, we do not want to rewrite an entire forward just to change the loss; `ModelPatcher` can patch this through an intelligent use of `importlib`.
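As a concrete example of the `importlib` approach, the loss class can be swapped by reassigning the symbol inside the already-imported modeling module. This is a hedged sketch: `FusedCrossEntropyLoss` is a hypothetical stand-in, and it assumes the modeling file resolves `CrossEntropyLoss` from its own module globals at call time (as the HF llama modeling code does).

```python
# Sketch of an importlib-based loss swap; FusedCrossEntropyLoss is a
# hypothetical stand-in for a fused kernel implementation.
import importlib

import torch.nn as nn


class FusedCrossEntropyLoss(nn.CrossEntropyLoss):
    """Stand-in for a fused cross-entropy implementation."""


# modeling_llama instantiates CrossEntropyLoss from its module globals when
# the forward runs, so reassigning the attribute redirects the loss without
# rewriting the forward itself.
mod = importlib.import_module("transformers.models.llama.modeling_llama")
mod.CrossEntropyLoss = FusedCrossEntropyLoss
```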
This PR only contains `ModelPatcher` rules for `llama` and `mistral`.
This PR only supports `auto_gptq`.
TODO:

- [x] license notices
- [x] linting
- [x] some initial unit tests
- [x] add MLP fused-ops - addressed in #29
- [x] add in mixtral-specific rules - addressed in #29
- [x] fix FSDP casting issues introduced by #15 - addressed in #28
- [ ] fix formatting in configs generated by `generate_sample_configurations.py`
- [ ] update benchmarks (maybe do in a different PR)
- [x] support BNB QLoRA in a later PR - addressed in #29
- [ ] `ModelPatcher` rules for patching generic models (maybe defer to later)
Initial Tests on L40