Closed mathemusician closed 2 years ago
This is really cool! Looking forward to the PR :)
Regarding the env variables, it could be possible to make these arguments passed to the callback upon instantiation. I can have a look at making this a possibility once your PR is up!
We've also recently contributed a sparseml callback to the lightning-bolts package, which might also be useful for this: https://lightning-bolts.readthedocs.io/en/latest/callbacks/sparseml.html
@SeanNaren I'm actually using a variant of your sparseml callback! It's what inspired this. I'll submit a PR after I get more feedback from the neuralmagic community. They're usually pretty quick at responding.
epic!! keep me updated :) more than happy to collab on this
Made the PR. The standard operating procedure is to close the issue after the PR is made, right?
Just linked it to the PR, so when the PR is merged, this will close :)
🚀 Feature
Add option to use sparseml. Example implementation found here: Google Colab link
Motivation
There is currently no out-of-the-box option to use pruning techniques from sparseml.
Pitch
I will make a pull request to add this option to the hydra config. I've already forked a version of the lightning-transformers library. link
Here's how it will be added on the hydra CLI:
Passing this into the trainer means it will automatically use ddp. How convenient! Sparseml also uses a special conversion to log weights and such, and I've implemented this as well:
It is also available as a callback for those who want to train with CPU.
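For readers unfamiliar with the callback route, the general shape is just an object whose hooks the trainer invokes around training. This is a toy, dependency-free sketch of that pattern only; SparsifyCallback and MiniTrainer are illustrative stand-ins, not the real Lightning or sparseml API:

```python
class SparsifyCallback:
    """Toy stand-in for a pruning callback: the trainer calls its hooks."""

    def __init__(self, recipe_path):
        self.recipe_path = recipe_path
        self.events = []

    def on_train_start(self, model):
        # A real callback would load the recipe here and attach its
        # pruning modifiers to the model.
        self.events.append(f"loaded {self.recipe_path}")

    def on_train_end(self, model):
        # ...and finalize/export the sparsified model here.
        self.events.append("finalized")


class MiniTrainer:
    """Minimal trainer loop that only demonstrates hook dispatch."""

    def __init__(self, callbacks):
        self.callbacks = callbacks

    def fit(self, model):
        for cb in self.callbacks:
            cb.on_train_start(model)
        # (the actual training steps would run here)
        for cb in self.callbacks:
            cb.on_train_end(model)


cb = SparsifyCallback("recipes/pruning.yaml")
MiniTrainer([cb]).fit(model=None)
print(cb.events)  # hooks fire in order: load recipe, then finalize
```

Because the pruning logic lives entirely in the callback's hooks, the same object works whether the trainer runs on GPU with ddp or on CPU.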
Sparseml barely supports transformers at the moment, so I've had to make a workaround for their exporter. BERT and other BERT-like models output a ModelOutput, which tells the exporter there will be two outputs, but sometimes there's only one. For now, I've forced the exporter to treat it all as one output. I may open a pull request at sparseml to handle transformer outputs.
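The forcing-one-output idea can be sketched in plain Python. This is only an illustration of the shape of the workaround, not the actual PR code: a plain tuple stands in for ModelOutput (which is tuple-like), and the wrapper guarantees the exporter always sees a single output.

```python
class SingleOutputWrapper:
    """Illustrative wrapper: forwards calls to the wrapped model but
    returns only the first element of a tuple-like output, so an
    exporter tracing the model records exactly one output."""

    def __init__(self, model):
        self.model = model

    def __call__(self, *args, **kwargs):
        output = self.model(*args, **kwargs)
        # ModelOutput behaves like a tuple; keep only the first entry
        # (e.g. the logits) and drop any extras.
        if isinstance(output, tuple):
            return output[0]
        return output


# Toy stand-in for a BERT-like model that returns two outputs.
def fake_bert(x):
    return (x * 2, "hidden_states")

wrapped = SingleOutputWrapper(fake_bert)
print(wrapped(3))  # prints 6: only the first output survives
```

A model that already returns a single value passes through unchanged, so the same wrapper covers both the one-output and two-output cases.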
The RECIPE_PATH and MODELS_PATH environment variables, which hold the paths to the recipe yaml and the models folder, are passed in at launch. I wasn't able to find a way around this, since hydra overwrites added configs after starting the training loop. Maybe there's a better way of doing this.
Alternatives
I haven't thought much about this, but I'll add in a few good alternatives once I find some.
Additional context
This is my first time making a pull request to a rather large library, so don't be afraid to critique; I need the feedback. I may also need help understanding how the "fit" stage works differently from the "train" stage. I'm running
training.run_test_after_fit=False
because the fit stage doesn't work. Training works just fine, however.