Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Sparseml Integration #196

Closed mathemusician closed 2 years ago

mathemusician commented 2 years ago

🚀 Feature

Add an option to use SparseML. An example implementation can be found here: Google Colab link

Motivation

There is currently no out-of-the-box option to use pruning techniques from SparseML.

Pitch

I will make a pull request to add this option to the Hydra config. I've already forked a version of the lightning-transformers library. link

Here's how it will be exposed on the Hydra CLI:

trainer=sparseml

Passing this to the trainer means it will automatically use DDP. How convenient! SparseML also uses a special conversion to log weights and the like, and I've implemented that as well:

+trainer/logger=sparsewandb

It is also available as a callback for those who want to train on CPU.

+trainer/callback=sparseml

SparseML barely supports transformers at the moment, so I've had to make a workaround for their exporter. BERT and other BERT-like models output a ModelOutput, which tells the exporter there will be two outputs. But sometimes there's only one. For now I've just forced the exporter to treat everything as a single output. I may open a pull request at SparseML to handle transformer outputs properly.
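For reference, the workaround amounts to something like the sketch below (not the exact code in my fork; the model name and dummy input shapes are just placeholders): wrap the model so the ONNX exporter only ever traces a single output tensor instead of a ModelOutput.

```python
import torch
from transformers import AutoModelForSequenceClassification


class SingleOutputWrapper(torch.nn.Module):
    """Collapse a Hugging Face ModelOutput to a single logits tensor for export."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        # return_dict=False yields a tuple; taking [0] keeps only the logits,
        # so the exporter always sees exactly one output.
        outputs = self.model(
            input_ids=input_ids, attention_mask=attention_mask, return_dict=False
        )
        return outputs[0]


model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
wrapped = SingleOutputWrapper(model).eval()

# Dummy inputs only drive the trace; real shapes don't matter much here.
input_ids = torch.ones(1, 8, dtype=torch.long)
attention_mask = torch.ones(1, 8, dtype=torch.long)

torch.onnx.export(
    wrapped,
    (input_ids, attention_mask),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
)
```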

The RECIPE_PATH and MODELS_PATH, the paths to the recipe YAML and the models folder, are passed in as environment variables. I wasn't able to find a way around this, since Hydra overwrites added configs after starting the training loop. Maybe there's a better way of doing this.
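Putting it all together, a run would look roughly like this (the task and dataset names are just placeholders for whichever lightning-transformers configs you normally use, and this assumes the usual train.py entry point):

RECIPE_PATH=/path/to/recipe.yaml MODELS_PATH=/path/to/models python train.py task=nlp/text_classification dataset=nlp/text_classification/emotion trainer=sparseml +trainer/logger=sparsewandb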

Alternatives

I haven't thought much about this, but I'll add in a few good alternatives once I find some.

Additional context

This is my first time making a pull request to a rather large library, so don't be afraid to critique. I need the feedback. I may also need help understanding how the "fit" stage works differently from the "train" stage. I'm running training.run_test_after_fit=False because the fit stage doesn't work. Training works just fine, however.

SeanNaren commented 2 years ago

This is really cool! Looking forward to the PR :)

Regarding the env variables, it might be possible to pass these as arguments to the callback upon instantiation. I can have a look at making this possible once your PR is up!
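Something like this sketch, for instance (purely illustrative, not the callback in your PR; the class and argument names are made up):

```python
import pytorch_lightning as pl


class SparseMLTransformerCallback(pl.Callback):
    """Sketch: take the paths as constructor arguments instead of env vars."""

    def __init__(self, recipe_path: str, models_path: str):
        self.recipe_path = recipe_path
        self.models_path = models_path

    def setup(self, trainer, pl_module, stage=None):
        # The SparseML recipe/manager would be built here from self.recipe_path,
        # rather than from os.environ["RECIPE_PATH"].
        ...


trainer = pl.Trainer(
    callbacks=[SparseMLTransformerCallback("recipe.yaml", "./models")]
)
```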

We've also recently contributed a SparseML callback to the lightning-bolts package, which might also be useful for this: https://lightning-bolts.readthedocs.io/en/latest/callbacks/sparseml.html

mathemusician commented 2 years ago

@SeanNaren I'm actually using a variant of your SparseML callback! It's what inspired this. I'll submit a PR after I get more feedback from the Neural Magic community. They're usually pretty quick to respond.

SeanNaren commented 2 years ago

epic!! keep me updated :) more than happy to collab on this

mathemusician commented 2 years ago

Made the PR. The standard operating procedure is to close the issue after the PR is made, right?

SeanNaren commented 2 years ago

Just linked it to the PR, so when the PR is merged, this will close :)