Description & Motivation
The MeZO paper proposes a memory-efficient zeroth-order optimizer (MeZO) that adapts the classical zeroth-order SGD method to operate in place, enabling fine-tuning of language models (LMs) with the same memory footprint as inference.
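For concreteness, here is a minimal sketch of the MeZO update step in plain PyTorch (not Lightning/Fabric API). `loss_fn` and `batch` are hypothetical placeholders; the key trick is storing only the RNG seed so the perturbation `z` can be regenerated instead of kept in memory:

```python
import torch

def mezo_step(model, loss_fn, batch, lr=1e-6, eps=1e-3):
    # One in-place zeroth-order SGD step (sketch of the MeZO algorithm).
    # `loss_fn(model, batch)` is assumed to return a Python float.
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Re-seed so the same z is regenerated each time; no z is ever stored.
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(z, alpha=scale * eps)

    perturb(+1)                      # theta + eps * z
    with torch.no_grad():
        loss_plus = loss_fn(model, batch)
    perturb(-2)                      # theta - eps * z
    with torch.no_grad():
        loss_minus = loss_fn(model, batch)
    perturb(+1)                      # restore theta

    # Projected gradient estimate from the two forward passes only.
    grad_est = (loss_plus - loss_minus) / (2 * eps)

    # In-place SGD update, regenerating z from the stored seed.
    torch.manual_seed(seed)
    for p in model.parameters():
        z = torch.randn_like(p)
        p.data.add_(z, alpha=-lr * grad_est)
    return loss_plus
```

Since the update needs only two forward passes and one scalar per step, peak memory stays at inference level, which is what makes a Fabric integration attractive.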
An implementation of MeZO based on HuggingFace's Trainer is available in the MeZO repository.
I believe it would be a valuable enhancement to incorporate MeZO support into Lightning, seamlessly integrated with Fabric. This would provide users with the added advantage of leveraging MeZO's capabilities alongside the existing functionalities of Lightning and Fabric, resulting in a more comprehensive and efficient solution.
Pitch
Integrating MeZO support into Lightning, with first-class Fabric integration, would be a valuable enhancement to the platform. Because MeZO needs only forward passes and no stored gradients or optimizer states, users could fine-tune language models at roughly the memory cost of inference, while still benefiting from Fabric's device, precision, and distributed-training management. This would make fine-tuning of larger models feasible on hardware that cannot hold activation gradients or optimizer states today.
Alternatives
No response
Additional context
Repository - https://github.com/princeton-nlp/MeZO
Paper - https://arxiv.org/pdf/2305.17333.pdf
cc @borda