AOSSIE-Org / EduAid

A tool that can auto-generate short quizzes on the basis of the content provided.

Enhancement Report 📈: Hugging Face Lightning Fast Trainer #63

Open Tuhinm2002 opened 2 days ago

Tuhinm2002 commented 2 days ago


Summary

This enhancement introduces a Hugging Face Lightning Fast Trainer: a streamlined, customizable training module designed as an alternative to the default PyTorch Trainer. It aims to offer flexibility in configuring and executing ML workflows while maintaining robust performance.

Background

The PyTorch Trainer provided by Hugging Face is an efficient solution for most training requirements. However, it often restricts advanced users seeking finer control over the training process. The Lightning Fast Trainer addresses this limitation by introducing modularity and extensive configuration options, enabling users to tailor every phase of the training pipeline to their needs.

Goals

  1. Enhanced Customization: Provide modular and parameterized control over training loops, optimizers, schedulers, and evaluation strategies.
  2. Increased Efficiency: Leverage optimized backend operations to deliver a performance boost, reducing training time while maintaining accuracy.
  3. Ease of Use: Maintain a user-friendly interface, ensuring compatibility with standard Hugging Face models and datasets.
  4. Integration with Hugging Face Ecosystem: Integrate seamlessly with Hugging Face's existing model, tokenizer, and dataset libraries.
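
To make goal 1 more concrete, here is a minimal plain-Python sketch of what "modular and parameterized control" over the scheduler and evaluation strategy could look like. It is an illustration only, not the code from the linked PR and not the Hugging Face Trainer API: `TrainerConfig`, `MiniTrainer`, `constant_lr`, and `linear_decay` are hypothetical names, and a toy linear model stands in for a real network so the example runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def constant_lr(epoch: int, total_epochs: int, base_lr: float) -> float:
    """Scheduler that keeps the learning rate fixed."""
    return base_lr


def linear_decay(epoch: int, total_epochs: int, base_lr: float) -> float:
    """Scheduler that decays the learning rate linearly towards zero."""
    return base_lr * (1.0 - epoch / total_epochs)


@dataclass
class TrainerConfig:
    # Hypothetical config object: every knob of the loop lives here.
    epochs: int = 200
    base_lr: float = 0.5
    lr_schedule: Callable[[int, int, float], float] = constant_lr
    eval_every: int = 50  # evaluation strategy: evaluate every N epochs


class MiniTrainer:
    """Toy trainer fitting y = w * x + b with full-batch gradient descent."""

    def __init__(self, config: TrainerConfig,
                 train_data: List[Sample], eval_data: List[Sample]) -> None:
        self.config = config
        self.train_data = train_data
        self.eval_data = eval_data
        self.w = 0.0
        self.b = 0.0
        self.eval_history: List[Tuple[int, float]] = []  # (epoch, eval loss)

    def loss(self, data: List[Sample]) -> float:
        """Mean squared error of the current parameters on `data`."""
        return sum((self.w * x + self.b - y) ** 2 for x, y in data) / len(data)

    def train(self) -> List[Tuple[int, float]]:
        cfg = self.config
        n = len(self.train_data)
        for epoch in range(cfg.epochs):
            # The scheduler is a plain function, so it can be swapped freely.
            lr = cfg.lr_schedule(epoch, cfg.epochs, cfg.base_lr)
            errors = [(self.w * x + self.b - y, x) for x, y in self.train_data]
            grad_w = sum(2 * err * x for err, x in errors) / n
            grad_b = sum(2 * err for err, _ in errors) / n
            self.w -= lr * grad_w
            self.b -= lr * grad_b
            # Evaluate on the schedule chosen in the config.
            if (epoch + 1) % cfg.eval_every == 0:
                self.eval_history.append((epoch + 1, self.loss(self.eval_data)))
        return self.eval_history


# Usage: synthetic points on the line y = 2x + 1 (made-up data).
data = [(x / 4, 2.0 * (x / 4) + 1.0) for x in range(5)]
config = TrainerConfig(lr_schedule=linear_decay)
trainer = MiniTrainer(config, train_data=data, eval_data=data)
for epoch, eval_loss in trainer.train():
    print(f"epoch {epoch}: eval loss {eval_loss:.2e}")
```

The design choice the sketch tries to show is that the scheduler and the evaluation cadence are passed in as configuration rather than hard-coded in the loop, so swapping `linear_decay` for `constant_lr` needs no change to `MiniTrainer` itself.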

Key Features

Tuhinm2002 commented 2 days ago

@Aditya062003 hey, I added a new training style that is efficient and well suited to HF models. This HF trainer also provides an interactive way to tweak the parameters. Here is the PR for that: #64. Until then, cheers mate

Tuhinm2002 commented 2 days ago

And btw, if you find it difficult to see the changes because it is a Colab notebook, feel free to visit the link and run it yourself @Aditya062003 https://colab.research.google.com/drive/1e_TWIZ4YqPGygu4igjebPfRn9dB3BC6V#scrollTo=mWESNZLVOrcj

Aditya062003 commented 2 days ago

Hey @Tuhinm2002 , actually Krishna updated both the models and pipeline recently. @Roaster05 please look into this.

Tuhinm2002 commented 2 days ago

> Hey @Tuhinm2002 , actually Krishna updated both the models and pipeline recently. @Roaster05 please look into this.

Yeah, I know, but he used a simple PyTorch trainer. Instead of that, I used the efficient HF Trainer class, which makes it easier and more customizable.

You can compare the code here: https://github.com/AOSSIE-Org/EduAid/blob/main/Model_training/KeyPhrase%20Detection/keyphrase-detection-T5.ipynb

Tuhinm2002 commented 2 days ago

@Aditya062003 just check the training process at the end of the notebook. Cheers mate

Tuhinm2002 commented 1 day ago

Hey @Aditya062003 @Roaster05, any update on the improvements I suggested?

Roaster05 commented 17 hours ago

Hey @Tuhinm2002 ,

The PyTorch trainer is currently not being used in the pipeline. It was initially implemented by @prarabdhshukla in his version, but we will be removing it from the codebase to avoid any further confusion.

Tuhinm2002 commented 15 hours ago

> Hey @Tuhinm2002 ,
>
> The PyTorch trainer is currently not being used in the pipeline. It was initially implemented by @prarabdhshukla in his version, but we will be removing it from the codebase to avoid any further confusion.

Thanks for the clarification 😉👍