AutoTuneNLP
A comprehensive toolkit for seamless data generation and fine-tuning of NLP models, all conveniently packed into a single block.
Setup
Clone the repo and cd to project root.
Environment
This project works with Python 3.10.
Create the virtual environment
python3.10 -m venv venv
Activate the virtual environment (the first command is for Linux/macOS, the second for Windows)
source venv/bin/activate
.\venv\Scripts\activate
Install all dependencies
pip install poetry
poetry install
Local Development
- Start your Docker engine.
- Copy the sample env and populate the fields
cp sample.env .env
- Start the redis and postgres containers
docker compose up -d redis postgres
NOTE: Activate the virtual environment before starting the Django server and the Celery worker.
- Start the Django server in a terminal window (replace port with the port number to listen on).
python manage.py runserver port
- Start celery worker using gevent pool in another terminal window.
celery -A autotune worker --loglevel=info -P gevent
- If you are running on Windows, the above command won't work since Celery is not supported on Windows, but you can use the command below for testing (caveat: the solo pool runs one task at a time, so its concurrency capabilities are lost).
celery -A autotune worker --loglevel=info --pool=solo
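Before starting the server, it can help to confirm that every field copied from sample.env actually has a value in .env. The script below is a minimal sketch of such a check, not part of the project: the helper names `parse_env` and `missing_keys` are hypothetical, and it assumes both files use plain KEY=VALUE lines.

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(sample_text: str, env_text: str) -> List[str]:
    """Return keys listed in the sample that are absent or empty in the env."""
    sample = parse_env(sample_text)
    env = parse_env(env_text)
    return [key for key in sample if not env.get(key)]


if __name__ == "__main__":
    # Assumes the script is run from the project root, next to both files.
    sample_path, env_path = Path("sample.env"), Path(".env")
    if sample_path.exists() and env_path.exists():
        for key in missing_keys(sample_path.read_text(), env_path.read_text()):
            print(f"{key} is empty or missing in .env")
```

Run it from the project root after the `cp sample.env .env` step; no output means every key from sample.env has a value.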
For the API specification, refer to the API SPECIFICATION
Contributing
Interested in contributing to AutoTune? We'd love your help! Check out our issues section for areas where you can contribute. Please see our contribution guide for more details on how to get involved.
Typical Workflow
- User is shown a login page to log in using their Google account. (The account has to be of 'samagragovernance.in' domain). The user is then shown a settings page where they are nudged to update the API keys for OpenAI and HuggingFace. The user is also shown a list of all the repos they have access to on HuggingFace. The user can select the repo they want to work on and the settings are saved. The settings tab allows the user to view and update the settings.
- User gives a prompt and 5 samples are generated automatically.
- User can view the 5 generated samples and select the ones they like and dislike.
- The prompt is updated with those examples and the process is repeated until the user is satisfied with the 5 samples.
- Once the user is satisfied, they provide the number of samples they want and the data is generated. The process is async and progress is shown to the user. The progress is tracked every 2 seconds.
- The data is generated and shown to the user; they give the go-ahead and the dataset is pushed to HuggingFace. The dataset tab allows the user to view all the datasets that they have deployed until now on HuggingFace.
- A link is shared with the user so that they can view the data on huggingface.
- There is a tab called train, which allows the user to use the dataset created earlier to train a model by filling in a form. The process is async and progress is shown to the user. The progress is tracked every 2 seconds.
- Once trained, the user is allowed to view the results of the model on the test set. The user can also view the results of the model on the validation set. The models tab allows the user to view all the models that they have deployed until now.
- The user is then nudged to deploy the model to huggingface. Once confirmed, the user is asked to provide a name for the model and the model is pushed to huggingface.
- A link is shared with the user so that they can view the model on HuggingFace, and a curl command is shared so that they can use the model for inference.
- The history tab allows the user to view all the tasks that they have performed until now.
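The data generation and training steps above are async, with progress tracked every 2 seconds. A client-side polling loop for that pattern could look roughly like the sketch below. This is an illustration under assumptions, not the project's actual client: `wait_for_task` is a hypothetical helper, the `status` and `progress` fields and the terminal state names are assumed, and `fetch_status` stands in for whatever call returns the task's current state (see the API specification for the real endpoints and payloads).

```python
import time
from typing import Callable, Dict

POLL_INTERVAL_SECONDS = 2  # the workflow tracks progress every 2 seconds


def wait_for_task(
    fetch_status: Callable[[], Dict],
    interval: float = POLL_INTERVAL_SECONDS,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll fetch_status until a terminal state is reported, then return that payload."""
    for _ in range(max_polls):
        payload = fetch_status()
        # "status" and "progress" are assumed field names, not the documented API.
        print(f"status={payload.get('status')} progress={payload.get('progress')}")
        if payload.get("status") in {"SUCCESS", "FAILURE"}:
            return payload
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"task did not finish after {max_polls} polls")
```

Passing `fetch_status` and `sleep` in as arguments keeps the loop independent of any HTTP library and lets it be exercised without a running server.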
License
AutoTune is made available under the MIT License. See the LICENSE file for more info.
Thank you for considering AutoTune for your machine learning and dataset creation needs. We're excited to see the innovative solutions you'll build with our platform!