baal-org / baal

Bayesian active learning library for research and industrial use cases.
https://baal.readthedocs.io
Apache License 2.0

Baal in Production Notebook | Classification | NLP | Hugging Face #245

Closed: nitish1295 closed this 1 year ago

nitish1295 commented 1 year ago

Summary:

This is a demo/tutorial on using active learning with Hugging Face models in a production setting. You can find more about this in the discussion at https://github.com/baal-org/baal/discussions/242
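For readers new to the idea: Baal ranks unlabelled examples with uncertainty heuristics, BALD being a common one, computed over several stochastic (MC-dropout) forward passes of the model. The sketch below is a minimal, dependency-free illustration of that scoring step, assuming you already have the per-pass class probabilities for each pool example. It is not Baal's API (in Baal this role is played by heuristics such as BALD in baal.active.heuristics); the function names are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def bald_score(mc_probs: Sequence[Sequence[float]]) -> float:
    """BALD mutual information for one example.

    mc_probs holds T probability vectors, one per stochastic forward pass
    (e.g. with dropout kept active at inference time).
    """
    t = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_probs = [sum(p[c] for p in mc_probs) / t for c in range(n_classes)]
    expected_entropy = sum(entropy(p) for p in mc_probs) / t
    # High when the passes are individually confident but disagree with each other.
    return entropy(mean_probs) - expected_entropy


def select_for_labelling(pool: Sequence[Sequence[Sequence[float]]], top_k: int) -> List[int]:
    """Indices of the top_k pool examples with the highest BALD score."""
    scores = [bald_score(example) for example in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


pool = [
    [[0.9, 0.1], [0.9, 0.1]],  # confident and consistent
    [[0.9, 0.1], [0.1, 0.9]],  # passes disagree
    [[0.5, 0.5], [0.5, 0.5]],  # uncertain but consistent
]
print(select_for_labelling(pool, top_k=1))  # → [1]
```

Note that the third example scores zero even though each prediction is maximally uncertain: BALD targets disagreement between passes (model uncertainty), not plain predictive entropy.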

Features:

NA

Checklist:

Given that this is a notebook and I am not adding any new modules, there are no test cases. Some type hints are still pending, which I will complete.

I'm opening this PR for your feedback, to check whether you want me to add or remove anything.

Additional Info

Challenges with my current GPU

It seems that the PyTorch version Baal uses does not support my current GPU, although I have tested the notebook on Colab and it works fine there.

NVIDIA GeForce RTX 3050 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3050 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))

There is more information about this on the PyTorch forum, in case someone runs into a similar issue.

import torch

torch.__version__            # 1.12.1+cu102
torch.cuda.get_arch_list()   # ['sm_37', 'sm_50', 'sm_60', 'sm_70']
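The mismatch above (an sm_86 card against a build listing only sm_37 to sm_70) can be checked up front before running the notebook. This is a simplified sketch, assuming binaries are compatible within one major compute capability, which is the rule PyTorch's own warning appears to apply; PTX entries are ignored and the helper names are hypothetical. Only torch.cuda calls that already exist (get_arch_list, get_device_capability, get_device_name, device_count, is_available) are used, and torch is imported lazily so the pure helper works without it.

```python
import re
from typing import Iterable, Tuple


def is_capability_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Simplified check: does the build ship an sm_XY binary with the device's major version?

    Assumption: code built for sm_80 also runs on sm_86 (same major version);
    "compute_XY" (PTX) entries are ignored here.
    """
    major, _minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match and int(match.group(1)) // 10 == major:
            return True
    return False


def report_current_devices() -> None:
    """Print, for each visible GPU, whether the installed torch build covers it."""
    import torch  # imported lazily so the helper above works without torch

    if not torch.cuda.is_available():
        print("CUDA is not available to this torch build")
        return
    arch_list = torch.cuda.get_arch_list()
    for idx in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(idx)
        status = "supported" if is_capability_supported((major, minor), arch_list) else "NOT supported"
        print(f"{torch.cuda.get_device_name(idx)} (sm_{major}{minor}): {status} by torch {torch.__version__}")


# The RTX 3050 Laptop GPU case from the warning above:
print(is_capability_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"]))  # → False
```

Calling report_current_devices() on the machine in question should flag the card as not supported, which points to installing a newer CUDA build of PyTorch rather than debugging the notebook.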


Challenges with Black Formatting

You might want to update the Black version to black==22.3.0.

The make format command produces an error identical to one described in a Stack Overflow question.

I have encountered this before, and upgrading Black fixes it.
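Besides pinning black==22.3.0, a pre-flight check that a make format target or CI step could run might look like the sketch below. The helper names are hypothetical and not part of Baal's tooling; it relies only on the standard library's importlib.metadata (Python 3.8+) and deliberately ignores pre-release tags when comparing versions.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_BLACK = (22, 3, 0)  # version suggested in this PR


def release_tuple(v: str) -> Tuple[int, int, int]:
    """Leading numeric release parts, e.g. "21.12b0" -> (21, 12, 0); pre-release tags are ignored."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def black_is_recent_enough(minimum: Tuple[int, int, int] = MIN_BLACK) -> bool:
    """True if an installed Black is at least `minimum`; False if Black is missing or older."""
    try:
        installed = version("black")
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= minimum
```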

Dref360 commented 1 year ago

Awesome!

I'll update torch and Black separately in another PR.

My only comment would be around the CSVs. Could we load the dataset directly from Hugging Face, e.g. load_dataset("tweet_eval", "emotion")?

Also, if you can, could you add the new notebook to the documentation in mkdocs.yml? Maybe create a new subsection to hold all the production tutorials.

Very minor comments! Thank you very much.

nitish1295 commented 1 year ago

> My only comment would be around the CSVs. Could we load the dataset directly from Hugging Face, e.g. load_dataset("tweet_eval", "emotion")?

I had expected this. I deliberately used CSVs to "mimic" a setting where the data is not loaded directly via Hugging Face, but I guess people can adapt this to their own requirements. I will update this.
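One way to keep both the reviewer's suggestion and the CSV-based "production" setting is a loader with a switch. This is a minimal sketch, assuming the CSV files have text and label columns (the notebook's actual CSV layout is not shown here); the function name and the "hub"/"csv" values are hypothetical. The hub branch uses load_dataset from the Hugging Face datasets package with the "tweet_eval"/"emotion" dataset named in the review, imported lazily so the CSV path works without it.

```python
import csv
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


def load_emotion_rows(source: str = "hub", csv_path: str = "") -> List[Row]:
    """Return training rows as {"text": str, "label": int} dicts from the Hub or a local CSV."""
    if source == "hub":
        from datasets import load_dataset  # Hugging Face `datasets` package, imported lazily

        train = load_dataset("tweet_eval", "emotion", split="train")
        return [{"text": r["text"], "label": int(r["label"])} for r in train]
    if source == "csv":
        # Assumed layout: a header row with "text" and "label" columns.
        with open(csv_path, newline="", encoding="utf-8") as f:
            return [{"text": r["text"], "label": int(r["label"])} for r in csv.DictReader(f)]
    raise ValueError(f"unknown source: {source!r}")
```

Keeping the return type identical for both sources means the rest of the active learning loop does not need to know where the data came from.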

Yes, I will add the notebook to the docs.