deadbits / vigil-llm

⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
https://vigil.deadbits.ai/
Apache License 2.0

Better data loader #57

Closed deadbits closed 7 months ago

deadbits commented 7 months ago

I didn't realize how awful and broken the dataset loading process was. Hopefully this PR is a big improvement.

The utils directory has been removed, and all dataset loading is now handled by loader.py. Users pass a Hugging Face dataset repo name and a Vigil config file, and loader.py handles everything else. There is no more cloning the dataset repos and running the separate parquet loader.

(venv) adam:vigil-llm/ (dataloader✗) $ python loader.py --help
usage: loader.py [-h] -d DATASET -c CONFIG

Load text embedding data into Vigil

options:
  -h, --help            show this help message and exit
  -d DATASET, --dataset DATASET
                        dataset repo name
  -c CONFIG, --config CONFIG
                        config file
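
For readers who want to see how a command-line interface with this shape could be declared, here is a minimal sketch using Python's standard argparse module. It only mirrors the flags and help text shown above; it is not the actual loader.py from this PR, and the `build_parser` helper name is hypothetical. The real script presumably goes on to read the config and load the dataset, which this sketch does not attempt.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the help output shown in the PR description.

    Assumption: both flags are required, since the usage line lists them
    without square brackets. The real loader.py may be wired differently.
    """
    parser = argparse.ArgumentParser(
        prog="loader.py",
        description="Load text embedding data into Vigil",
    )
    # Name of the Hugging Face dataset repo to load
    parser.add_argument("-d", "--dataset", required=True, help="dataset repo name")
    # Path to the Vigil config file
    parser.add_argument("-c", "--config", required=True, help="config file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Placeholder for the actual loading step, which is not shown in the PR text
    print(f"dataset={args.dataset} config={args.config}")
```

With a parser like this, an invocation takes the form `python loader.py -d <dataset repo name> -c <config file>`, and omitting either flag makes argparse exit with a usage error.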