deadbits / vigil-llm

⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
https://vigil.deadbits.ai/
Apache License 2.0

Improve dataset loading #55

Closed: deadbits closed this issue 7 months ago

deadbits commented 7 months ago

It should be easier to load the initial embedding datasets into a Vigil install. The Hugging Face `datasets` library will work for this: one function can download the datasets, load them into Chroma, and optionally save each dataset to disk.
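A minimal sketch of that single function, under stated assumptions: each row already carries precomputed vectors in `text` and `embeddings` columns, and the target is a Chroma-style collection exposing `add(ids=..., documents=..., embeddings=...)`. The function names `download_rows` and `load_rows_into_collection` are hypothetical, not Vigil's actual `loader.py` API, and "save to disk" is shown as a plain JSON Lines file rather than any specific Vigil format.

```python
import json
import uuid
from pathlib import Path
from typing import Any, Iterable, Optional


def download_rows(repo_id: str, split: str = "train") -> Iterable[dict]:
    """Fetch a dataset from the Hugging Face Hub (needs `pip install datasets`)."""
    from datasets import load_dataset  # lazy import: the rest of the sketch runs without it

    return load_dataset(repo_id, split=split)


def load_rows_into_collection(
    rows: Iterable[dict],
    collection: Any,
    save_path: Optional[str] = None,
    batch_size: int = 100,
) -> int:
    """Add pre-embedded rows to a Chroma-style collection; return how many were added.

    Assumes every row has "text" and "embeddings" columns (an assumption, not
    Vigil's confirmed schema). If save_path is given, the rows are also written
    to disk as JSON Lines.
    """
    rows = list(rows)
    if save_path is not None:
        with Path(save_path).open("w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
    # Insert in batches so large datasets don't go to the vector DB in one call.
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        collection.add(
            ids=[str(uuid.uuid4()) for _ in batch],  # random IDs avoid clashes across datasets
            documents=[row["text"] for row in batch],
            embeddings=[row["embeddings"] for row in batch],
        )
    return len(rows)
```

Usage would be roughly `load_rows_into_collection(download_rows(repo_id), collection, save_path="rows.jsonl")`, where `collection` could come from `chromadb.PersistentClient(path=...).get_or_create_collection(...)`. Passing the stored embeddings directly means nothing has to be re-embedded at load time, provided the collection's embedding model matches the one the dataset was built with.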

This will avoid the `git clone` and `parquet2vdb` steps entirely. The main app could even check whether it's the first run and, if so, load the default datasets (or use some similar workflow; whatever makes sense). Users can then use the same function to load new datasets from HF.
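One possible first-run check, sketched under the assumption that "first run" simply means the vector collection is still empty (Chroma collections expose `count()`). `load_defaults_on_first_run`, `load_repo`, and the repo IDs in `DEFAULT_DATASETS` are hypothetical placeholders, not the datasets or names Vigil actually ships with.

```python
from typing import Any, Callable, Sequence

# Hypothetical placeholder repo IDs; real defaults would come from Vigil's config.
DEFAULT_DATASETS = ("example-org/dataset-a", "example-org/dataset-b")


def load_defaults_on_first_run(
    collection: Any,
    load_repo: Callable[[str], Any],
    repos: Sequence[str] = DEFAULT_DATASETS,
) -> bool:
    """Load the default datasets only if the collection is empty.

    Returns True if the defaults were loaded, False if data was already present.
    """
    if collection.count() > 0:
        return False  # not a first run: leave existing data alone
    for repo_id in repos:
        load_repo(repo_id)  # e.g. download the repo and add its rows to the collection
    return True
```

Keying off the collection's contents rather than a separate marker file keeps the check idempotent: if a load is interrupted before anything is added, the next start simply tries again.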

While I'm at it, I should also allow loading datasets with user-defined column names. Right now the loader expects a specific format, but it could be more flexible.
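A sketch of the column-name flexibility, assuming the loader internally wants `text` and `embeddings` keys: the caller names their own columns and each row is normalized before loading. `normalize_columns` is a hypothetical helper; with the Hugging Face `datasets` library the same effect is available on a `Dataset` via `rename_column` / `rename_columns`.

```python
from typing import Dict, Iterable, Iterator


def normalize_columns(
    rows: Iterable[Dict],
    text_col: str = "text",
    embedding_col: str = "embeddings",
) -> Iterator[Dict]:
    """Yield rows re-keyed to the "text"/"embeddings" names the loader expects.

    Extra columns are dropped; a row missing either column raises KeyError.
    """
    for row in rows:
        missing = [col for col in (text_col, embedding_col) if col not in row]
        if missing:
            raise KeyError(f"row is missing column(s): {missing}")
        yield {"text": row[text_col], "embeddings": row[embedding_col]}
```

For example, a dataset with `prompt` and `vec` columns could be loaded after `normalize_columns(rows, text_col="prompt", embedding_col="vec")`. Failing loudly on a missing column is a deliberate choice here, so a mistyped column name surfaces immediately instead of silently loading nothing.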

deadbits commented 7 months ago

Done via `loader.py` in https://github.com/deadbits/vigil-llm/pull/57 for release https://github.com/deadbits/vigil-llm/releases/tag/v0.9.6-alpha