Advanced prompt management for LLMs / Prompt Hub

julian-risch commented 1 year ago

Problem Statement As an advanced user of LLMs(*1), I need some tooling to manage my prompts and few-shot examples, so that I can find the best one in a fast and structured way. I also want to try out off-the-shelf prompts made by others without spending time on engineering prompts myself.

(1) used LLMs before, used OpenAI’s API a lot, experimented with various prompts, is building a real app

User tasks

🟢 Save a prompt
🟢 Share a prompt with others (uploading)
🔴 Load prompt that was created by another user into a pipeline
🔴 Edit the loaded prompt
🔴 Compare different prompts in an experiment

Requirements

Update: implemented in https://github.com/deepset-ai/prompthub

[x] We imagine a public hub storage of prompt templates together with meta data in YAML format in a public GitHub repository.
[x] It needs to be easy to mirror this repository locally, e.g., in an air-gapped setup.
[x] We need a simple way of indexing all the prompt templates, for example, an index file in the root directory of the repository.
[x] When users add a new prompt template to the repository, we also need to update the index.
[x] All prompt templates should have a version, so that we can specify also a specific version when we load a prompt template. Latest version by default.
[ ] We should be able to infer the correct model from the PromptModel in the PromptNode. If not there should be a default.
[ ] Additional meta data that we might want to store for each template are input and output example pairs.

It's important to anticipate how this storage of prompt templates will be queried:

[x] By id ("Give me the prompt templates with id deepset-ai/calculator")
[ ] By model ("Give me all the prompt templates for daVinci GPT3")
[ ] By keywords ("Give me all QA related prompts")

### Related Issues
- [ ] https://github.com/deepset-ai/prompthub-ui/issues/5
- [ ] https://github.com/deepset-ai/prompthub/issues/7
- [ ] https://github.com/deepset-ai/prompthub/issues/8
- [ ] https://github.com/deepset-ai/haystack/issues/3867
- [ ] https://github.com/deepset-ai/haystack/issues/4807
- [ ] https://github.com/deepset-ai/haystack/issues/4710
- [ ] https://github.com/deepset-ai/haystack/issues/4883

danielbichuetti commented 1 year ago

Hi @julian-risch

What are your thoughts about the sharing mechanisms?

Would it be one of your ideas to allow S3-compatible storage (Minio, S3, etc.) to save prompts, upload, save, and share links? Which other storage method are you thinking about? Maybe git?

julian-risch commented 1 year ago

Hi @danielbichuetti we haven't decided how to enable sharing prompts but we definitely see the need. We are looking for a simple, intuitive and pragmatic solution with minimum overhead for the users. One very pragmatic workaround solution we saw was to handle prompts as datasets, for example here: https://huggingface.co/datasets/fka/awesome-chatgpt-prompts This could be a simple yet powerful enough solution for the start. However, I could easily imagine some features that cannot be supported with this workaround. We will discuss the topic later in this quarter. If you have some ideas, let us know. With git and S3 the question would be how users can lookup and find these prompts. Were you thinking about one public GitHub repository for all? @vblagoje will probably also be involved in the discussion. 🙂

danielbichuetti commented 1 year ago

@julian-risch When I first read this issue, I started to brainstorm.

First, I imagined S3-compatible storage (biased option, as we use a similar approach for internal company systems). Using S3-compatible would allow a fine-grained solution for any enterprise, which usually have one of the different S3-compatible services. Saving is fast, search can be using meta and it's extremely efficient. It would allow pointing to a public-S3 repo, or the user could set up its endpoint (private or external public repo). This is not hard to implement. There are straightforward libraries for it.

Git is a de facto standard also, allows versioning, free and paid options. The only issue comes for enterprises, where it's impossible to implement fine-grained control at repo level (of course other layers could implement it). I imagined a public git repo and allowing users to point to external private/public repos.

Using HF datasets is also a possibility; indeed, I didn't imagine at first it 🥲. It would allow versioning, diffs, and it will have a nice exposure in the HF Hub. Furthermore, it allows access to a lot of storage providers. Great idea!

Were you considering adding methods/parameters to the PromptTemplate? E.g. save load dataset= search And maybe PromptNode saving which PromptTemplate its results were based on?

julian-risch commented 1 year ago

@danielbichuetti Thank you so much for sharing the info on S3!

We imagine save and load methods as you describe yes. Loading could work with specifying the name of the prompt and its source, e.g. load "Travel Guide" from datasets/fka/awesome-chatgpt-prompts. As a user, I could imagine that I would like to save not only the prompt name and the prompt string but in addition also an example with query and returned result for a given model. Such a feature could also improve searching for prompts.

And maybe PromptNode saving which PromptTemplate its results were based on? Not sure about this one. In the pipeline configuration that information is stored already. In addition, the debug output of the pipeline results could include that info too.

danielbichuetti commented 1 year ago

@julian-risch

As a user, I could imagine that I would like to save not only the prompt name and the prompt string but in addition also an example with query and returned result for a given model. Such a feature could also improve searching for prompts.

I got it. When you mentioned search, you were not referring to any node built-in method to search, but the general capability to search, e.g., user visually or programmatically searching outside the framework.

Not sure about this one. In the pipeline configuration that information is stored already. In addition, the debug output of the pipeline results could include that info too.

My first thought was to save "results" + "instructions" for later evaluation. Indeed, it could be manually implemented by the user if it's his desire.

vblagoje commented 1 year ago

Great discussion @danielbichuetti @Julian Risch - I think using HF datasets ticks off many requirements. I especially like it because we can piggyback on an excellent and established ecosystem. And it fits our objectives in many aspects. Here is how:

Use HF dataset splits as fined-tuned per model prompt collections. We'll need collections of prompts grouped by supported model. More complex prompt templates will likely vary slightly per model, and it would be great to isolate users from these details. If users select topic-classification-with-options why should they care about the exact prompting text? They want to select the model, the template (task), and be sure that this is the best-performing prompt template for that particular task and the model. Behind the scenes in PromptNode we'll pick the prompt from the particular split matching the model
Transparent updates. Another great feature of HF datasets is that we can, behind the scenes, version/maintain/update these per-task templates, and users would pick updates transparently (if they want). No coding changes are needed.
Versioning - HF datasets support versioning out of the box
HF datasets already have great UI viewers so that anyone can inspect them.

danielbichuetti commented 1 year ago

Hi @vblagoje

Indeed, it's a great option! As I mentioned earlier, my first post was biased because of our internal usages, and I was unaware of the HF dataset capability to be stored in so many cloud storage services.

You are absolutely right. The only concern I had was when enterprises wanted to store data in a private environment, but that could be achieved by storing the data in their cloud, while keeping the usage of the dataset.

Implementing anything other than a dataset would be like creating a car to travel to the next city. The 'Ferrari' already exists.

Will this be implemented in the short time? Are there any drafts that are welcome, or is this already assigned?

vblagoje commented 1 year ago

Hey @danielbichuetti, it has not been assigned yet but I think it will be assigned in the next sprint. Are you interested in contributing? I can't promise anything, let me first talk to my team colleagues and my manager.

ZanSara commented 1 year ago

Closing as completed! 🎉

Haystack 1.18 integrates the PromptHub (https://prompthub.deepset.ai/) for PromptNode and Agent prompts, see https://github.com/deepset-ai/haystack/pull/4879

deepset-ai / haystack

Advanced prompt management for LLMs / Prompt Hub #3761

Requirements