Closed · julian-risch closed this 1 year ago
Hi @julian-risch
What are your thoughts about the sharing mechanisms?
Is one of your ideas to allow S3-compatible storage (MinIO, AWS S3, etc.) for saving prompts and for uploading, saving, and sharing links? Which other storage methods are you considering? Maybe git?
Hi @danielbichuetti we haven't decided how to enable sharing prompts yet, but we definitely see the need. We are looking for a simple, intuitive, and pragmatic solution with minimum overhead for users. One very pragmatic workaround we saw was to handle prompts as datasets, for example here: https://huggingface.co/datasets/fka/awesome-chatgpt-prompts This could be a simple yet powerful enough solution for a start. However, I could easily imagine some features that cannot be supported with this workaround. We will discuss the topic later this quarter. If you have some ideas, let us know. With git and S3, the question would be how users can look up and find these prompts. Were you thinking about one public GitHub repository for all? @vblagoje will probably also be involved in the discussion. 🙂
@julian-risch When I first read this issue, I started to brainstorm.
First, I imagined S3-compatible storage (a biased option, as we use a similar approach for internal company systems). S3-compatible storage would allow a fine-grained solution for any enterprise, most of which already use one of the many S3-compatible services. Saving is fast, search can use metadata, and it's extremely efficient. It would allow pointing to a public S3 repository, or users could set up their own endpoint (private or external public repository). This is not hard to implement; there are straightforward libraries for it.
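A minimal sketch of what such S3-backed prompt storage could look like, using `boto3`-style client calls. The `prompts/<name>.json` key layout and the metadata fields are entirely hypothetical, just to illustrate the idea:

```python
import json


def prompt_key(name):
    """Hypothetical key layout: one JSON object per prompt."""
    return f"prompts/{name}.json"


def save_prompt(s3_client, bucket, name, prompt, meta=None):
    """Store the prompt text plus searchable metadata as a JSON object."""
    body = json.dumps({"name": name, "prompt": prompt, "meta": meta or {}})
    s3_client.put_object(Bucket=bucket, Key=prompt_key(name), Body=body.encode("utf-8"))


def load_prompt(s3_client, bucket, name):
    """Fetch and decode a previously stored prompt record."""
    obj = s3_client.get_object(Bucket=bucket, Key=prompt_key(name))
    return json.loads(obj["Body"].read())
```

The client would be created with something like `boto3.client("s3", endpoint_url=...)`; the `endpoint_url` parameter is what lets the same code talk to MinIO or any other S3-compatible service instead of AWS.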
Git is also a de facto standard; it allows versioning and has free and paid options. The only issue arises for enterprises, where it's impossible to implement fine-grained access control at the repository level (although other layers could provide it). I imagined one public git repository, while also allowing users to point to external private/public repositories.
Using HF datasets is also a possibility; indeed, I didn't imagine it at first 🥲. It would allow versioning and diffs, and it would get nice exposure on the HF Hub. Furthermore, it allows access to a lot of storage providers. Great idea!
Were you considering adding methods/parameters to the `PromptTemplate`? E.g. `save`, `load`, `dataset=`, `search`.
And maybe `PromptNode` saving which `PromptTemplate` its results were based on?
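To make the suggestion concrete, here is a hypothetical sketch of what such methods might look like. The class name mirrors Haystack's `PromptTemplate`, but the `save`/`load` methods and the JSON file format below are illustrative, not the real API:

```python
import json
from pathlib import Path


class PromptTemplate:
    """Illustrative stand-in; not Haystack's actual PromptTemplate API."""

    def __init__(self, name, prompt_text):
        self.name = name
        self.prompt_text = prompt_text

    def save(self, path):
        # Persist the template as a small JSON document.
        Path(path).write_text(json.dumps({"name": self.name, "prompt_text": self.prompt_text}))

    @classmethod
    def load(cls, path):
        # Restore a template from a JSON document written by save().
        data = json.loads(Path(path).read_text())
        return cls(data["name"], data["prompt_text"])
```

A `dataset=` parameter on `load` could then point at a remote source (HF dataset, S3 bucket, git repository) instead of a local path.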
@danielbichuetti Thank you so much for sharing the info on S3!
We imagine `save` and `load` methods as you describe, yes. Loading could work by specifying the name of the prompt and its source, e.g. load "Travel Guide" from `datasets/fka/awesome-chatgpt-prompts`. As a user, I could imagine that I would like to save not only the prompt name and the prompt string but also an example with a query and the returned result for a given model. Such a feature could also improve searching for prompts.
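For illustration, a prompt record with an attached example could look like the dict below. The field names and the sample data are made up, not a settled schema; the point is that even a naive substring search becomes more useful once the example query and result are stored alongside the prompt:

```python
# Hypothetical prompt record: name, prompt string, plus one worked example.
record = {
    "name": "Travel Guide",
    "prompt": "I want you to act as a travel guide. ...",
    "example": {
        "model": "gpt-3.5-turbo",
        "query": "I am in Lisbon and want to visit museums.",
        "result": "You could start with the Museu Nacional do Azulejo ...",
    },
}


def matches(rec, term):
    """Search the prompt name, prompt text, and the stored example."""
    text = " ".join(
        [rec["name"], rec["prompt"], rec["example"]["query"], rec["example"]["result"]]
    )
    return term.lower() in text.lower()
```

With this, a search for "Lisbon" would surface the "Travel Guide" prompt even though the prompt text itself never mentions the city.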
And maybe `PromptNode` saving which `PromptTemplate` its results were based on? Not sure about this one. That information is already stored in the pipeline configuration. In addition, the debug output of the pipeline results could include that info too.
@julian-risch
As a user, I could imagine that I would like to save not only the prompt name and the prompt string but also an example with a query and the returned result for a given model. Such a feature could also improve searching for prompts.
I got it. When you mentioned search, you were not referring to any built-in node method for searching, but to the general capability to search, e.g., a user searching visually or programmatically outside the framework.
Not sure about this one. That information is already stored in the pipeline configuration. In addition, the debug output of the pipeline results could include that info too.
My first thought was to save "results" + "instructions" for later evaluation. Indeed, it could be implemented manually by users if they wish.
Great discussion @danielbichuetti @Julian Risch - I think using HF datasets ticks off many requirements. I especially like it because we can piggyback on an excellent and established ecosystem. And it fits our objectives in many aspects. Here is how:
Take a template like `topic-classification-with-options`: why should users care about the exact prompting text? They want to select the model and the template (task), and be sure that this is the best-performing prompt template for that particular task and model. Behind the scenes in PromptNode we'll pick the prompt from the particular split matching the model.
Hi @vblagoje
Indeed, it's a great option! As I mentioned earlier, my first post was biased because of our internal usage, and I was unaware that HF datasets can be stored on so many cloud storage services.
You are absolutely right. The only concern I had was about enterprises wanting to store data in a private environment, but that can be achieved by storing the data in their own cloud while still using the dataset format.
Implementing anything other than a dataset would be like creating a car to travel to the next city. The 'Ferrari' already exists.
Will this be implemented in the short term? Are draft contributions welcome, or is this already assigned?
Hey @danielbichuetti, it has not been assigned yet but I think it will be assigned in the next sprint. Are you interested in contributing? I can't promise anything, let me first talk to my team colleagues and my manager.
Closing as completed! 🎉
Haystack 1.18 integrates the PromptHub (https://prompthub.deepset.ai/) for PromptNode and Agent prompts, see https://github.com/deepset-ai/haystack/pull/4879
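Prompts on PromptHub are addressed by names like `deepset/question-answering`. How the integration fetches them is handled inside Haystack; as an assumption-laden sketch of the general pattern, a client would resolve a hub name once and cache the result, with the actual network fetch left abstract:

```python
# Sketch of hub-name-addressed prompt lookup with a local cache.
# The "owner/name" convention matches PromptHub; the cache and the
# fetch callback are illustrative, not Haystack's internals.
_cache = {}


def get_prompt(name, fetch):
    """Return the prompt for a hub name, calling fetch(name) only on a cache miss."""
    if name not in _cache:
        _cache[name] = fetch(name)
    return _cache[name]
```

In Haystack 1.18 itself, you simply pass the hub name (e.g. `deepset/question-answering`) where a prompt template is expected and the integration takes care of retrieval.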
Problem Statement
As an advanced user of LLMs (*1), I need tooling to manage my prompts and few-shot examples so that I can find the best one in a fast and structured way. I also want to try out off-the-shelf prompts made by others without spending time engineering prompts myself.
(*1) Has used LLMs before, has used OpenAI's API a lot, has experimented with various prompts, and is building a real app.
User tasks
Requirements
Update: implemented in https://github.com/deepset-ai/prompthub
It's important to anticipate how this storage of prompt templates will be queried: