shobhnadhami opened 7 months ago
Hi, Few remarks:
- It seems that your ranker is too 'weak': it only requires that the SQL query be relevant, so it is very hard to generate synthetic data on which GPT-3.5/4 will fail (on any relevant prompt).
- In order to avoid divergence, you should move some of the initial prompt to the task description.
For example:
--task_description:
Assistant is a large language model that is tasked with generating SQL queries based on the details and examples provided in the prompt.
We have 2 tables:
Employee: The Employee table has information about all the employees in a company.
Below are the attributes of the Employee table:
empid: The empid column contains the employee id. empid is the primary key of the Employee table.
name: The name column contains the name of the employee.
salary: The salary column contains the salary of the employee.
department_id: The department_id column contains the employee's department id. It is a foreign key from the Department table.
Department: The Department table has information about all the departments of a company.
Below are the attributes of the Department table:
department_id: The department_id column contains the id of the department. department_id is the primary key of the Department table.
department_name: The department_name column contains the name of the department.
--prompt
Your task is to generate a SQL query from the natural language input provided by the user. Understand the natural language input and provide a SQL query that fetches the requested information from the tables above.
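For reference, this is roughly how the two strings above could be passed to the generation pipeline. The flag names mirror the labels used in this thread; the exact CLI of run_generation_pipeline.py may differ, so treat the sketch below as an assumption rather than the repo's documented interface.

# Illustrative sketch only: pass the task description and initial prompt to
# run_generation_pipeline.py. The flag names follow the labels above; verify
# them against the AutoPrompt repository before use.
import subprocess

task_description = (
    "Assistant is a large language model that is tasked with generating SQL queries "
    "based on the details and examples provided in the prompt. ..."  # full schema text from above
)
initial_prompt = (
    "Your task is to generate a SQL query from the natural language input "
    "provided by the user, using the tables described above."
)

subprocess.run(
    ["python", "run_generation_pipeline.py",
     "--task_description", task_description,
     "--prompt", initial_prompt],
    check=True,
)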
Lastly, it's important to note that you are using an LLM ranker, so you need to skip the ranker training process.
You should change this line:
https://github.com/Eladlev/AutoPrompt/blob/7f373f219aa360cd2de38c6aa700c1dff282d7de/run_generation_pipeline.py#L53
To:
generation_config_params.eval.function_params.instruction = ranker_config_params.annotator.config.instruction
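In other words, the generation pipeline's evaluator is pointed at the same instruction the LLM annotator already uses, rather than at a prompt produced by the ranker-fitting phase. A minimal, self-contained sketch of that assignment follows; the config objects are stand-ins built for illustration, not AutoPrompt's real loading code, and only the final assignment reflects the suggestion above.

# Hypothetical sketch: the config objects are SimpleNamespace stand-ins; in
# run_generation_pipeline.py they come from the repo's own config loading.
# Only the final assignment reflects the suggested change.
from types import SimpleNamespace

ranker_config_params = SimpleNamespace(
    annotator=SimpleNamespace(
        config=SimpleNamespace(
            instruction="Decide whether the generated SQL correctly answers the question."
        )
    )
)
generation_config_params = SimpleNamespace(
    eval=SimpleNamespace(function_params=SimpleNamespace(instruction=None))
)

# Reuse the LLM annotator's instruction directly as the generation evaluator's
# instruction, skipping the output of the ranker-fitting phase.
generation_config_params.eval.function_params.instruction = (
    ranker_config_params.annotator.config.instruction
)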
Thank you for your reply. You mentioned above that the ranker training process needs to be skipped since an LLM ranker is used.
Could you please elaborate on why ranker training is not needed? I don't get it.
Regards, Shobhna
On Thu, 4 Apr 2024 at 1:48 AM, Eladlev wrote:
Hi, Currently, AutoPrompt doesn't support such complex prompts. The way to do it is to decompose the complex prompt into smaller prompts with specific sub-tasks and optimize them. I recommend you look at the dspy framework (https://github.com/stanfordnlp/dspy/tree/main); it also focuses on the few-shot setting (choosing the best few-shot samples).
In the future, we aim to add support for such complex prompts (by breaking them down into sub-tasks and doing global optimization on all the components, as in dspy).
All the best, Elad.
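To make the decomposition idea concrete, here is a minimal sketch of what one sub-task (natural-language question to SQL) could look like in dspy, with few-shot selection handled by its BootstrapFewShot optimizer. The signature, model name, metric, and training example are illustrative assumptions, not code from this thread, and the dspy API may differ between versions.

# Illustrative dspy sketch; model name, signature, metric and data are assumptions.
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the language model (model choice is an assumption).
dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

# One sub-task of the larger prompt: natural-language question -> SQL query.
class NLToSQL(dspy.Signature):
    """Generate a SQL query over the Employee/Department schema for the user's question."""
    question = dspy.InputField(desc="natural language question from the user")
    sql = dspy.OutputField(desc="SQL query answering the question")

generate_sql = dspy.Predict(NLToSQL)

# A labeled example; in practice these come from your own data.
trainset = [
    dspy.Example(
        question="List the names of all employees in the Sales department.",
        sql="SELECT e.name FROM Employee e JOIN Department d "
            "ON e.department_id = d.department_id WHERE d.department_name = 'Sales';",
    ).with_inputs("question"),
]

# Simple string-match metric (illustrative; a real metric would execute and compare queries).
def sql_match(example, prediction, trace=None):
    return example.sql.strip().lower() == prediction.sql.strip().lower()

# Let dspy pick the best few-shot demonstrations for this sub-task.
optimizer = BootstrapFewShot(metric=sql_match, max_bootstrapped_demos=4)
compiled_sql = optimizer.compile(generate_sql, trainset=trainset)

print(compiled_sql(question="What is the average salary per department?").sql)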
On Wed, Apr 3, 2024 at 11:02 PM, shobhna wrote:
Thank you for your reply. Please help me understand a few things:
- I want to optimize the whole initial prompt as given in the example. My understanding is that if I move some part of it to the task description, then only the text given in the --prompt field will be optimized. Please correct me if I am wrong.
- This is a very simple prompt with which I am trying to understand how AutoPrompt works. In reality we have a very complex prompt in which we explain the schema of various tables, the table data, and the relations between the tables. We also provide few-shot examples so the LLM understands what output is expected and how the SQL query should be formed. We aim to use AutoPrompt to optimize the whole prompt, including the schema and the few-shot examples. How should I proceed with a complex prompt?
Regards, Shobhna
The purpose of the first phase is to fit an LLM ranker to the user intent (by treating it as a classification task); this saves a lot of human effort, since the alternative is that the user ranks the whole dataset at each phase.
If you already start with an LLM ranker, then the first stage (fitting an LLM ranker) can only produce a sub-optimal approximation of it, so it's better to skip that step and use the LLM ranker directly.
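Schematically, the reasoning looks like this. The sketch below is pure illustration: every function is a made-up stand-in, and only the control flow (skip phase 1 when an LLM ranker is already provided) reflects the explanation above.

# Illustrative control flow only; all functions below are made-up stand-ins.

def fit_ranker_from_human_annotations(prompt):
    # Stand-in for phase 1: fitting an LLM ranker from human labels
    # (treated as a classification task).
    return "fitted ranker instruction (approximation of the user's intent)"

def optimize_prompt_against_ranker(prompt, ranker_instruction):
    # Stand-in for phase 2: iteratively refining the prompt, with candidates
    # scored by the ranker.
    return f"optimized version of {prompt!r}, scored by {ranker_instruction!r}"

def optimize_generation_prompt(initial_prompt, llm_ranker_instruction=None):
    if llm_ranker_instruction is None:
        # No ranker given: fit one from human annotations so the user does not
        # have to rank the whole dataset at every step.
        ranker_instruction = fit_ranker_from_human_annotations(initial_prompt)
    else:
        # An LLM ranker is already given: fitting another ranker could only
        # approximate it, so use it directly and skip phase 1.
        ranker_instruction = llm_ranker_instruction
    return optimize_prompt_against_ranker(initial_prompt, ranker_instruction)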
I have a prompt which is used to generate a SQL query from input text given by a user. I am trying to optimize the prompt using run_generation_pipeline.py, but I am getting a completely different calibrated prompt. Below are the inputs provided:
Output given by AutoPrompt:
The output given is not relevant to the task. Am I providing the wrong inputs, or missing some inputs that need to be provided?