Eladlev / AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration
Apache License 2.0

Calibrated prompt generated is completely different from initial prompt. #57

Open shobhnadhami opened 3 months ago

shobhnadhami commented 3 months ago

I have a prompt that is used to generate a SQL query from input text given by a user. I am trying to optimize the prompt using run_generation_pipeline.py, but I am getting a completely different calibrated prompt. Below are the inputs provided:

--task_description:
Assistant is a large language model that is tasked with generating a SQL query based on the details and examples provided in the prompt.

--prompt: 
We have 2 tables:
    Employee: The Employee table has information about all the employees in a company.
    Below are the attributes of the Employee table:
        empid: The empid column contains the employee id. empid is the primary key of the Employee table.
        name: The name column contains the name of the employee.
        salary: The salary column contains the salary of the employee.
        department_id: The department_id column contains the employee's department id. It is a foreign key from the Department table.
    Department: The Department table has information about all the departments of a company.
    Below are the attributes of the Department table:
        department_id: department_id contains the id of the department. department_id is the primary key of the Department table.
        department_name: department_name contains the name of the department.
***Below are a few examples***:
##Example 1
user query: what is empid of employees in department A?
output: Select Employee.empid
        From Employee 
        Join Department 
        on Employee.department_id = Department.department_id
        Where Department.department_id = 'A';
##Example 2
user query: what is salary of employee with empid=1?
output: Select salary
        From Employee 
        Where empid = 1;
** End of Examples **

Your task is to generate a SQL query from the natural language input provided by the user.
Your task is to understand the natural language input and provide a SQL query that fetches the information asked for in the natural language input from the tables above.

annotator instruction in config_default.yml:
        instruction:
            'We have two tables, Employee and Department.
            The Employee table has empid, name, salary, department_id as columns.
            The Department table has department_id, department_name as columns.
            You will be given a query in natural language and its interpreted SQL query to fetch data from the above tables.
            Assess the interpreted SQL query with respect to the natural language input and the tables provided. Answer 1 if the SQL query is relevant
            and correct, otherwise 0.'
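
For context, an LLM annotator applies an instruction like this by packing the instruction, the user's natural language query, and the generated SQL into a single classification prompt and reading back a 1/0 label. A minimal sketch, assuming a generic call_llm helper (not AutoPrompt's actual annotator code):

```python
# Minimal sketch of an LLM annotator applying the instruction above.
# call_llm is a hypothetical stand-in for whatever chat-completion client is
# configured; AutoPrompt's real annotator implementation differs.

ANNOTATOR_INSTRUCTION = (
    "We have two tables, Employee and Department. "
    "The Employee table has empid, name, salary, department_id as columns. "
    "The Department table has department_id, department_name as columns. "
    "You will be given a query in natural language and its interpreted SQL query. "
    "Assess the interpreted SQL query with respect to the natural language input "
    "and the tables provided. Answer 1 if the SQL query is relevant and correct, otherwise 0."
)

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call."""
    raise NotImplementedError

def annotate(nl_query: str, generated_sql: str) -> int:
    prompt = (
        f"{ANNOTATOR_INSTRUCTION}\n\n"
        f"Natural language query: {nl_query}\n"
        f"Interpreted SQL query: {generated_sql}\n"
        "Answer (1 or 0):"
    )
    return 1 if call_llm(prompt).strip().startswith("1") else 0
```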

output given by AutoPrompt:

Calibrated prompt score: 1.0
Calibrated prompt: Your task is to generate accurate and context-specific SQL queries based on natural language input provided by the user. Please include specific examples of natural language input and the corresponding expected SQL queries. Additionally, describe the database schema and table structure to provide more context for query generation. Aim for a higher score by improving the model's understanding and accuracy in generating SQL queries. 

The output given is not relevant to the task. Am I providing the wrong inputs, or missing some inputs that need to be provided?

Eladlev commented 3 months ago

Hi, a few remarks:

  1. It seems that your ranker is too 'weak': it only requires that the SQL query be relevant, so it is very hard to generate synthetic data on which GPT-3.5/4 will fail (with any relevant prompt).
  2. In order to avoid divergence, you should move some of the initial prompt to the task description.
For example:
--task_description:
Assistant is a large language model that is tasked with generating a SQL query based on the details and examples provided in the prompt.
We have 2 tables:
    Employee: The Employee table has information about all the employees in a company.
    Below are the attributes of the Employee table:
        empid: The empid column contains the employee id. empid is the primary key of the Employee table.
        name: The name column contains the name of the employee.
        salary: The salary column contains the salary of the employee.
        department_id: The department_id column contains the employee's department id. It is a foreign key from the Department table.
    Department: The Department table has information about all the departments of a company.
    Below are the attributes of the Department table:
        department_id: department_id contains the id of the department. department_id is the primary key of the Department table.
        department_name: department_name contains the name of the department.

--prompt
Your task is to generate a SQL query from the natural language input provided by the user.
Your task is to understand the natural language input and provide a SQL query that fetches the information asked for in the natural language input from the tables above.

Lastly, it's important to note that you are using an LLM ranker, so you need to skip the ranker training process. You should change this line: https://github.com/Eladlev/AutoPrompt/blob/7f373f219aa360cd2de38c6aa700c1dff282d7de/run_generation_pipeline.py#L53 to: generation_config_params.eval.function_params.instruction = ranker_config_params.annotator.config.instruction
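
For readability, the same change as a code sketch (only the modified assignment is shown; the surrounding pipeline code is unchanged):

```python
# run_generation_pipeline.py, around the linked line 53: feed the LLM annotator's
# instruction directly into the generation evaluator instead of a fitted ranker prompt.
generation_config_params.eval.function_params.instruction = (
    ranker_config_params.annotator.config.instruction
)
```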

shobhnadhami commented 2 months ago

Thank you for your reply. You mentioned above that the ranker training process needs to be skipped since an LLM ranker is used.

Could you please elaborate on why ranker training is not needed? I don't get it.

Regards, Shobhna

On Thu, 4 Apr 2024 at 1:48 AM, Eladlev wrote:

Hi, currently AutoPrompt doesn't support such complex prompts. The way to do it is to decompose this complex prompt into smaller prompts with specific sub-tasks and optimize them. I recommend you look at the dspy framework (https://github.com/stanfordnlp/dspy/tree/main); it also focuses on the few-shot setting (choosing the best few-shot samples).

In the future, we aim to add support for such complex prompts (by breaking them down into sub-tasks and doing global optimization on all the components, as in dspy).

All the best, Elad.
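
As a rough sketch of the decomposition described above, applied to a text-to-SQL prompt like the one in this issue: split the work into a schema-linking step and a SQL-generation step, each with its own small prompt that can be optimized on its own. The call_llm helper and the two sub-prompts below are illustrative assumptions, not AutoPrompt's or dspy's API:

```python
# Illustrative decomposition of one large text-to-SQL prompt into two sub-task
# prompts. call_llm is a hypothetical stand-in for any LLM client.

SCHEMA = (
    "Employee(empid PK, name, salary, department_id FK -> Department)\n"
    "Department(department_id PK, department_name)"
)

SCHEMA_LINKING_PROMPT = (
    "Given the schema below and a user question, list only the tables and columns "
    "needed to answer it.\n\nSchema:\n{schema}\n\nQuestion: {question}"
)

SQL_GENERATION_PROMPT = (
    "Write a SQL query for the question using only these tables and columns:\n"
    "{relevant_schema}\n\nQuestion: {question}\nSQL:"
)

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call."""
    raise NotImplementedError

def text_to_sql(question: str) -> str:
    # Sub-task 1: schema linking, optimized as its own small prompt.
    relevant_schema = call_llm(
        SCHEMA_LINKING_PROMPT.format(schema=SCHEMA, question=question)
    )
    # Sub-task 2: SQL generation from the reduced schema, optimized separately.
    return call_llm(
        SQL_GENERATION_PROMPT.format(relevant_schema=relevant_schema, question=question)
    )
```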

On Wed, Apr 3, 2024 at 11:02 PM, shobhna wrote:

Thank you for your reply. Please help me understand a few things:

  1. I want to optimize the whole initial prompt as given in the example. My understanding is that if I move some part to the task description, then only the text given in the --prompt field will be optimized. Please correct me if I am wrong.

This is a very simple prompt with which I am trying to understand how AutoPrompt works. In reality, we have a very complex prompt in which we explain the schema of various tables, the table data, and the relations between the tables. We also provide few-shot examples so the LLM understands what output is expected and how the SQL query should be formed. We aim to use AutoPrompt to optimize the whole prompt, including the schema and the few-shot examples. How should I proceed with a complex prompt?

Regards, Shobhna


Eladlev commented 2 months ago

The purpose of the first phase is to fit an LLM ranker to the user's intent (by treating ranking as a classification task). This saves a lot of human effort, since the alternative is for the user to rank the whole dataset at each phase.

If you already start with an LLM ranker, then the first stage (fitting an LLM ranker) can only produce a sub-optimal approximation of it, so it's better to skip this step and use the LLM ranker directly.
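
Put differently, with an LLM annotator the labels already come from an explicit instruction, so fitting another ranking instruction to those labels can only approximate it. A conceptual sketch under that framing (all names here are illustrative, not AutoPrompt's actual API):

```python
# Conceptual sketch: why phase 1 (fitting an LLM ranker) is skipped when the
# annotator is itself an LLM instruction. All names are illustrative.

def fit_ranking_instruction(labeled_samples: list[tuple[str, int]]) -> str:
    """Placeholder for phase 1: search for an instruction whose LLM labels
    reproduce the given (sample, label) pairs collected from the annotator."""
    raise NotImplementedError

annotator_instruction = (
    "Answer 1 if the SQL query is relevant and correct, otherwise 0."
)

# With a human annotator, labels are expensive, so phase 1 fits a reusable LLM
# ranking instruction to a small human-labeled set:
#     evaluator_instruction = fit_ranking_instruction(human_labeled_samples)
#
# With an LLM annotator, the labels are generated by annotator_instruction itself,
# so a fitted instruction is at best an approximation of it. Skip phase 1 and use
# the annotator's instruction directly as the evaluator:
evaluator_instruction = annotator_instruction
```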