beeevita / EvoPrompt

Official implementation of the paper Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Why does the create_dataset function in evaluator.py set data_with_prompt = test_src_sample directly when calling the turbo model, without any further processing? #1

Open Chongyu-hub opened 5 months ago

Chongyu-hub commented 5 months ago

```python
if model == "gpt" and "turbo" in self.args.llm_type:
    if "turbo" in self.args.llm_type:
        data_with_prompt = test_src_sample
```

Why does the create_dataset function in evaluator.py set `data_with_prompt = test_src_sample` directly when calling the turbo model, without any further processing?

beeevita commented 5 months ago

> ```python
> if model == "gpt" and "turbo" in self.args.llm_type:
>     if "turbo" in self.args.llm_type:
>         data_with_prompt = test_src_sample
> ```
>
> Why does the create_dataset function in evaluator.py set `data_with_prompt = test_src_sample` directly when calling the turbo model, without any further processing?

Because turbo is accessed differently from davinci. When interacting with turbo, you can put the required instruction in the initial message list, e.g. `messages_list = [{"role": "system", "content": "Classify the text."}]`. You can specify it here: https://github.com/beeevita/EvoPrompt/blob/c1b50865ee7247be9f209decc483e0beff139a25/llm_client.py#L37
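The distinction described above can be sketched as follows. The function and variable names here are illustrative, not the repository's actual API; the point is only where the instruction ends up in each request style:

```python
def build_request(instruction, example, llm_type):
    """Illustrative sketch: where the instruction goes for each API style."""
    if "turbo" in llm_type:
        # Chat-style API: the instruction lives in a system message,
        # so the per-sample data itself needs no extra processing.
        return [
            {"role": "system", "content": instruction},
            {"role": "user", "content": example},
        ]
    # Completion-style API (e.g. davinci): the instruction must be
    # concatenated into the prompt text itself before the request is sent.
    return instruction + "\n" + example
```

This is why `create_dataset` can leave the turbo samples untouched: the instruction is attached later, at request time, rather than baked into the dataset.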

Chongyu-hub commented 5 months ago

But in the get_generations function in evaluator.py, the `prompt_pre` parameter is not passed when calling llm_query:

```python
else:  # turbo
    for data in tqdm(dataset):
        pred = llm_query(
            data,
            client=self.client,
            type=args.llm_type,
            task=True,
            **self.llm_config,
        )
        # print(pred)
        hypos.append(pred)
```

This is the input:

```
[{'role': 'user', 'content': 'Note that two storms on record, Hurricane Alice from the 1954 season and Tropical Storm Zeta from the 2005 season have formed during December and lasted into January.'}]
```

This is the output:

> Hurricane Alice, which formed in December 1954, is the only known hurricane on record to have crossed over from one calendar year to the next. It originated in the eastern Atlantic Ocean on December 30, 1954, and intensified into a hurricane on January 2, 1955. It continued to exist until January 6, 1955, before dissipating in the central Atlantic.
>
> Tropical Storm Zeta, which formed in December 2005, is the most recent known storm to have crossed over from December into January. It developed in the eastern Atlantic Ocean on December 30, 2005, and became a tropical storm on January 2, 2006. It remained as a tropical storm until January 6, 2006, when it dissipated in the eastern Atlantic.

It seems `prompt_pre` was not passed to the model.

Chongyu-hub commented 5 months ago

I couldn't find where `prompt_pre` is passed to turbo 😢

Chongyu-hub commented 5 months ago


These are the arguments I passed:

```
--seed 5
--dataset asset
--task sim
--batch-size 20
--prompt-num 0
--sample_num 100
--language_model gpt
--budget 10
--popsize 10
--position pre
--evo_mode de
--llm_type turbo
--initial all
--initial_mode para_topk
--template v1
--cache_path data/sim/asset/seed5/prompts_gpt.json
--output outputs/sim/asset/gpt/all/de/bd10_top10_topk_para_init/v1/davinci/seed5
```

Zhiyuan-R commented 5 months ago

Hello, Chongyu! I just went through the code and hope this can solve your issue.

In the evaluator.py file:

*(screenshot omitted)*

"prompt_pre" has been passed to the object "dataset".

*(screenshot omitted)*

And you can see from the figure above that `data` is iteratively sampled from `dataset`, so the data may include the `prompt_pre` information. I have not checked carefully, but it should work.

Chongyu-hub commented 5 months ago

I debugged the code, and when preparing the data for the turbo model, the `prompt_pre` that was passed in had no effect. I changed the code in the create_dataset function to this:

```python
if "turbo" in self.args.llm_type:
    # data_with_prompt = test_src_sample
    for test_src_line in test_src_sample:
        prompts = []
        example = format_template(
            src=test_src_line,
            src_name=src_name,
            tgt_name=tgt_name,
            template=self.template,
        )
        instruction_part = self.instruction_placeholder.replace(
            "<prompt>", prompt_pre
        )
        if position in ["pre", "demon"]:  # demon includes instruction + demonstrations
            if "alpaca" in self.args.language_model:
                prompts.append(instruction_part + "\n\n" + example)
            else:
                prompts.append(
                    instruction_part + "\n" + demonstrations + example
                )
        elif position == "icl":  # no instruction
            example = instruction_part + "\n" + demonstrations + example
            prompts.append(example)
        data_with_prompt.append("\n\n".join(prompts))
```

With this change I got the correct results.
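The core of that fix, for the common `position == "pre"` non-alpaca case, reduces to the following sketch. The placeholder string and variable names mirror the snippet above and are assumptions, not a guaranteed match to the repository:

```python
def build_data_with_prompt(test_src_sample, prompt_pre, demonstrations,
                           instruction_placeholder="<prompt>"):
    """Prepend the instruction (and any demonstrations) to every test
    sample, mirroring the fix above for position == "pre"."""
    data_with_prompt = []
    for example in test_src_sample:
        # Substitute the actual instruction into the placeholder template.
        instruction_part = instruction_placeholder.replace("<prompt>", prompt_pre)
        data_with_prompt.append(instruction_part + "\n" + demonstrations + example)
    return data_with_prompt
```

With this, each turbo sample carries `prompt_pre` explicitly, instead of relying on it being injected elsewhere at request time.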