Eladlev / AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration
Apache License 2.0
2.22k stars 193 forks

Trouble with generation and multi-classification #106

Open 995667874 opened 1 day ago

995667874 commented 1 day ago

Hi, sorry to bother you. I'm having trouble with generation and multi-classification tasks, following https://github.com/Eladlev/AutoPrompt/blob/main/docs/examples.md#generating-movie-reviews-generation-task. I changed the dataset label_schema to ["1", "2", "3", "4", "5"] and got the error below. I hope to get some help, or if you could provide the yml files for the generation and multi-classification tasks, thank you very much!

    Traceback (most recent call last):
      File "/home/hezhuo/AutoPrompt/run_pipeline.py", line 44, in <module>
        best_prompt = pipeline.run_pipeline(opt.num_steps)
      File "/home/hezhuo/AutoPrompt/optimization_pipeline.py", line 273, in run_pipeline
        stop_criteria = self.step(i, num_steps_remaining)
      File "/home/hezhuo/AutoPrompt/optimization_pipeline.py", line 252, in step
        self.eval.add_history(self.cur_prompt, self.task_description)
      File "/home/hezhuo/AutoPrompt/eval/evaluator.py", line 115, in add_history
        conf_matrix = confusion_matrix(self.dataset['annotation'],
      File "/home/hezhuo/miniconda3/envs/AutoPrompt/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/hezhuo/miniconda3/envs/AutoPrompt/lib/python3.10/site-packages/sklearn/metrics/_classification.py", line 356, in confusion_matrix
        raise ValueError("At least one label specified must be in y_true")
    ValueError: At least one label specified must be in y_true
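For reference, the change described above corresponds to this entry in the configuration's dataset section (reconstructed from the description):

    dataset:
        label_schema: ["1", "2", "3", "4", "5"]

The ValueError itself comes from sklearn: the evaluator evidently calls confusion_matrix with an explicit labels list (the label_schema), and sklearn raises this error when none of those labels appear in y_true (here, the dataset's annotation column) — typically a sign that the samples were never annotated.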

Eladlev commented 1 day ago

Can you provide all the changes that you made in the configuration files? It seems like you changed the annotator to an LLM with an unfitted prompt, and this causes the issue.

995667874 commented 18 hours ago

> Can you provide all the changes that you made in the configuration files? It seems like you changed the annotator to an LLM with an unfitted prompt, and this causes the issue.

Do both multi-classification and generation tasks need the annotator set to ''? Can it be omitted for the former? This is my configuration file. Thank you very much for your reply. Best wishes.

    use_wandb: True
    dataset:
        name: 'dataset'
        records_path: null
        initial_dataset: ''
        label_schema: ["1", "2", "3", "4", "5"]
        max_samples: 50
        semantic_sampling: False  # Change to True in case you don't have M1. Currently there is an issue with faiss and M1

    annotator:
        method : ''

    predictor:
        method : 'llm'
        config:
            llm:
                type: 'OpenAI'
                name: 'gpt-3.5-turbo-1106'
                async_params:
                    retry_interval: 10
                    max_retries: 2
                model_kwargs: {"seed": 220}
            num_workers: 5
            prompt: 'prompts/predictor_completion/prediction.prompt'
            mini_batch_size: 1  # Change to >1 if you want to include multiple samples in one prompt
            mode: 'prediction'

    meta_prompts:
        folder: 'prompts/meta_prompts_classification'
        num_err_prompt: 1  # Number of error examples per sample in the prompt generation
        num_err_samples: 2  # Number of error examples per sample in the sample generation
        history_length: 4  # Number of samples in the meta-prompt history
        num_generated_samples: 10  # Number of generated samples at each iteration
        num_initialize_samples: 10  # Number of generated samples at iteration 0, in the zero-shot case
        samples_generation_batch: 10  # Number of samples generated in one call to the LLM
        num_workers: 5  # Number of parallel workers
        warmup: 4  # Number of warmup steps

    eval:
        function_name: 'accuracy'
        num_large_errors: 4
        num_boundary_predictions: 0
        error_threshold: 0.5

    llm:
        name: 'gpt-4-1106-preview'  # This is the meta-prompt LLM; it should be a strong model. For example, using GPT-3.5 will cause an error in many cases.
        type: 'OpenAI'  # Can be OpenAI, Anthropic, Google, Azure
        temperature: 0.8

    stop_criteria:
        max_usage: 2  # In $ for OpenAI models, otherwise number of tokens
        patience: 10  # Number of patience steps
        min_delta: 0.01  # Delta for the improvement definition

Eladlev commented 14 hours ago

The generation task consists of two phases:

  1. Fitting the ranking prompt
  2. Using the ranking prompt to optimize the generation task

Your failure is in the first phase, where we try to fit the ranking prompt. In this phase, we treat the ranking task as a classification task. So for this to work, the annotator in the default_config should be:

annotator:
    method : 'argilla'

If you want to skip the first phase and use an LLM as a ranker, this can easily be modified. However, the current failure is due to the fact that the annotator configured for the first phase is not the appropriate one.
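For completeness, the alternative mentioned above (skipping Argilla and using an LLM as the annotator for the first phase) would be configured along these lines. This is a sketch based on the LLM-annotator example in the project's documentation; the instruction text is illustrative for a 1-5 rating task, and the exact keys should be checked against config_default.yml:

    annotator:
        method : 'llm'
        config:
            llm:
                type: 'OpenAI'
                name: 'gpt-4-1106-preview'  # annotation benefits from a strong model
            instruction:
                'Rate the following movie review on a scale of 1 to 5.
                Answer with a single digit.'  # illustrative instruction for this task
            num_workers: 5
            prompt: 'prompts/predictor_completion/prediction.prompt'
            mini_batch_size: 1
            mode: 'annotation'

With this in place the first phase can run unattended, with annotation quality depending on the annotating LLM.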

995667874 commented 12 hours ago

Thank you!
