Eladlev / AutoPrompt

A framework for prompt tuning using Intent-based Prompt Calibration
Apache License 2.0

KeyError: 'samples' #51

Closed YukiChen-yuxin closed 3 months ago

YukiChen-yuxin commented 3 months ago

Hi! I tried to run the pipeline with Azure OpenAI, using an LLM as the annotator, but got this error.

Processing samples: 100%|##########| 1/1 [00:24<00:00, 24.04s/it]
Traceback (most recent call last):
  File "prompt_model\AutoPrompt\run_pipeline.py", line 44, in <module>
    best_prompt = pipeline.run_pipeline(opt.num_steps)
  File "prompt_model\AutoPrompt\optimization_pipeline.py", line 274, in run_pipel
    stop_criteria = self.step()
  File "prompt_model\AutoPrompt\optimization_pipeline.py", line 233, in step
    self.generate_initial_samples()
  File "prompt_model\AutoPrompt\optimization_pipeline.py", line 194, in generate_tial_samples
    samples_list = [element for sublist in samples_batches for element in sublist['samples']]
  File "prompt_model\AutoPrompt\optimization_pipeline.py", line 194, in <listcomp
    samples_list = [element for sublist in samples_batches for element in sublist['samples']]
KeyError: 'samples'
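
For context, a minimal, self-contained illustration of the failure mode (the dict shapes below are assumptions for illustration, not the pipeline's actual output): if one of the per-batch results returned by the sample-generation step has no 'samples' key, for example because the LLM response could not be parsed, the flattening in generate_initial_samples raises exactly this error.

samples_batches = [
    {"samples": ["first generated sample", "second generated sample"]},
    {"error": "empty or unparsable LLM response"},  # hypothetical shape, for illustration only
]

try:
    samples_list = [element for sublist in samples_batches for element in sublist['samples']]
except KeyError as err:
    print(f"KeyError: {err}")  # prints: KeyError: 'samples'

In other words, the traceback points at the output of the sample-generation step rather than at the annotator itself.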
YukiChen-yuxin commented 3 months ago

I got the same error when I tried to use Argilla. Is there anywhere I need to store my own sample data for this part? Thanks

Eladlev commented 3 months ago

Can you provide more details:

  1. Are you running the classification pipeline or the generation pipeline?
  2. Please provide all the config modifications

Also, please verify that you are not loading dumps (this error might be due to dumps issues).
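
(For reference, a dump is only loaded when run_pipeline.py is given a non-empty --load_path; these are the relevant lines, which also appear in a traceback later in this thread:)

if (opt.load_path != ''):
    pipeline.load_state(opt.load_path)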

YukiChen-yuxin commented 3 months ago

I'm running a classification pipeline. The config file is:

use_wandb: True
dataset:
    name: 'dataset'
    records_path: null
    initial_dataset: ''
    label_schema: ["toxic", "non-toxic"]
    max_samples: 50
    semantic_sampling: True # Change to True in case you don't have M1. Currently there is an issue with faiss and M1

annotator:
    method: 'llm'
    config:
        llm:
            type: 'Azure'
            name: 'gpt-4-32k-0613'
        instruction:
            'Assess whether the text contains a spam topic. 
            Answer toxic if it does and non-toxic otherwise.'
        num_workers: 5
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1
        mode: 'annotation'

predictor:
    method : 'llm'
    config:
        llm:
            type: 'Azure'
            name: 'gpt-4-32k-0613'
#            async_params:
#                retry_interval: 10
#                max_retries: 2
            model_kwargs: {"seed": 220}
        num_workers: 5
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1  #change to >1 if you want to include multiple samples in the one prompt
        mode: 'prediction'

meta_prompts:
    folder: 'prompts/meta_prompts_classification'
    num_err_prompt: 1  # Number of error examples per sample in the prompt generation
    num_err_samples: 2 # Number of error examples per sample in the sample generation
    history_length: 4 # Number of samples in the meta-prompt history
    num_generated_samples: 10 # Number of generated samples at each iteration
    num_initialize_samples: 10 # Number of generated samples at iteration 0, in zero-shot case
    samples_generation_batch: 10 # Number of samples generated in one call to the LLM
    num_workers: 5 #Number of parallel workers
    warmup: 4 # Number of warmup steps

eval:
    function_name: 'accuracy'
    num_large_errors: 4
    num_boundary_predictions : 0
    error_threshold: 0.5

llm:
    type: 'Azure'
    name: 'gpt-4-32k-0613'
    temperature: 0

stop_criteria:
    max_usage: 2 #In $ in case of OpenAI models, otherwise number of tokens
    patience: 10 # Number of patience steps
    min_delta: 0.01 # Delta for the improvement definition

Sorry, where do I check whether I'm loading dumps or not? Thanks

YukiChen-yuxin commented 3 months ago

Seems there is something wrong when the pipeline tries to generate the initial samples. I wonder if we can use our own sample datasets. I changed initial_dataset in the config file to 'dump/validated_test_file_temp.csv', which contains one column with several samples, and then I got this error. Is there any other config I need to modify if I want to use my own sample data?

Traceback (most recent call last):
  File "prompt_model\AutoPrompt\run_pipeline.py", line 44, in <module>
    best_prompt = pipeline.run_pipeline(opt.num_steps)
  File "prompt_model\AutoPrompt\optimization_pipeline.py", line 274, in run_pipeline
    stop_criteria = self.step()
  File "prompt_model\AutoPrompt\optimization_pipeline.py", line 242, in step
    records = self.annotator.apply(self.dataset, self.batch_id)
  File "AutoPrompt\estimator\estimator_argilla.py", line 92, in apply
    batch_records = dataset[batch_id]
  File "AutoPrompt\dataset\base_dataset.py", line 41, in __getitem__
    extract_records = self.records[self.records['batch_id'] == ba...
  File "AutoPrompt\lib\site-packages\pandas\core\frame.py", line 4090, in __getitem__
    indexer = self.columns.get_loc(key)
  File "AutoPrompt\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'batch_id'
YukiChen-yuxin commented 3 months ago

I think I solved this based on the dataset info you provided in example.md. I just want to check: what is the difference between the prediction column and the annotation column? As I understand it, we need to provide both of them in the input dataset, right? And what are metadata and score for here?

Thanks for your help.

id,text,prediction,annotation,metadata,score,batch_id
0,"The cinematography was mesmerizing, especially during the scene where they finally reveal the mysterious room that captivated the main character.",No,Yes,,,0
1,"The director's bold choice to leave the world's fate unclear until the final frame will spark audience discussions.",No,Yes,,,0
Eladlev commented 3 months ago
YukiChen-yuxin commented 3 months ago

Oh, got it. So if I want to input my own sample dataset file, can I leave the prediction column completely empty?

Eladlev commented 3 months ago

Yes. Another important thing:

  1. If you remove the annotator completely from the config file, it will use the annotation column in the csv.
  2. Otherwise it will update the annotation column (according to the annotator you choose), so you can also leave this column empty.
YukiChen-yuxin commented 3 months ago

Hi, I get an AttributeError if I remove the whole annotator part from the config file:

prompt_model\AutoPrompt\optimization_pipeline.py:57 in __init__

   54     self.cur_prompt = initial_prompt
   55
   56     self.predictor = give_estimator(config.predictor)
>  57     self.annotator = give_estimator(config.annotator)
   58     self.eval = Eval(config.eval, self.meta_chain.error_analysis, ...
   59     self.batch_id = 0
   60     self.patient = 0

AttributeError: 'EasyDict' object has no attribute 'annotator'

Eladlev commented 3 months ago

Yes, you are right, it should not be removed completely; instead you should set the method to an empty string:

annotator:
   method : ''
YukiChen-yuxin commented 3 months ago
Thanks. I also met this error; does it mean my LLM didn't return a new prompt?

prompt_model\AutoPrompt\optimization_pipeline.py:116 in run_step_prompt

  113     prompt_suggestion = self.meta_chain.step_prompt_chain.invoke(...
  114     print(f'prompt_suggestion: {prompt_suggestion}')
  115     self.log_and_print(f'Previous prompt score:\n{self.eval.mean...
> 116     self.log_and_print(f'Get new prompt:\n{prompt_suggestion["pro...
  117     self.batch_id += 1
  118     if len(self.dataset) < self.config.dataset.max_samples:
  119         batch_input = {"num_samples": self.config.meta_prompts.sa...

KeyError: 'prompt'

Eladlev commented 3 months ago

It might be an issue with the OpenAI functions (although the model you are using should support functions). You can try using the completion pipeline:

meta_prompts:
    folder: 'prompts/meta_prompts_completion'
YukiChen-yuxin commented 3 months ago

In this part of evaluator.py, it seems the prediction column gets labeled Discarded for all my rows and then all my data is deleted, so self.dataset is empty and all the metrics are empty because of this. I'm not sure what I changed, so I re-downloaded the code repo and only changed the dataset file and the two config files, but I still get this. Do you have any idea what is going on here?

def eval_score(self) -> float:
    """
    Calculate the score on each row and return the mean score.
    :return: The mean score
    """
    # filter out the discarded samples
    self.dataset = self.dataset[(self.dataset['prediction'] != 'Discarded') &
                                (self.dataset['annotation'] != 'Discarded')]
    self.dataset = self.score_func(self.dataset)
    self.mean_score = self.dataset['score'].mean()
    return self.mean_score
prompt_model\AutoPrompt\eval\evaluator.py:126 in add_history

  123     analysis = self.analyzer.invoke(prompt_input)
  124
  125     self.history.append({'prompt': prompt, 'score': self.mean_sco...
> 126                          'errors': self.errors, 'confusion_matrix...
  127
  128 def extract_errors(self) -> pd.DataFrame:
  129     """

TypeError: 'NoneType' object is not subscriptable
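
One quick way to confirm this is to look at what actually ended up in the prediction and annotation columns before the filter runs; a small pandas check (the csv path is a placeholder for whatever dataset file your run dumps):

import pandas as pd

# Placeholder path; point it at the dataset csv written by your run.
df = pd.read_csv("dump/dataset.csv")
print(df["prediction"].value_counts(dropna=False))
print(df["annotation"].value_counts(dropna=False))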
Eladlev commented 3 months ago

It seems like a dataset structure issue. We will soon add an example to the readme of how to use your own annotated dataset and apply only the optimization part to this data. Meanwhile, I suggest iterating on your exact use case and data in our Discord channel; I think it will be easier and faster.

danielliu99 commented 3 months ago

It seems like a dataset structure issue. We will soon add an example to the readme of how to use your own annotated dataset and apply only the optimization part to this data. Meanwhile, I suggest iterating on your exact use case and data in our Discord channel; I think it will be easier and faster.

Has the example of adding my own annotated dataset been updated? : )

danielliu99 commented 3 months ago

Yes. Another important thing:

  1. If you remove the annotator completely from the config file, it will use the annotation column in the csv.
  2. Otherwise it will update the annotation column (according to the annotator you choose), so you can also leave this column empty.

If I have 30 samples with text and annotation, is it possible to use these samples in the first few iterations and then use LLM-generated samples in the following iterations? How should I modify the config files?

Eladlev commented 3 months ago

Hi @danielliu99. I still haven't had time to update the readme file, but I will post the exact steps here, and soon we will organize them and add an example to the text:

In order to iterate on your own dataset you need to:

  1. Transfer your data to the AutoPrompt dataset format: "id","text","prediction","annotation","metadata","score","batch_id", where id is a unique row id, 'text' is the input to the prompt, 'prediction' should be empty, 'annotation' should be the ground truth (the class label), 'score' should be empty, and 'batch_id' should be 0 for all rows.
  2. Put this csv in some empty folder (the important thing is that history.pickle will not be in this folder).
  3. Make the following changes in the default_config file: as always, modify the label schema to your label schema, and change max_samples: 50 to the number of samples in your csv. Lastly, modify the annotator and change the method to the empty string: method: '' (see the config sketch right after this list).
  4. In run_pipeline, in the --load_path parameter, put the location of the folder with the csv.
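
A sketch of what the resulting config changes could look like (only the fields mentioned in the steps above; the label schema and sample count are placeholders, and everything else in default_config stays as it is):

dataset:
    label_schema: ["toxic", "non-toxic"]  # your own labels
    max_samples: 30                       # number of rows in your csv

annotator:
    method: ''

The folder containing the csv is then passed to run_pipeline via --load_path.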

If you want to start with your dataset and then continue with synthetic data:

You need to follow the same steps as above and simply change max_samples to 30 + the number of synthetic samples you want to add. Additionally, in this case the annotator method should be either argilla (human) or llm. This means the model will ask you to re-annotate your samples (we do this for consistency). If you want to skip the annotation of these samples, you need to add another condition at this location: https://github.com/Eladlev/AutoPrompt/blob/cdddccf9f2105d8bf8e688818932b18e645f5136/estimator/estimator_argilla.py#L92C1-L93C1 that returns an empty array in case all the samples are already annotated (see the sketch below).
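
One possible shape for that extra condition, as a sketch only; whether an empty string or a missing value marks a row as "not yet annotated" is an assumption about the dataset state, not something taken from the repo:

# In estimator_argilla.py, inside apply(), right after the existing
#     batch_records = dataset[batch_id]
#     if batch_records.empty:
#         return []
# a hypothetical additional early return:
already_annotated = batch_records['annotation'].notna() & (batch_records['annotation'] != '')
if already_annotated.all():
    # every sample in this batch already has an annotation, so skip re-annotation
    return []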