List index out of range issue

I am processing a document with Augmentoolkit and hitting a "list index out of range" exception in generation_functions/engine_wrapper_class.py:

async def submit_completion( ... ...
...
    async for chunk in stream:
        try:
            if chunk.choices[0].delta.content:
                completion = completion + chunk.choices[0].delta.content
        except Exception as e:
            print("\n\n------------CAUGHT EXCEPTION DURING GENERATION")
            print("chunk: ", chunk)
            print("completion: ", completion)
            print(e)
            timed_out = True
            print("\n\n-----/------")
    ...

When I print chunk, I see that chunk.choices is an empty list. I don't know why it is empty or whether that is expected.

-----/------
Output written to ./outputIpcom/check_question_generations/2f9a1af1-0383-4490-b6e3-23f1a4111a0c--subquestion--82d10b0d-3200-418e-af27-936337e88ea8--check--6203cd6a-b8e4-4d9d-afb5-0fea6f70ce69.yaml
2024-11-15 21:22:02,682 - INFO - HTTP Request: POST http://localhost:9000/v1/chat/completions "HTTP/1.1 200 OK"

------------CAUGHT EXCEPTION DURING GENERATION
chunk: ChatCompletionChunk(id='chat-67e86151a823496ab7d12db8eefe6689', choices=[], created=1731705722, model='mistralai/mistral-large', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_example', usage=None)
completion: ## Reasoning and thought process:

Text Analysis:

Identify Key Information: The text provides a step-by-step guide on how to view current and historical invoices.

Categorize Information Type: The information is procedural, outlining specific actions to be taken on a platform.

Answer Breakdown:

Dissect the Answer: The answer describes the steps to view invoices, mentioning the "Subscription" tab and the "Billing History" section.

Identify Answer Type: The statement is a procedural guide, reflecting the steps outlined in the text.

Accuracy Check:

Direct Comparison for Factual Accuracy:

The text states that to view invoices, one should click the "Subscription" tab and then expand the "Billing History" section under "Subscription Details."
The answer accurately reflects these steps.
Inference and Contextual Alignment: The answer aligns perfectly with the procedural information provided in the text.

Final Judgment:

Comprehensive Assessment: The answer accurately reflects the steps described in the text for viewing invoices.

Overall Accuracy Determination: Accurate.

list index out of range

I have attached the config.yaml.

Augmentoolkit finishes processing the documents and produces Q/A pairs:

drwxr-xr-x 2 root root  28672 Nov 15 21:24 check_answer_accuracy_generations
drwxr-xr-x 2 root root  28672 Nov 15 21:24 check_question_generations
drwxr-xr-x 4 root root   4096 Nov 15 21:14 judge_paragraph_generations
-rw-r--r-- 1 root root  81303 Nov 15 21:14 judge_paragraph_generations_DATAGEN_OUTPUT.jsonl
-rw-r--r-- 1 root root 246317 Nov 15 21:29 master_list.jsonl
-rw-r--r-- 1 root root  38109 Nov 15 21:29 plain_qa_list.jsonl
-rw-r--r-- 1 root root  27008 Nov 15 21:13 pretraining.jsonl
drwxr-xr-x 2 root root   4096 Nov 15 21:24 qatuples_filtered
drwxr-xr-x 4 root root   4096 Nov 15 21:24 question_context_revision_generations
drwxr-xr-x 4 root root   4096 Nov 15 21:14 question_generation_generations
-rw-r--r-- 1 root root 106407 Nov 15 21:29 questions_generation_dataset.jsonl
root@0797d8d75562:/tmp/augmentoolkit#

Is the exception a problem and, if so, do you have suggestions for how to fix it?

Here is the config.yaml:

API:
  API_KEY: xxxxx
  BASE_URL: http://localhost:9000/v1
  LARGE_LOGICAL_MODEL: mistralai/mistral-large
  LOGICAL_MODEL: mistralai/mistral-large
HUGGINGFACE:
  HUB_PATH: Heralax/test-atk-dataset-do-not-use-3
  PRIVATE: False
  PUSH_TO_HUB: False
PATH:
  DEFAULT_PROMPTS: ./prompts
  INPUT: /tmp/augmentoolkit/original/inputIpcom
  OUTPUT: ./outputIpcom
  PROMPTS: ./prompts
PHASE:
  PHASE_INDEX: 3
  WORK_IN_PHASES: False
SKIP:
  ANSWER_RELEVANCY_CHECK: True
  FILTER_CHUNKS: False
  QUESTION_CHECK: False
  CONVERSATION_GENERATION: True
  REPAIR_QA_TUPLES: True
SYSTEM:
  CHUNK_SIZE: 1900
  COMPLETION_MODE: False
  CONCURRENCY_LIMIT: 3
  CONVERSATION_INSTRUCTIONS: For this conversation, you are generating a chat between a generic user, and an assistant.
  DOUBLE_CHECK_COUNTER: 1
  DO_NOT_USE_SYSTEM_PROMPTS: True
  FINAL_ASSISTANT_PROMPT_NO_RAG: 'You are a helpful assistant.

    '
  FINAL_ASSISTANT_PROMPT_RAG: 'You are a helpful assistant.

    Context information is below:

    ----------------------

    {data}

    '
  MODE: api
  STOP: True
  SUBSET_SIZE: 5000
  USE_FILENAMES: False
  USE_SUBSET: True
SCRAPING:
  USE_GUTENBERG: False
  START_URL: "https://www.gutenberg.org/ebooks/bookshelf/57"
  MAX_BOOKS: 5
  MAX_FAILURES: 5

e-p-armstrong / augmentoolkit