Open kartik122 opened 3 weeks ago
One possibility is that everything fails validation (perhaps because the model breaks the output format every time, which happens if the model is not very strong, or sometimes because of API errors and other issues). In that case the chunks list will be empty and nothing will be generated past that point.
Strangely, it looks like you're using the default inputs, so we can probably rule that out. Llama 3 is also definitely capable of running Augmentoolkit, which rules that out as well, unless you're running a very low quant or something is wrong with your server.
Could you share some of the intermediate outputs? You should be able to find them in outputs/judge_paragraph_generations. That folder contains a number of YAML files with the full prompts plus the AI output at the very end; they might contain a clue about what's going on.
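To make the failure mode above concrete, here is a minimal sketch (function and variable names are hypothetical, not Augmentoolkit's actual code) of how a judge step that rejects every chunk leaves an empty list that crashes any later positional access:

```python
def filter_worthy_chunks(chunks, judge):
    """Keep only the chunks the judge model deems suitable for questions."""
    return [c for c in chunks if judge(c)]

chunks = ["chapter text...", "table of contents...", "legal notice..."]

# Stand-in for a model that fails every chunk (broken format, API errors, etc.)
reject_all = lambda chunk: False

filtered = filter_worthy_chunks(chunks, reject_all)

# Guarding before indexing turns a bare IndexError into a diagnosable message:
if filtered:
    print(filtered[0])
else:
    print("No chunks passed validation; check the judge outputs.")
```

With `reject_all` in place, `filtered` is `[]`, and an unguarded `filtered[0]` anywhere downstream would raise `IndexError: list index out of range`.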
Hi @e-p-armstrong, this is one of the output files from the judge_paragraph_generations folder:
```yaml
content: 'Text:
  """
  "Did you hear about the mayor''s decision?" asked John.
  "It''s quite surprising, isn''t it?" replied Emily, her voice tinged with disbelief.
  "I know, right? But what can we do?" John sighed.
  Sarah continued her shopping, her mind now on the mayor''s mysterious decision.
  """
  Note that even blunt facts can be suitable for questions, and unconventional knowledge is not necessarily unsuitable. Fictional stories that contain strong morals or philosophy can also have good questions made from them. But legal notices, metadata, and tables of contents are not suitable. Lists of information without the context needed for the question-maker to understand the text; quotes or dialogues without context or clear depth; or ambiguous content that isn''t precise enough to "nail down" a solid question from, are not valid.'
role: user
content: 'Text:
  """
  In the world of science, there are countless mysteries and phenomena that elude easy explanation. For instance, certain forces and energies interact in ways that are not fully understood, shaping the universe in subtle and profound manners. These interactions often occur at levels beyond human perception, leaving much to speculation and theory. Various scientific disciplines attempt to explain these interactions, each offering unique perspectives but often lacking definitive answers. The vastness of these mysteries spans from the minuscule quantum realm to the expansive cosmos, hinting at complexities that challenge our current understanding.
  """
  Note that even blunt facts can be suitable for questions, and unconventional knowledge is not necessarily unsuitable. Fictional stories that contain strong morals or philosophy can also have good questions made from them. But legal notices, metadata, and tables of contents are not suitable. Lists of information without the context needed for the question-maker to understand the text; quotes or dialogues without context or clear depth; or ambiguous content that isn''t precise enough to "nail down" a solid question from, are not valid.'
role: user
content: 'Text:
  """
  The Brussels Conference on the subject 95
  Illustrations of barbarous reprisals 97
  Instances of non-retaliation 98
  Savage reprisals in days of chivalry 100
  Hanging the commonest reprisals for a brave defence 101
  As illustrated by the warfare of the fifteenth century 102
  Survival of the custom to our own times 104
  The massacre of a conquered garrison still a law of war 105
  The shelling of Strasburg by the Germans 106
  Brutal warfare of Alexander the Great 107
  The connection between bravery and cruelty 110
  The abolition of slavery in its effects on war 112
  The storming of Magdeburg, Brescia, and Rome 112
  Cicero on Roman warfare 114
  The reprisals of the Germans in France in 1870 115
  Their revival of the custom of taking hostages 117
  Their resort to robbery as a plea of reprisals 118
  General Von Moltke on perpetual peace 119
  The moral responsibility of the military profession 121
  """
  Note that even blunt facts can be suitable for questions, and unconventional knowledge is not necessarily unsuitable. Fictional stories that contain strong morals or philosophy can also have good questions made from them. But legal notices, metadata, and tables of contents are not suitable. Lists of information without the context needed for the question-maker to understand the text; quotes or dialogues without context or clear depth; or ambiguous content that isn''t precise enough to "nail down" a solid question from, are not valid.'
role: user
```
I have also observed an "index out of bounds" error:

```
Output written to ../outFiles/judge_paragraph_generations/99e9cbce-823a-401e-b09b-ef922e806e98.yaml
DEBUG model decided that index 485 was not suitable
Converting generations to training data
entering saving mode
...Converted successfully (we think)
Traceback (most recent call last):
  File "/tmp/augmentoolkit/processing.py", line 504, in <module>
```
Trying to run Augmentoolkit on macOS (M3) with ollama (`ollama run llama3`), using the following config.yaml:

```yaml
PATH:
  INPUT: "./raw_text_input"
  OUTPUT: "./output"
  DEFAULT_PROMPTS: "./prompts" # the baseline prompt folder that Augmentoolkit falls back to if it can't find a step in the PROMPTS path
  PROMPTS: "./prompts" # Where Augmentoolkit first looks for prompts
API:
  API_KEY: "53212512"
  BASE_URL: http://127.0.0.1:11434/
  LARGE_LOGICAL_MODEL: llama3
  LOGICAL_MODEL: llama3 # model used for question generation and conversation generation at the very end. A pretty tough task, if ASSISTANT_MODE isn't on.
  QUANTIZATION_SMALL: "gptq" # Only use if Aphrodite mode is on.
  QUANTIZATION_LARGE: "gptq" # Only use if Aphrodite mode is on.
SKIP:
  QUESTION_CHECK: False
  ANSWER_RELEVANCY_CHECK: False # turn on if using the negative question prompt override
  FILTER_CHUNKS: False
SYSTEM:
  CHUNK_SIZE: 1900
  USE_FILENAMES: False # give the AI context from the filenames provided to it. Useful if the filenames are meaningful, otherwise turn them off.
  DOUBLE_CHECK_COUNTER: 1 # How many times to check a question and answer pair during each validation step. Majority vote decides if it passes that step. There are three steps. So most questions are by default checked around 9 times (fewer if the first two checks for a step pass, obviously).
  SUBSET_SIZE: 10
  USE_SUBSET: False # Whether to take only the first 13 chunks from a text during the run. Useful for experimenting and iterating and seeing all the steps without costing too much money or time.
  CONCURRENCY_LIMIT: 50 # Hard limit of how many calls can be run at the same time, useful for API mode (aphrodite automatically manages this and queues things, as far as I know)
  COMPLETION_MODE: False # Change to false if you want to use chat (instruct) mode; this requires .json files in your chosen prompts directory, in the OpenAI API format. Not all APIs support completion mode.
  MODE: "api" # can be one of "api"|"aphrodite"
  STOP: True # True = Use stop tokens, False = do not use stop tokens. OpenAI's API restricts you to four stop tokens and all steps have way more than four stop tokens, so you'll need to turn this to False if you're using OAI's API. Also NOTE that if you turn this OFF while using COMPLETION MODE, EVERYTHING WILL BREAK and it will cost you money in the process. Don't do that.
  CONVERSATION_INSTRUCTIONS: For this conversation, you are generating a chat between a generalist, generic AI assistant, and a human.
  FINAL_ASSISTANT_PROMPT_NO_RAG: |
    You are a helpful AI assistant.
  FINAL_ASSISTANT_PROMPT_RAG: |
    You are a helpful AI assistant.

    Context information is below:

    {data}
PHASE:
  WORK_IN_PHASES: False
  PHASE_INDEX: 3 # index of the phase we are currently on (index 0 = filtering out chunks with no relevant context; index 1 = question generation; index 2 = question validation; index 3 = context revision and conversation generation, the final phase)
HUGGINGFACE:
  HUB_PATH: "Heralax/test-atk-dataset-do-not-use-3"
  PRIVATE: false
  PUSH_TO_HUB: false
```
I'm getting the error as follows:

```
LOADING: failed|./raw_text_input/medicine_wikipedia
100%|█████████████████████████████████████████| 85/85 [00:00<00:00, 5419.66it/s]
Converting generations to training data
entering saving mode
...Converted successfully (we think)
Traceback (most recent call last):
  File "/Users/kargupta8/Desktop/augmentoolkit/processing.py", line 505, in <module>
    asyncio.run(main())
  File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kargupta8/Desktop/augmentoolkit/processing.py", line 226, in main
    print(filtered_worthy_for_questions[0])
```
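The traceback ends at `print(filtered_worthy_for_questions[0])`, which is consistent with the empty-chunks explanation earlier in the thread: if nothing survives filtering, indexing `[0]` raises `IndexError`. A minimal sketch of a defensive rewrite of that line (the helper name is hypothetical; only `filtered_worthy_for_questions` comes from the actual traceback):

```python
def preview_first_chunk(filtered_worthy_for_questions):
    """Describe the first surviving chunk, or warn if the filter kept nothing."""
    if not filtered_worthy_for_questions:
        # An empty list here means every chunk failed the suitability judge,
        # e.g. because the model never produced the expected output format.
        return "WARNING: no chunks survived filtering"
    return str(filtered_worthy_for_questions[0])

print(preview_first_chunk([]))           # warns instead of raising IndexError
print(preview_first_chunk(["chunk A"]))  # previews the first surviving chunk
```

A guard like this does not fix the underlying problem (every chunk being rejected), but it would replace the opaque crash with a message that points at the judge step.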