Open kartik122 opened 3 weeks ago
One possibility is that everything fails validation (perhaps because the model breaks the output format every time, which happens if the model is not very strong, or sometimes because of API errors and other issues). In that case the chunks list will be empty and nothing will be generated past that point.
Strangely, it looks like you're using the default inputs, so we can probably rule that out. Llama 3 is also definitely capable of running Augmentoolkit, which rules that out as well, unless you're running a very low quant or something is wrong with your server.
Could you share some of the intermediate outputs? You should be able to find them in outputs/judge_paragraph_generations. That folder contains a number of YAML files with the full prompts plus the AI output at the very end; they might contain a clue about what's going on.
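To make the failure mode above concrete, here is a minimal sketch (function and variable names are hypothetical, not Augmentoolkit's actual code) of how a judge step that rejects every chunk leaves an empty list that crashes any later positional access:

```python
def filter_worthy_chunks(chunks, judge):
    """Keep only the chunks the judge model deems suitable for questions."""
    return [c for c in chunks if judge(c)]

chunks = ["chapter text...", "table of contents...", "legal notice..."]

# Stand-in for a model that fails every chunk (broken format, API errors, etc.)
reject_all = lambda chunk: False

filtered = filter_worthy_chunks(chunks, reject_all)

# Guarding before indexing turns a bare IndexError into a diagnosable message:
if filtered:
    print(filtered[0])
else:
    print("No chunks passed validation; check the judge outputs.")
```

With `reject_all` in place, `filtered` is `[]`, and an unguarded `filtered[0]` anywhere downstream would raise `IndexError: list index out of range`.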
Hi @e-p-armstrong, this is one of the output files from the judge_paragraph_generations folder:
```yaml
content: 'Text:
  """
  "Did you hear about the mayor''s decision?" asked John.
  "It''s quite surprising, isn''t it?" replied Emily, her voice tinged with disbelief.
  "I know, right? But what can we do?" John sighed.
  Sarah continued her shopping, her mind now on the mayor''s mysterious decision.
  """
  Note that even blunt facts can be suitable for questions, and unconventional knowledge is not necessarily unsuitable. Fictional stories that contain strong morals or philosophy can also have good questions made from them. But legal notices, metadata, and tables of contents are not suitable. Lists of information without the context needed for the question-maker to understand the text; quotes or dialogues without context or clear depth; or ambiguous content that isn''t precise enough to "nail down" a solid question from, are not valid.'
role: user
content: 'Text:
  """
  In the world of science, there are countless mysteries and phenomena that elude easy explanation. For instance, certain forces and energies interact in ways that are not fully understood, shaping the universe in subtle and profound manners. These interactions often occur at levels beyond human perception, leaving much to speculation and theory. Various scientific disciplines attempt to explain these interactions, each offering unique perspectives but often lacking definitive answers. The vastness of these mysteries spans from the minuscule quantum realm to the expansive cosmos, hinting at complexities that challenge our current understanding.
  """
  Note that even blunt facts can be suitable for questions, and unconventional knowledge is not necessarily unsuitable. Fictional stories that contain strong morals or philosophy can also have good questions made from them. But legal notices, metadata, and tables of contents are not suitable. Lists of information without the context needed for the question-maker to understand the text; quotes or dialogues without context or clear depth; or ambiguous content that isn''t precise enough to "nail down" a solid question from, are not valid.'
role: user
content: 'Text:
  """
  The Brussels Conference on the subject 95
  Illustrations of barbarous reprisals 97
  Instances of non-retaliation 98
  Savage reprisals in days of chivalry 100
  Hanging the commonest reprisals for a brave defence 101
  As illustrated by the warfare of the fifteenth century 102
  Survival of the custom to our own times 104
  The massacre of a conquered garrison still a law of war 105
  The shelling of Strasburg by the Germans 106
  Brutal warfare of Alexander the Great 107
  The connection between bravery and cruelty 110
  The abolition of slavery in its effects on war 112
  The storming of Magdeburg, Brescia, and Rome 112
  Cicero on Roman warfare 114
  The reprisals of the Germans in France in 1870 115
  Their revival of the custom of taking hostages 117
  Their resort to robbery as a plea of reprisals 118
  General Von Moltke on perpetual peace 119
  The moral responsibility of the military profession 121
  """
  Note that even blunt facts can be suitable for questions, and unconventional knowledge is not necessarily unsuitable. Fictional stories that contain strong morals or philosophy can also have good questions made from them. But legal notices, metadata, and tables of contents are not suitable. Lists of information without the context needed for the question-maker to understand the text; quotes or dialogues without context or clear depth; or ambiguous content that isn''t precise enough to "nail down" a solid question from, are not valid.'
role: user
```
I have also observed an "index out of bounds" error:

```
Output written to ../outFiles/judge_paragraph_generations/99e9cbce-823a-401e-b09b-ef922e806e98.yaml
DEBUG model decided that index 485 was not suitable
Converting generations to training data
entering saving mode
...Converted successfully (we think)
Traceback (most recent call last):
  File "/tmp/augmentoolkit/processing.py", line 504, in <module>
```
Trying to run Augmentoolkit on macOS (M3) with ollama (`ollama run llama3`), using the following config.yaml:

```yaml
PATH:
  INPUT: "./raw_text_input"
  OUTPUT: "./output"
  DEFAULT_PROMPTS: "./prompts" # the baseline prompt folder that Augmentoolkit falls back to if it can't find a step in the PROMPTS path
  PROMPTS: "./prompts" # Where Augmentoolkit first looks for prompts
API:
  API_KEY: "53212512"
  BASE_URL: http://127.0.0.1:11434/
  LARGE_LOGICAL_MODEL: llama3
  LOGICAL_MODEL: llama3 # model used for question generation and conversation generation at the very end. A pretty tough task, if ASSISTANT_MODE isn't on.
  QUANTIZATION_SMALL: "gptq" # Only use if Aphrodite mode is on.
  QUANTIZATION_LARGE: "gptq" # Only use if Aphrodite mode is on.
SKIP:
  QUESTION_CHECK: False
  ANSWER_RELEVANCY_CHECK: False # turn on if using the negative question prompt override
  FILTER_CHUNKS: False
SYSTEM:
  CHUNK_SIZE: 1900
  USE_FILENAMES: False # give the AI context from the filenames provided to it. Useful if the filenames are meaningful, otherwise turn them off.
  DOUBLE_CHECK_COUNTER: 1 # How many times to check a question and answer pair during each validation step. Majority vote decides if it passes that step. There are three steps. So most questions are by default checked around 9 times (fewer if the first two checks for a step pass, obviously).
  SUBSET_SIZE: 10
  USE_SUBSET: False # Whether to take only the first 13 chunks from a text during the run. Useful for experimenting and iterating and seeing all the steps without costing too much money or time.
  CONCURRENCY_LIMIT: 50 # Hard limit of how many calls can be run at the same time, useful for API mode (aphrodite automatically manages this and queues things, as far as I know)
  COMPLETION_MODE: False # Change to false if you want to use chat (instruct) mode; this requires .json files in your chosen prompts directory, in the OpenAI API format. Not all APIs support completion mode.
  MODE: "api" # can be one of "api"|"aphrodite"
  STOP: True # True = Use stop tokens, False = do not use stop tokens. OpenAI's API restricts you to four stop tokens and all steps have way more than four stop tokens, so you'll need to turn this to False if you're using OAI's API. Also NOTE that if you turn this OFF while using COMPLETION MODE, EVERYTHING WILL BREAK and it will cost you money in the process. Don't do that.
  CONVERSATION_INSTRUCTIONS: For this conversation, you are generating a chat between a generalist, generic AI assistant, and a human.
  FINAL_ASSISTANT_PROMPT_NO_RAG: |
    You are a helpful AI assistant.
  FINAL_ASSISTANT_PROMPT_RAG: |
    You are a helpful AI assistant.

    Context information is below:

    {data}
PHASE:
  WORK_IN_PHASES: False
  PHASE_INDEX: 3 # index of the phase we are currently on (index 0 = filtering out chunks with no relevant context; index 1 = question generation; index 2 = question validation; index 3 = context revision and conversation generation, the final phase)
HUGGINGFACE:
  HUB_PATH: "Heralax/test-atk-dataset-do-not-use-3"
  PRIVATE: false
  PUSH_TO_HUB: false
```
I'm getting the error as follows:

```
LOADING: failed|./raw_text_input/medicine_wikipedia
100%|█████████████████████████████████████████| 85/85 [00:00<00:00, 5419.66it/s]
Converting generations to training data
entering saving mode
...Converted successfully (we think)
Traceback (most recent call last):
  File "/Users/kargupta8/Desktop/augmentoolkit/processing.py", line 505, in <module>
    asyncio.run(main())
  File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kargupta8/miniconda3/envs/augment-toolkit/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kargupta8/Desktop/augmentoolkit/processing.py", line 226, in main
    print(filtered_worthy_for_questions[0])
```
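The traceback ends at `print(filtered_worthy_for_questions[0])`, which is consistent with the empty-chunks explanation earlier in the thread: if nothing survives filtering, indexing `[0]` raises `IndexError`. A minimal sketch of a defensive rewrite of that line (the helper name is hypothetical; only `filtered_worthy_for_questions` comes from the actual traceback):

```python
def preview_first_chunk(filtered_worthy_for_questions):
    """Describe the first surviving chunk, or warn if the filter kept nothing."""
    if not filtered_worthy_for_questions:
        # An empty list here means every chunk failed the suitability judge,
        # e.g. because the model never produced the expected output format.
        return "WARNING: no chunks survived filtering"
    return str(filtered_worthy_for_questions[0])

print(preview_first_chunk([]))           # warns instead of raising IndexError
print(preview_first_chunk(["chunk A"]))  # previews the first surviving chunk
```

A guard like this does not fix the underlying problem (every chunk being rejected), but it would replace the opaque crash with a message that points at the judge step.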