e-p-armstrong / augmentoolkit

Convert Compute And Books Into Instruct-Tuning Datasets
MIT License
560 stars 77 forks source link

Empty dataset generated #24

Open worstkid92 opened 1 week ago

worstkid92 commented 1 week ago

Env: augmenttoolkit main branch Model:llama3:8b/Mistral Ollama

Config.yaml:

PATH:
  INPUT: "./os_txt"
  OUTPUT: "./os_txt"
  DEFAULT_PROMPTS: "./prompts" # the baseline prompt folder that Augmentoolkit falls back to if it can't find a step in the PROMPTS path
  PROMPTS: "./prompts" # Where Augmentoolkit first looks for prompts
API:
  API_KEY: "ollama" # Add the API key for your favorite provider here
  BASE_URL: "http://127.0.0.1:11434/v1/" # add the base url for a provider, or local server, here. Some possible values:  http://127.0.0.1:5000/v1/ # <- local models. # https://api.together.xyz # <- together.ai, which is real cheap, real flexible, and real high-quality, if a tad unreliable. # https://api.openai.com/v1/ # <- OpenAI. Will bankrupt you very fast. # anything else that accepts OAI-style requests, so basically any API out there (openrouter, fireworks, etc etc etc...)
  LOGICAL_MODEL: "llama3:8b" # model used for everything except conversation generation at the very end
  LARGE_LOGICAL_MODEL: "llama3:8b" # model used for conversation generation at the very end. A pretty tough task, if ASSISTANT_MODE isn't on.

Only the example file Simple Sabotage, by the Office of Strategic Services, published 1944.txt is under the dir os_txt Folder not empty in both check_answer_relevancy_generations and question_generation_generations

Traceback tail:

2024-06-26 14:04:35,860 - INFO - HTTP Request: POST http://127.0.0.1:11434/v1/chat/completions "HTTP/1.1 200 OK"
Output written to ./os_txt/multiturn_conversation_generations/4e23d61d-5b21-4a3f-b828-187dc3f34589.txt
Conversation is too short! Validation failed!
[]
 73%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                      | 11/15 [00:31<00:10,  2.69s/it]2024-06-26 14:04:36,421 - INFO - HTTP Request: POST http://127.0.0.1:11434/v1/chat/completions "HTTP/1.1 200 OK"
Output written to ./os_txt/multiturn_conversation_generations/1d404ba3-9428-4029-8f44-a8b2c728b0eb.txt
Conversation is too short! Validation failed!
[]
 80%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                        | 12/15 [00:32<00:06,  2.15s/it]2024-06-26 14:04:37,371 - INFO - HTTP Request: POST http://127.0.0.1:11434/v1/chat/completions "HTTP/1.1 200 OK"
Output written to ./os_txt/multiturn_conversation_generations/33aefbf9-d656-4999-9e26-5e7a00abf18a.txt
Conversation is too short! Validation failed!
[]
 87%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                           | 13/15 [00:33<00:03,  1.73s/it]2024-06-26 14:04:38,165 - INFO - HTTP Request: POST http://127.0.0.1:11434/v1/chat/completions "HTTP/1.1 200 OK"
Output written to ./os_txt/multiturn_conversation_generations/710843bc-1638-4bee-94af-256649dc1343.txt
Conversation is too short! Validation failed!
[]
 93%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍             | 14/15 [00:33<00:01,  1.45s/it]2024-06-26 14:04:39,011 - INFO - HTTP Request: POST http://127.0.0.1:11434/v1/chat/completions "HTTP/1.1 200 OK"
Output written to ./os_txt/multiturn_conversation_generations/2eb7d60a-9599-4896-940c-d02bd5cdacf0.txt
Conversation is too short! Validation failed!
[]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:39<00:00,  2.64s/it`

Can anyone help?

worstkid92 commented 1 week ago

image also a lot of generate question failure

e-p-armstrong commented 6 days ago

It's interesting that you've mentioned the example file is Simple Sabotage -- that example file isn't the one used in the most recent versions of the repo. Are you sure you're running the most recent version of Augmentoolkit? The example inputs there are a part of a wikipedia article and the etherium whitepaper.

worstkid92 commented 5 days ago

It's interesting that you've mentioned the example file is Simple Sabotage -- that example file isn't the one used in the most recent versions of the repo. Are you sure you're running the most recent version of Augmentoolkit? The example inputs there are a part of a wikipedia article and the etherium whitepaper.

It's interesting that you've mentioned the example file is Simple Sabotage -- that example file isn't the one used in the most recent versions of the repo. Are you sure you're running the most recent version of Augmentoolkit? The example inputs there are a part of a wikipedia article and the etherium whitepaper.

Thanks.I tried to skip the two checks and generate questions and output. Thanks for your reply.