IlyaGusev / rulm

Language modeling and instruction tuning for Russian
Apache License 2.0
442 stars, 51 forks

How to use generate_instructions.py script? #43

Open beybars1 opened 2 months ago

beybars1 commented 2 months ago

Hey Ilya, I really appreciate your and your team's open-source contributions.

I have a question about the generate_instructions.py file. How do I run it, and which arguments does it take? Is there an example of generating an instruction dataset with this script?

IlyaGusev commented 2 months ago

Hey, thanks :)

The list of parameters is here

Run it from rulm/self_instruct:

python3 -m src.data_processing.generate_instructions \
  --output-path output.jsonl \
  --seed-tasks-path data/ru_alpaca_seed_tasks.jsonl \
  --settings-path external_prompts/ru_gen_settings.json \
  --template-path external_prompts/ru_instruct.txt
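
The same invocation can be wrapped in a small Python helper that first checks that the three input files exist. This is only a sketch: the paths and flags are copied from the command above, the function names are hypothetical, and it uses the current interpreter (`sys.executable`) in place of `python3`.

```python
import subprocess
import sys
from pathlib import Path

# Input flags and paths copied from the command above,
# relative to the rulm/self_instruct directory.
INPUTS = {
    "--seed-tasks-path": "data/ru_alpaca_seed_tasks.jsonl",
    "--settings-path": "external_prompts/ru_gen_settings.json",
    "--template-path": "external_prompts/ru_instruct.txt",
}


def missing_inputs(workdir):
    """Return the input paths that do not exist under workdir."""
    root = Path(workdir)
    return [rel for rel in INPUTS.values() if not (root / rel).is_file()]


def build_command(output_path="output.jsonl"):
    """Build the invocation shown above as an argv list."""
    cmd = [
        sys.executable, "-m", "src.data_processing.generate_instructions",
        "--output-path", output_path,
    ]
    for flag, rel in INPUTS.items():
        cmd += [flag, rel]
    return cmd


def run(workdir, output_path="output.jsonl"):
    """Hypothetical helper: check the inputs, then run the script from workdir."""
    missing = missing_inputs(workdir)
    if missing:
        raise FileNotFoundError(f"missing under {workdir}: {missing}")
    # cwd must be rulm/self_instruct so that `-m src....` resolves.
    return subprocess.run(build_command(output_path), cwd=workdir, check=True)
```

Calling `run("rulm/self_instruct")` from the directory that contains the clone should then behave like the shell command, assuming the repository layout has not changed.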

The script is ancient, though, so many things can go wrong. The most obvious one is that you will need an old version of the OpenAI wrapper.
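
Before running the script, it may help to check which version of the wrapper is installed. The sketch below assumes "the OpenAI wrapper" means the `openai` package on PyPI, and that "old" means a release before 1.0, where the client interface was reworked. Neither assumption is confirmed in this thread, so check the repository's requirements for the exact version.

```python
from importlib import metadata


def installed_version(package="openai"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading integer of a version string such as '0.27.8'."""
    return int(version.split(".")[0])


def is_pre_1_0(version):
    """Assumption: 'old' means a release before the 1.0 interface change."""
    return major_version(version) < 1
```

For example, `installed_version()` returns `None` when the package is not installed, and `is_pre_1_0(installed_version())` can be used as a quick check once it is.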