ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0
2.08k stars, 445 forks

How to create .list file? #90

dinosaurtirex closed this issue 2 years ago

dinosaurtirex commented 2 years ago

Hello! Can you please explain how I should create a .list file? I don't understand that part.

TatianaShavrina commented 2 years ago

Hey @sneakybeaky18,

The `.list` file is simply a text file containing the paths to your data files, one per line. The data should be split into equal shards so that it can be distributed across GPUs during training. For example:

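```python
import math

num_gpus = 8  # placeholder: set this to the number of GPUs you train on

# Read the corpus, one sample per line (splitlines() drops the trailing empty line)
with open('/home/jovyan/data/all_data.txt', 'r') as f:
    data = f.read().splitlines()

# Lines per shard, rounded up so there are at most num_gpus shards
shard_size = math.ceil(len(data) / num_gpus)

with open("/home/jovyan/data/quests/final/train.list", "w") as file:
    for idx in range(num_gpus):
        shard = data[idx * shard_size:(idx + 1) * shard_size]
        if not shard:
            break  # fewer lines than GPUs: don't write empty shards
        shard_path = f"/home/jovyan/data/train{idx}.txt"
        with open(shard_path, "w") as file_t:
            for line in shard:
                file_t.write(f"{line}\n")
        # Each line of the .list file is the path to one shard
        file.write(f"{shard_path}\n")
```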
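
If you prefer a reusable version, the same sharding can be wrapped in a function and paired with a reader, so you can check the `.list` file before launching training. This is a minimal sketch, not code from the ru-gpts repository: the helper names `write_shards_and_list` and `read_list` and the toy data are illustrative assumptions. It keeps the `train{idx}.txt` naming and the one-path-per-line format, and spreads any remainder so that shard sizes differ by at most one line.

```python
import os
import tempfile


def write_shards_and_list(lines, num_shards, out_dir, list_path):
    """Hypothetical helper: split `lines` into up to `num_shards` contiguous
    shards whose sizes differ by at most one line, write each shard to
    out_dir/train{idx}.txt, and write the shard paths to `list_path`,
    one path per line. Returns the shard paths in order."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    base, extra = divmod(len(lines), num_shards)
    paths = []
    start = 0
    with open(list_path, "w") as list_file:
        for idx in range(num_shards):
            # The first `extra` shards each take one leftover line
            size = base + (1 if idx < extra else 0)
            if size == 0:
                break  # fewer lines than shards: skip empty shard files
            shard = lines[start:start + size]
            start += size
            path = os.path.join(out_dir, f"train{idx}.txt")
            with open(path, "w") as shard_file:
                shard_file.writelines(f"{line}\n" for line in shard)
            list_file.write(f"{path}\n")
            paths.append(path)
    return paths


def read_list(list_path):
    """Hypothetical helper: return the shard paths stored in a .list file."""
    with open(list_path) as f:
        return [p for p in f.read().splitlines() if p]


if __name__ == "__main__":
    # Toy data in a temporary directory, just to show the result
    with tempfile.TemporaryDirectory() as tmp:
        lines = [f"sample {i}" for i in range(10)]
        list_path = os.path.join(tmp, "train.list")
        write_shards_and_list(lines, num_shards=4, out_dir=tmp, list_path=list_path)
        for path in read_list(list_path):
            with open(path) as f:
                print(os.path.basename(path), len(f.read().splitlines()))
```

With 10 toy lines and 4 shards, the demo prints `train0.txt 3`, `train1.txt 3`, `train2.txt 2` and `train3.txt 2`, so each GPU gets a similar amount of data.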

After that, you can use the .list file as shown in the example [Colab](https://colab.research.google.com/github/ai-forever/ru-gpts/blob/master/examples/ruGPT3XL_finetune_example.ipynb).

dinosaurtirex commented 2 years ago

Thanks to your team for the great open-source project, and thanks for the answer!