ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

Dataset collaboration #19

Closed: mgrankin closed this issue 3 years ago

mgrankin commented 3 years ago

Hello,

Thank you for the high-quality pretrained model; it's super easy to deploy and use.

As you may know, there is an ongoing community-driven effort to replicate GPT-3 with 175B parameters. Part of the project is building the dataset. Version 1 of the dataset focuses on English and is almost ready. The next goal is a fully multilingual, 10 TiB text dataset.

https://github.com/EleutherAI/The-Pile

Would you mind sharing your dataset so it can be part of the project?

king-menin commented 3 years ago

Hello! We apologize for the late reply. Our team discussed the possibilities and restrictions of data rights and license agreements. We are not ready to publish our data yet, but it would be great to talk directly and discuss ideas and datasets, for instance in a Zoom meeting. Please write to login-const@mail.ru to clarify the details. Thanks! Our team will be happy to collaborate with you ☺️

mgrankin commented 3 years ago

Hello and thank you for your reply. Looking forward to collaborating with your team!