ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

num_samples error? #15

Closed fen0s closed 3 years ago

fen0s commented 3 years ago

I'm trying to finetune GPT-2 Large and I get this error:

ValueError: num_samples should be a positive integer value, but got num_samples=0

What does this mean? Googling suggests the dataset is missing, but I've checked twice and the path is definitely correct. I tried it with both absolute and relative paths.

king-menin commented 3 years ago

Which script are you using? In transformers you can get this error if block_size > len(tokens_in_data).
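To see why that condition produces exactly this error, here is a minimal sketch of the block-slicing loop used by transformers' `TextDataset` (the token list is fake; with a real tokenizer the counts differ, but the logic is the same). When the file tokenizes to fewer than `block_size` tokens, the loop yields zero examples, and the DataLoader's sampler then raises `num_samples=0`:

```python
def build_examples(token_ids, block_size):
    # Mirrors the slicing loop in transformers' TextDataset:
    # step through the token stream in strides of block_size,
    # keeping only full blocks.
    examples = []
    for i in range(0, len(token_ids) - block_size + 1, block_size):
        examples.append(token_ids[i : i + block_size])
    return examples

tokens = list(range(500))  # pretend the whole training file tokenized to 500 ids
print(len(build_examples(tokens, block_size=128)))   # 3 full blocks
print(len(build_examples(tokens, block_size=1024)))  # 0 -> "num_samples=0" error
```

So a dataset that is present but too short for the chosen `block_size` fails with the same message as a missing dataset.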

king-menin commented 3 years ago

If you use our script pretrain_transformers.py, try removing this line: https://github.com/sberbank-ai/ru-gpts/blob/8e7c92b44610756cbb99acaf86a9e8344b9a44c0/pretrain_transformers.py#L56 and check block_size below it. Also check that tokenizer.max_len == 1024, and see this line, which you can also try removing: https://github.com/sberbank-ai/ru-gpts/blob/8e7c92b44610756cbb99acaf86a9e8344b9a44c0/pretrain_transformers.py#L698
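Before editing the script, it can help to run a quick pre-flight check on the training file. This is a hedged sketch: the whitespace split is a stand-in for the real tokenizer's `encode` (a GPT BPE tokenizer will give a different count), but the comparison against `block_size` is the same:

```python
def preflight_check(text, block_size=1024):
    # Stand-in for len(tokenizer.encode(text)); a real BPE tokenizer
    # usually yields more tokens than a whitespace split.
    n_tokens = len(text.split())
    ok = n_tokens >= block_size
    if not ok:
        print(f"only {n_tokens} tokens < block_size={block_size}: "
              f"TextDataset will produce 0 samples; "
              f"lower --block_size or add more training data")
    return ok

print(preflight_check("word " * 50))    # False: far too small for block_size=1024
print(preflight_check("word " * 2000))  # True: enough tokens for at least one block
```

If the check fails, either reduce the block size passed to the script or grow the dataset, rather than only removing the lines above.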

fen0s commented 3 years ago

That resolved the num_samples error. Colab doesn't have enough memory to finetune it, though... Thanks for the help anyway!