ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

How to use Huggingface transformers.GPT2LMHeadModel.from_pretrained(model_name) on rugpt3xl? #82

Closed · lauberto closed this issue 2 years ago

lauberto commented 2 years ago

As I understand it, rugpt3xl was trained with DeepSpeed. As a consequence, unlike the other models, it lacks the config.json file that transformers.GPT2LMHeadModel.from_pretrained(model_name) needs. It does, however, ship with a deepspeed_config.json file.

How does the non-trainer deepspeed integration work when I have no access to a normal config file?

```python
import deepspeed
from transformers import GPT2LMHeadModel
from transformers.deepspeed import HfDeepSpeedConfig

ds_config = { ... }  # deepspeed config object or path to the file

# must run before instantiating the model (so ZeRO-3 is detected)
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive

model = GPT2LMHeadModel.from_pretrained("sberbank-ai/rugpt3xl")  # this throws a can't load config error
engine = deepspeed.initialize(model=model, config_params=ds_config)
```
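
For context: a deepspeed_config.json only parametrizes the DeepSpeed engine (batch size, ZeRO stage, fp16 settings and the like), so it cannot stand in for the config.json that from_pretrained() reads the model architecture from. A minimal illustration of the difference, assuming a local copy of the file (the path is hypothetical):

```python
import json

# Hypothetical local path; rugpt3xl ships this file instead of config.json.
with open("deepspeed_config.json") as f:
    ds_config = json.load(f)

# Typical top-level keys: "train_batch_size", "fp16", "zero_optimization".
# None of them describe the model architecture (n_layer, n_head, n_embd)
# that GPT2LMHeadModel.from_pretrained() expects to find in config.json.
print(sorted(ds_config))
```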
king-menin commented 2 years ago

We do not support the Huggingface GPT2LMHeadModel interface for rugpt3xl; please use our code:

```python
from src.xl_wrapper import RuGPT3XL

gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)
```
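
For completeness, here is a sketch of how the wrapper is typically used for generation, following the examples in the ru-gpts repository; the exact generate() keyword arguments are an assumption carried over from the Hugging Face generation API that the wrapper mirrors:

```python
from src.xl_wrapper import RuGPT3XL

# Requires a checkout of the ru-gpts repository (src/ on PYTHONPATH) and its
# DeepSpeed dependencies; the weights are downloaded from the Hugging Face hub.
gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)

# Assumption: the wrapper exposes an HF-style generate() over raw text,
# as in the repo's generation examples.
results = gpt.generate(
    "Кто был президентом США в 2020? ",  # "Who was the US president in 2020?"
    max_length=50,
    no_repeat_ngram_size=3,
    repetition_penalty=2.0,
)
print(results)
```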