As I understand it, rugpt3xl was trained with DeepSpeed. As a consequence, unlike other models, it lacks a config.json file, which transformers.GPT2LMHeadModel.from_pretrained(model_name) needs in order to load the model. However, rugpt3xl does ship with a deepspeed_config.json file.
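For context, config.json is the architecture description that from_pretrained reads before loading weights. A minimal sketch of what such a file contains for a GPT-2-family model is below; the field names follow the standard GPT2Config, but the values are illustrative placeholders, not rugpt3xl's actual hyperparameters:

```python
import json

# Illustrative only: a minimal GPT-2-style config.json of the kind
# from_pretrained expects next to the weights. The values below are
# placeholders, NOT rugpt3xl's real sizes.
minimal_config = {
    "model_type": "gpt2",
    "vocab_size": 50257,
    "n_positions": 1024,
    "n_embd": 768,
    "n_layer": 12,
    "n_head": 12,
    "activation_function": "gelu_new",
}

config_json = json.dumps(minimal_config, indent=2)
print(config_json)
```

Without a file like this next to the checkpoint, from_pretrained has no way to build the model object, which is exactly the error below.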
from transformers.deepspeed import HfDeepSpeedConfig
from transformers import GPT2LMHeadModel, deepspeed

ds_config = { ... }  # DeepSpeed config object or path to the file
# must run before instantiating the model
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
model = GPT2LMHeadModel.from_pretrained("sberbank-ai/rugpt3xl")  # this throws a "can't load config" error
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
We do not support the Hugging Face GPT2LMHeadModel interface for rugpt3xl; please use our code instead:
from src.xl_wrapper import RuGPT3XL
gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)
How does the non-Trainer DeepSpeed integration work when I have no access to a normal config file?
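For what it's worth, the ds_config that HfDeepSpeedConfig consumes is a DeepSpeed runtime config, not a model-architecture config, so it cannot stand in for config.json. A minimal sketch of such a config is below; the key names (train_batch_size, fp16, zero_optimization) are standard DeepSpeed config fields, but the values are illustrative and not taken from rugpt3xl's shipped deepspeed_config.json:

```python
import json

# Illustrative DeepSpeed runtime config (NOT the model's config.json).
# It controls execution details (ZeRO stage, precision, batch size),
# not the architecture, which is why it cannot replace config.json.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
    },
}

print(json.dumps(ds_config, indent=2))
```

This is why the two files are not interchangeable: HfDeepSpeedConfig only tells DeepSpeed how to run a model that transformers must first be able to construct from a config.json.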