As I understand it, rugpt3xl was trained with DeepSpeed. As a consequence, unlike other models, it lacks a config.json file, which transformers.GPT2LMHeadModel.from_pretrained(model_name) needs in order to load the model. However, rugpt3xl does ship with a deepspeed_config.json file.
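For context, config.json is the architecture description that from_pretrained reads before loading weights. A minimal sketch of what such a file contains for a GPT-2-family model is below; the field names follow the standard GPT2Config, but the values are illustrative placeholders, not rugpt3xl's actual hyperparameters:

```python
import json

# Illustrative only: a minimal GPT-2-style config.json of the kind
# from_pretrained expects next to the weights. The values below are
# placeholders, NOT rugpt3xl's real sizes.
minimal_config = {
    "model_type": "gpt2",
    "vocab_size": 50257,
    "n_positions": 1024,
    "n_embd": 768,
    "n_layer": 12,
    "n_head": 12,
    "activation_function": "gelu_new",
}

config_json = json.dumps(minimal_config, indent=2)
print(config_json)
```

Without a file like this next to the checkpoint, from_pretrained has no way to build the model object, which is exactly the error below.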
from transformers.deepspeed import HfDeepSpeedConfig
from transformers import GPT2LMHeadModel, deepspeed

ds_config = { ... }  # DeepSpeed config object or path to the file
# must run before instantiating the model
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
model = GPT2LMHeadModel.from_pretrained("sberbank-ai/rugpt3xl")  # this throws a "can't load config" error
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
We do not support the Hugging Face GPT2LMHeadModel interface for rugpt3xl; please use our code instead:
from src.xl_wrapper import RuGPT3XL
gpt = RuGPT3XL.from_pretrained("sberbank-ai/rugpt3xl", seq_len=512)
How does the non-Trainer DeepSpeed integration work when I have no access to a normal config file?
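For what it's worth, the ds_config that HfDeepSpeedConfig consumes is a DeepSpeed runtime config, not a model-architecture config, so it cannot stand in for config.json. A minimal sketch of such a config is below; the key names (train_batch_size, fp16, zero_optimization) are standard DeepSpeed config fields, but the values are illustrative and not taken from rugpt3xl's shipped deepspeed_config.json:

```python
import json

# Illustrative DeepSpeed runtime config (NOT the model's config.json).
# It controls execution details (ZeRO stage, precision, batch size),
# not the architecture, which is why it cannot replace config.json.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
    },
}

print(json.dumps(ds_config, indent=2))
```

This is why the two files are not interchangeable: HfDeepSpeedConfig only tells DeepSpeed how to run a model that transformers must first be able to construct from a config.json.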