In _init_distributed_env, import OneFlow's third-party device library (e.g., oneflow-npu or oneflow-xpu) based on the device actually in use. Because _DistributeUtil is constructed only once, the required third-party library is imported only once.
Slightly modified the initialization logic of BasePipeline: if the user provides the model_path parameter, BasePipeline loads the model from that path, which takes priority over model.cfg.pretrained_model_path in the config. If model_path is not provided, model.cfg.pretrained_model_path is used by default.
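The priority rule can be sketched as follows (a simplified stand-in for the real BasePipeline, with a SimpleNamespace config used purely for illustration):

```python
from types import SimpleNamespace

class BasePipeline:
    """Illustrative sketch of the model_path priority described above;
    the real libai BasePipeline does much more than this."""

    def __init__(self, cfg, model_path=None):
        # An explicitly passed model_path overrides the config value.
        if model_path is not None:
            cfg.model.cfg.pretrained_model_path = model_path
        # Otherwise the config's pretrained_model_path is used as-is.
        self.pretrained_model_path = cfg.model.cfg.pretrained_model_path

# Example: the user-supplied path wins over the config default.
cfg = SimpleNamespace(model=SimpleNamespace(cfg=SimpleNamespace(
    pretrained_model_path="/ckpt/default")))
pipeline = BasePipeline(cfg, model_path="/ckpt/user")
```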
If tokenization.tokenizer in the config does not set pretrained_model_path, it defaults to the tokenizer.model file under model.cfg.pretrained_model_path (this is usually correct, since that is the default storage location and file name used by Hugging Face).
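The fallback might look roughly like this (function name and the SimpleNamespace config are illustrative assumptions; the actual libai config objects differ):

```python
import os
from types import SimpleNamespace

def resolve_tokenizer_path(cfg):
    """Return the tokenizer path, falling back to the Hugging Face
    default <pretrained_model_path>/tokenizer.model when unset."""
    tok_path = getattr(cfg.tokenization.tokenizer,
                       "pretrained_model_path", None)
    if tok_path is None:
        # Default location and file name used by Hugging Face checkpoints.
        tok_path = os.path.join(cfg.model.cfg.pretrained_model_path,
                                "tokenizer.model")
    return tok_path

# Example config with no tokenizer path set:
cfg = SimpleNamespace(
    model=SimpleNamespace(cfg=SimpleNamespace(
        pretrained_model_path="/ckpt/default")),
    tokenization=SimpleNamespace(tokenizer=SimpleNamespace()),
)
```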