Oneflow-Inc / libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
https://libai.readthedocs.io
Apache License 2.0

feat: support third-party oneflow device extension #549

Closed · 0x404 closed this 1 week ago

0x404 commented 1 week ago

Changes made in this PR:

  1. In `_init_distributed_env`, import OneFlow's third-party device extension (i.e., oneflow-npu or oneflow-xpu) based on the device actually in use. Since `_DistributeUtil` is instantiated only once, the required extension is imported only once (see the first sketch after this list).
  2. Slightly adjust the initialization logic of `BasePipeline`: if the user passes a `model_path` argument, `BasePipeline` loads the model from that path, which takes priority over `model.cfg.pretrained_model_path` in the config. If `model_path` is not provided, `model.cfg.pretrained_model_path` is used by default (see the second sketch after this list).
  3. If `tokenization.tokenizer` in the config has no `pretrained_model_path` set, default to the `tokenizer.model` file under `model.cfg.pretrained_model_path`. This is usually correct, since that is the default location and file name used by Hugging Face (see the third sketch after this list).
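
A minimal sketch of the idea in item 1: lazily import the extension module that matches the configured device, once, during distributed-environment setup. The helper name and the importable module names (`oneflow_npu`, `oneflow_xpu`) are assumptions for illustration, not the exact code in the PR.

```python
import importlib

# Assumed mapping from device type to the importable extension module.
_DEVICE_EXTENSIONS = {
    "npu": "oneflow_npu",  # assumed module name for the oneflow-npu extension
    "xpu": "oneflow_xpu",  # assumed module name for the oneflow-xpu extension
}


def maybe_import_device_extension(device_type: str) -> None:
    """Import the third-party OneFlow extension matching the configured device."""
    module_name = _DEVICE_EXTENSIONS.get(device_type)
    if module_name is None:
        return  # built-in devices (e.g. cpu/cuda) need no extra import
    try:
        importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"device '{device_type}' requires the '{module_name}' package to be installed"
        ) from e
```

Because `_DistributeUtil` is constructed only once, calling such a helper from `_init_distributed_env` means the import runs at most once per process.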
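A sketch of the priority rule in item 2, assuming an attribute-style config object as used in LiBai configs; the helper name and the write-back behavior are illustrative assumptions.

```python
def resolve_pretrained_path(model_path, cfg):
    """Return the checkpoint path the pipeline should load (sketch only)."""
    if model_path is not None:
        # An explicit model_path argument wins over the config value; writing it
        # back keeps downstream components consistent (assumed behavior).
        cfg.model.cfg.pretrained_model_path = model_path
        return model_path
    # Otherwise fall back to the path already stored in the config.
    return cfg.model.cfg.pretrained_model_path
```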
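A sketch of the tokenizer fallback in item 3, assuming the same config layout; `resolve_tokenizer_path` is a hypothetical name used only to illustrate the default.

```python
import os


def resolve_tokenizer_path(cfg):
    """Pick the tokenizer path, defaulting to <model dir>/tokenizer.model (sketch)."""
    tokenizer_cfg = cfg.tokenization.tokenizer
    if getattr(tokenizer_cfg, "pretrained_model_path", None):
        return tokenizer_cfg.pretrained_model_path
    # Hugging Face checkpoints typically store the sentencepiece tokenizer as
    # "tokenizer.model" alongside the model weights, so that is the default.
    return os.path.join(cfg.model.cfg.pretrained_model_path, "tokenizer.model")
```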