InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

Error when running the internlm2_20b_qlora_msagent_react_e3_gpu8 script #466

Closed · sxk000 closed this 6 months ago

sxk000 commented 6 months ago

First of all, thanks to the Shanghai AI Laboratory and its members for sharing the InternLM models, the code framework, and their technical experience!

My environment was installed following the steps here: https://github.com/InternLM/xtuner/issues/447#issue-2170980022

Command: NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3

Model and data used:

# Model
pretrained_model_name_or_path = '/apply/model/original/internlm2-20b'
use_varlen_attn = False

# Data
data_path = 'damo/MSAgent-Bench'

Full error log: 311pt.log

Excerpt from the error log:


03/12 13:57:29 - mmengine - WARNING - Dataset Dataset has no metainfo. dataset_meta in visualizer will be None.
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
low_cpu_mem_usage was None, now set to True since model is quantized.
(the two messages above repeat once per rank, 8 times in total, interleaved)
[2024-03-12 13:57:34,036] [INFO] [partition_parameters.py:343:__exit__] finished initializing model - num_params = 339, num_elems = 19.86B
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/tools/train.py", line 307, in <module>
    main()
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/tools/train.py", line 303, in main
    runner.train()
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1182, in train
    self.strategy.prepare(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/_strategy/deepspeed.py", line 381, in prepare
    model = self.build_model(model)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/_strategy/base.py", line 306, in build_model
    model = MODELS.build(model)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 232, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/model/sft.py", line 27, in __init__
    self.llm = self._build_from_cfg_or_module(llm)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/model/sft.py", line 91, in _build_from_cfg_or_module
    return BUILDER.build(cfg_or_mod)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 556, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3926, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 345, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([92544, 6144]) in "weight" (which has shape torch.Size([0])), this look incorrect.

How should I solve this problem?

Thanks!

sxk000 commented 6 months ago

Failing command: NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3

Switching zero3 to zero2 makes it run normally. The working command is:

NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero2

LZHgrla commented 6 months ago

@sxk000 Yes! QLoRA is not yet compatible with ZeRO-3; consider switching QLoRA -> LoRA, or zero3 -> zero2.
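
For reference, a minimal sketch of the QLoRA -> LoRA change, assuming the stock config quoted later in this thread: keep the lora dict but drop quantization_config, so the base weights load in fp16 instead of 4-bit.

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16),  # no quantization_config -> plain LoRA
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))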

sxk000 commented 6 months ago

internlm2_20b_qlora_msagent_react_e3_gpu8 above is the QLoRA training setup. For full-parameter training:

1. Modify the script as follows:

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

# model = dict(
#     type=SupervisedFinetune,
#     llm=dict(
#         type=AutoModelForCausalLM.from_pretrained,
#         pretrained_model_name_or_path=pretrained_model_name_or_path,
#         trust_remote_code=True,
#         torch_dtype=torch.float16,
#         quantization_config=dict(
#             type=BitsAndBytesConfig,
#             load_in_4bit=True,
#             load_in_8bit=False,
#             llm_int8_threshold=6.0,
#             llm_int8_has_fp16_weight=False,
#             bnb_4bit_compute_dtype=torch.float16,
#             bnb_4bit_use_double_quant=True,
#             bnb_4bit_quant_type='nf4')),
#     lora=dict(
#         type=LoraConfig,
#         r=64,
#         lora_alpha=16,
#         lora_dropout=0.1,
#         bias='none',
#         task_type='CAUSAL_LM'))

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16))

2. Run: NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3

and it runs normally.

sxk000 commented 6 months ago

@sxk000 Yes! QLoRA is not yet compatible with ZeRO-3; consider switching QLoRA -> LoRA, or zero3 -> zero2.

OK, thanks!

sxk000 commented 6 months ago

@LZHgrla Hi!

If I download the sample data to a local folder and then load it for training, it errors out:

# Data
# data_path = 'damo/MSAgent-Bench'
data_path = '/apply/data/finetune/MSAgent-Bench'

Command: NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3

Full error log: 报错日志313.log

Excerpt from the error log:

2024-03-13 09:38:48,013 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-03-13 09:38:48,013 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-03-13 09:38:48,038 - modelscope - INFO - Loading done! Current index file version is 1.13.0, with md5 53769789b84c28d4871ea93addb53e8a and a total number of 972 components indexed
03/13 09:38:48 - mmengine - INFO - xtuner_dataset_timeout = 0:30:00
Map (num_proc=32):   0%|                                                                                                    | 0/598185 [00:00<?, ? examples/s]
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 623, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3458, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3361, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/dataset/map_fns/dataset_map_fns/msagent_map_fn.py", line 54, in msagent_react_map_fn
    text = eval(example['conversations'])
TypeError: eval() arg 1 must be a string, bytes or code object
"""

The above exception was the direct cause of the following exception:

How should I solve this?

Thanks!

LZHgrla commented 6 months ago

@sxk000 Hi

This PR will fix the issue:

https://github.com/InternLM/xtuner/pull/470
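
For context: the TypeError at msagent_map_fn.py line 54 suggests the locally loaded copy of the dataset already yields conversations as a parsed list, while the remote ModelScope copy apparently serializes it as a string that the map fn eval()s. A hypothetical guard illustrating the mismatch (not necessarily the exact change in PR #470):

conversations = example['conversations']
if isinstance(conversations, str):
    # remote dataset: the turn list arrives as a serialized string
    conversations = eval(conversations)
text = conversations  # locally loaded data is already a list of turns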

sxk000 commented 6 months ago

@sxk000 Hi, this PR will fix the issue: #470

Hi! [image]

After applying the fix from that PR, I get the following error:

after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
2024-03-13 14:10:37,942 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-03-13 14:10:37,942 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-03-13 14:10:38,078 - modelscope - INFO - Loading done! Current index file version is 1.13.0, with md5 53769789b84c28d4871ea93addb53e8a and a total number of 972 components indexed
03/13 14:10:38 - mmengine - INFO - xtuner_dataset_timeout = 0:30:00
Map (num_proc=32):   2%|█▎                                                                                     | 9450/598185 [00:01<01:31, 6446.07 examples/s]
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 623, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3458, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3361, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/dataset/map_fns/dataset_map_fns/msagent_map_fn.py", line 67, in msagent_react_map_fn
    api_dict[obj['name']] = obj['description']
TypeError: unhashable type: 'list'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

Full error log file: 报错日志313.log

How should I solve this?

LZHgrla commented 6 months ago

One moment, let me check. @sxk000

LZHgrla commented 6 months ago

(quoting in full the report above ending in TypeError: unhashable type: 'list')

An earlier PR added extra handling for dirty data, and that change introduced this error.

I've updated the code in https://github.com/InternLM/xtuner/pull/470, which should fix the problem.
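
For context, a minimal illustration of why msagent_map_fn.py line 67 raises (illustrative only, not xtuner code): a dirty record whose 'name' field is a list cannot be used as a dict key.

obj = {'name': ['a', 'b'], 'description': 'x'}  # dirty record: name is a list, not a string
api_dict = {}
api_dict[obj['name']] = obj['description']      # TypeError: unhashable type: 'list'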

sxk000 commented 6 months ago

@LZHgrla Hi!

With the newly updated code, the script now runs end to end!

But when I swap in my own curated data, it errors out:

A sample of my curated training data (10 records): train.json

Full error log: 报错日志313.log

Excerpt from the error log:

(The raw log interleaves the same traceback from all 8 ranks; de-interleaved, one rank's frames read:)

  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 294, in build_scheduler_from_cfg
    return scheduler_cls.build_iter_from_epoch(  # type: ignore
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 787, in build_iter_from_epoch
    return cls(*args, begin=begin, end=end, by_epoch=by_epoch, **kwargs)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/optim/scheduler/lr_scheduler.py", line 20, in __init__
    super().__init__(optimizer, 'lr', *args, **kwargs)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 759, in __init__
    super().__init__(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 68, in __init__
    raise ValueError('end should be larger than begin, but got'
ValueError: end should be larger than begin, but got begin=0, end=0

How should I fix this?

Many thanks!

LZHgrla commented 6 months ago

@sxk000 Your dataset is too small: there aren't enough iterations to cover the warmup, hence this error. Adding more data will fix it, or you can disable warmup as shown in the diff below:

param_scheduler = [
-   dict(
-       type=LinearLR,
-       start_factor=1e-5,
-       by_epoch=True,
-       begin=0,
-       end=warmup_ratio * max_epochs,
-       convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
-       begin=warmup_ratio * max_epochs,
+       begin=0,
        end=max_epochs,
        convert_to_iter_based=True)
]

sxk000 commented 6 months ago

(quoting the scheduler diff above that disables warmup)

After modifying the code as above, it now runs! Thanks again for the patient answers!

sxk000 commented 6 months ago

@LZHgrla

Hi!

When running the script with my newly curated data, I hit a new error: KeyError: 'Column length not in the dataset. Current columns in the dataset: []'

Sample of my curated data: 数据样例.json

Error log: 报错日志.log

Excerpt from the error log:

Traceback (most recent call last):
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/tools/train.py", line 307, in <module>
    main()
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/tools/train.py", line 303, in main
    runner.train()
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1160, in train
    self._train_loop = self.build_train_loop(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 958, in build_train_loop
    loop = LOOPS.build(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/engine/runner/loops.py", line 32, in __init__
    dataloader = runner.build_dataloader(
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 824, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/dataset/modelscope.py", line 16, in process_ms_dataset
    return process_hf_dataset(dataset, *args, **kwargs)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/dataset/huggingface.py", line 235, in process_hf_dataset
    dataset = process(**kwargs)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/xtuner/dataset/huggingface.py", line 218, in process
    setattr(dataset, 'length', dataset['length'])
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2810, in __getitem__
    return self._getitem(key)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2794, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 580, in query_table
    _check_valid_column_key(key, table.column_names)
  File "/root/miniconda3/envs/p310xtuner3/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 520, in _check_valid_column_key
    raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}")
KeyError: 'Column length not in the dataset. Current columns in the dataset: []'

How should I fix this?

Many thanks!

LZHgrla commented 6 months ago

@sxk000 Your data may be too long: samples exceeding the max_length limit get dropped, leaving the dataset empty. Try increasing max_length in the config.

sxk000 commented 6 months ago

max_length

I've already adjusted that: max_length = 8192

My curated samples average around 3,000 characters each, which does not exceed the limit.

LZHgrla commented 6 months ago

@sxk000 https://github.com/InternLM/xtuner/blob/a115e55e837b806708b1eb3ebb774fa5f943cc5d/xtuner/dataset/huggingface.py#L177-L218

Trace through this code and see where the dataset gets emptied.

sxk000 commented 6 months ago

@sxk000

https://github.com/InternLM/xtuner/blob/a115e55e837b806708b1eb3ebb774fa5f943cc5d/xtuner/dataset/huggingface.py#L177-L218

Trace through this code and see where the dataset gets emptied.

Hi,

    print('---------- 1 ----------')
    # Extract the useful data for training from the original dataset.
    if dataset_map_fn is not None:
        print('---------- 2 ----------')
        dataset = map_dataset(dataset, dataset_map_fn, map_num_proc)
        print('---------- 3 ----------')
    print('---------- 4 ----------')
    # Add prompt template, such as <|System|>: xxx <|User|>: xxx <|Bot|>: xxx
    if template_map_fn is not None:
        print('---------- 5 ----------')
        dataset = add_template_to_dataset(dataset, template_map_fn,
                                          map_num_proc)
        print('---------- 6 ----------')
    print('---------- 7 ----------')
    for old, new in rename_maps:
        print('---------- 8 ----------')
        dataset = dataset.rename_column(old, new)
        print('---------- 9 ----------')
    print('---------- 10 ----------')

Markers 8 and 9 were never reached; all the other prints were hit.

LZHgrla commented 6 months ago

@sxk000 Try adding print(dataset) at each step and take a look.

sxk000 commented 6 months ago

@sxk000 Try adding print(dataset) at each step and take a look.

Here is where I instrumented the code:

    print('---------- 1 ----------')
    print(dataset)
    # Extract the useful data for training from the original dataset.
    if dataset_map_fn is not None:
        print('---------- 2 ----------')
        dataset = map_dataset(dataset, dataset_map_fn, map_num_proc)
        print(dataset)
        print('---------- 3 ----------')
    print('---------- 4 ----------')
    # Add prompt template, such as <|System|>: xxx <|User|>: xxx <|Bot|>: xxx
    if template_map_fn is not None:
        print('---------- 5 ----------')
        dataset = add_template_to_dataset(dataset, template_map_fn,
                                          map_num_proc)
        print(dataset)
        print('---------- 6 ----------')
    print('---------- 7 ----------')
    for old, new in rename_maps:
        print('---------- 8 ----------')
        dataset = dataset.rename_column(old, new)
        print(dataset)
        print('---------- 9 ----------')
    print(dataset)
    print('---------- 10 ----------')

And here is the output:

---------- 1 ----------
Dataset({
    features: ['conversations'],
    num_rows: 100
})
---------- 2 ----------
Dataset({
    features: ['conversations', 'conversation'],
    num_rows: 100
})
---------- 3 ----------
---------- 4 ----------
---------- 5 ----------
Dataset({
    features: ['conversations', 'conversation'],
    num_rows: 0
})
---------- 6 ----------
---------- 7 ----------
Dataset({
    features: ['conversations', 'conversation'],
    num_rows: 0
})
---------- 10 ----------

LZHgrla commented 6 months ago

https://github.com/InternLM/xtuner/blob/a115e55e837b806708b1eb3ebb774fa5f943cc5d/xtuner/dataset/huggingface.py#L54-L64

It should be L61 of add_template_to_dataset that filtered everything out; please check why none of your samples satisfies len(example['conversation']) > 0 (a sketch of the filter follows).
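
A minimal paraphrase of what that filter does (a sketch based on the linked lines, not a verbatim copy of the xtuner source):

# After template_map_fn runs, samples whose 'conversation' came back empty are dropped:
dataset = dataset.filter(
    lambda example: len(example['conversation']) > 0,
    num_proc=map_num_proc)
# If the map fn returned {'conversation': []} for every record, the dataset
# ends up with num_rows: 0, exactly as printed in the trace above.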

sxk000 commented 6 months ago

https://github.com/InternLM/xtuner/blob/a115e55e837b806708b1eb3ebb774fa5f943cc5d/xtuner/dataset/huggingface.py#L54-L64

It should be L61 of add_template_to_dataset that filtered everything out; please check why none of your samples satisfies len(example['conversation']) > 0.

That is exactly where the data gets dropped. The msagent-bench sample data runs fine, but my own curated data errors out, so our CoT corpus must be formatted incorrectly. Do you have a data-format template? Or could you help check where our data goes wrong? Many thanks!

msagent-bench sample record: 一条msagent-bench样例数据.json

Log of a successful run on that msagent-bench record: 一条msagent-bench正常日志.log

One record of my own curated data: 一条自己整理的数据.json

Error log from running my curated data: 一条自己整理的数据报错日志.log

Excerpt from that error log:

num_proc must be <= 1. Reducing num_proc to 1 for dataset of size 1.
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 208.43 examples/s]
num_proc must be <= 1. Reducing num_proc to 1 for dataset of size 1.
Filter: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 417.43 examples/s]
Dataset({
    features: ['conversations', 'conversation'],
    num_rows: 0
})
Traceback (most recent call last):
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/xtuner/tools/train.py", line 307, in <module>
    main()
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/xtuner/tools/train.py", line 303, in main
    runner.train()
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1160, in train
    self._train_loop = self.build_train_loop(
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 958, in build_train_loop
    loop = LOOPS.build(
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/xtuner/engine/runner/loops.py", line 32, in __init__
    dataloader = runner.build_dataloader(
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 824, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/xtuner/dataset/modelscope.py", line 16, in process_ms_dataset
    return process_hf_dataset(dataset, *args, **kwargs)
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/xtuner/dataset/huggingface.py", line 330, in process_hf_dataset
    dataset = process(**kwargs)
  File "/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/xtuner/dataset/huggingface.py", line 201, in process
    print(dataset['conversation'][0])
IndexError: list index out of range
[rank1]:[E ProcessGroupGloo.cpp:144] Rank 1 successfully reached monitoredBarrier, but received errors while waiting for send/recv from rank 0. Please check rank 0 logs for faulty rank.

Thanks!

LZHgrla commented 6 months ago

@sxk000 The data format just needs to follow msagent-bench.

Most likely your custom data triggers an error inside msagent_map_fn, which then returns {'conversation': []}.

Trace the execution of this line carefully: https://github.com/InternLM/xtuner/blob/a115e55e837b806708b1eb3ebb774fa5f943cc5d/xtuner/dataset/huggingface.py#L50

You can set num_proc to 1 to make debugging easier.
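
As a sketch, one way to debug outside the training loop is to call the map fn directly on a single record (assuming msagent_react_map_fn is importable as in the stock configs, and that 'sft.json' holds one of your records):

import json

from xtuner.dataset.map_fns import msagent_react_map_fn

with open('sft.json') as f:  # one curated record, same schema as msagent-bench
    example = json.load(f)

result = msagent_react_map_fn(example)
print(result)  # should be a non-empty {'conversation': [...]}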

sxk000 commented 5 months ago

(quoting the debugging advice above)

The msagent_react fine-tuning script now runs with my own curated data; it was indeed a data-format problem. Here are the exact script, data sample, and command:

Script (rename .txt back to .py): internlm2_20b_qlora_msagent_react_e3_gpu8.txt

One record of my curated data: sft.json

Command: NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3

Thanks!

sxk000 commented 4 months ago

Hi!

Today, while fine-tuning with CoT-format corpus data, training finishes without ever printing the loss logs, and without any errors. What could be the problem?

Full-parameter fine-tuning command: NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3

Training parameters:

use_varlen_attn = False
prompt_template = PROMPT_TEMPLATE.default
max_length = 8192
pack_to_max_length = False

# Scheduler & Optimizer
batch_size = 4  # per_device
accumulative_counts = 1
dataloader_num_workers = 2
max_epochs = 10
optim_type = AdamW
lr = 2e-5
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 500
save_total_limit = 1  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 500

One record of my curated CoT corpus: COT语料样例.json

Full training log: cot训练日志.txt

Log excerpt:

NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3 > log_sft/425sft-329pt-.log
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING]
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING] *****************************************
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING] *****************************************
/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/utils/dl_utils/setup_env.py:56: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  warnings.warn(
2024-04-25 14:52:30,789 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-04-25 14:52:30,790 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-25 14:52:30,815 - modelscope - INFO - Loading done! Current index file version is 1.13.1, with md5 c03f3a22f842d6202e8e0127a31e0e2b and a total number of 972 components indexed
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:32<00:00,  1.54s/it]
(the line above repeats once per rank, 8 times in total; the slowest rank took [00:45<00:00,  2.17s/it])
/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py:1877: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(

Model files after training:

ll work_dirs/internlm2_20b_qlora_msagent_react_e3_gpu8/
total 48
drwxr-xr-x 3 root root    49 Apr 25 14:52 20240425_145227
-rw-r--r-- 1 root root  8677 Apr 25 14:52 internlm2_20b_qlora_msagent_react_e3_gpu8.py
drwxr-xr-x 2 root root  4096 Apr 25 15:01 iter_40.pth
-rw-r--r-- 1 root root    85 Apr 25 15:03 last_checkpoint
-rwxr--r-- 1 root root 25314 Apr 25 15:03 zero_to_fp32.py

Is this the right way to train CoT capability? Does this training run look normal? And why is no loss printed during training?

Thanks!

sxk000 commented 4 months ago

(quoting the previous comment in full)

@LZHgrla Could you take a look when you have a moment? Thanks!

LZHgrla commented 4 months ago

@sxk000 It looks like your dataset is so small that the total iteration count is below the logging frequency, so no log line is ever emitted.

Check the interval parameter in the config you are using:

https://github.com/InternLM/xtuner/blob/75703c39b95ea792fa13df3b0baf70aec0832c15/xtuner/configs/internlm/internlm2_chat_7b/internlm2_chat_7b_qlora_alpaca_e3.py#L180
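
For reference, a sketch of the relevant hook, with the layout assumed from the linked stock config; lowering interval below the run's total iteration count (about 40 here, judging by iter_40.pth above) restores per-step loss lines:

# In the config's default_hooks (layout assumed from the linked stock config):
logger = dict(
    type=LoggerHook,  # mmengine's LoggerHook
    interval=10)      # print training info (incl. loss) every 10 iterations;
                      # use e.g. interval=1 when the total iteration count is tiny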

sxk000 commented 4 months ago

@sxk000 It looks like your dataset is so small that the total iteration count is below the logging frequency, so no log line is ever emitted.

Check the interval parameter in the config you are using:

https://github.com/InternLM/xtuner/blob/75703c39b95ea792fa13df3b0baf70aec0832c15/xtuner/configs/internlm/internlm2_chat_7b/internlm2_chat_7b_qlora_alpaca_e3.py#L180

Hi, thanks for the reply!

It's not the log interval; that parameter in my config already matches what you described. The problem is that during training there is no per-batch loss printed, like in this screenshot: [image]

LZHgrla commented 4 months ago

@sxk000 There is a problem with your training command: QLoRA does not support DeepSpeed ZeRO-3 training, please use zero2. As for the missing training log (the loss you mentioned), please paste your config and we will take a look.

sxk000 commented 4 months ago

@LZHgrla I'm doing full-parameter training. Config file: internlm2_20b_qlora_msagent_react_e3_gpu8.txt (this site apparently doesn't allow uploading .py files, so rename the .txt back to .py).

LZHgrla commented 4 months ago

@sxk000 Could you paste the complete training log?

You can wrap it in ``` and post it directly in the thread.

sxk000 commented 4 months ago

Pasting it would exceed the maximum post length; I tried before. Full training log: cot训练日志.txt. One prepared CoT record: COT语料样例.json

LZHgrla commented 4 months ago

@sxk000

https://github.com/InternLM/xtuner/files/15126096/cot.txt won't open

sxk000 commented 4 months ago

One CoT record I prepared:

{
    "conversations": [
        {
            "from": "system",
            "value": "你有多种能力,可以通过插件集成的模型api来回复用户的问题,还能解答用户使用模型遇到的问题和模型知识相关问答。\n目前支持的插件信息如下,请自行判断是否需要调用插件来解决当前用户问题。\n若需要调用插件,则需要将插件调用请求按照json格式给出,必须包含plugin_name、parameters字段,对于api的调用过程中,在思考及调用之前使用作为标志,在思考和调用结束后使用,对于调用过程中用于向api传递参数的json字段,需要使用```json作为开始的标志,```作为结束的标志。\n1. {\"plugin_name\":\"ST-0000002801_ID-0000002387\",\"description\":\"针对用户输入的不锈钢工业焊接圆管市场价格查询信息,识别里面的元素,包括 品种、标准、材质、规格、指标、地区等\",\"parameters\":[{\"name\":\"date\",\"description\":\"时间或日期\"},{\"name\":\"breed_name\",\"description\":\"品种\",\"required\":true},{\"name\":\"standard_name\",\"description\":\"标准\"},{\"name\":\"mq_name\",\"description\":\"材质\"},{\"name\":\"sc_name\",\"description\":\"规格\"},{\"name\":\"metric_name\",\"description\":\"价格类型\",\"required\":true},{\"name\":\"area_name\",\"description\":\"地区\"}]}"
        },
        {
            "from": "user",
            "value": "我想了解一下不锈钢工业焊接圆管的价格"
        },
        {
            "from": "assistant",
            "value": "您好,您想了解不锈钢工业焊接圆管的价格,请问您需要哪种规格的圆管呢?需要调用ST-0000002801_ID-0000002387插件来查询不锈钢工业焊接圆管的价格\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"2024年\", \"breed_name\": \"不锈钢工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
        },
        {
            "from": "user",
            "value": "Φ19*1的"
        },
        {
            "from": "assistant",
            "value": "明白了。您需要了解的是哪个地区的市场价格呢?需要调用ST-0000002801_ID-0000002387插件来查询Φ19*1的不锈钢工业焊接圆管的市场价格是多少\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"2024年\", \"sc_name\": \"Φ19*1\", \"breed_name\": \"不锈钢工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
        },
        {
            "from": "user",
            "value": "太原市的"
        },
        {
            "from": "assistant",
            "value": "好的,您想了解的是太原维太新材科技有限公司的444毛坯不锈钢工业焊接圆管,Φ19*1在太原市的市场价格吗?需要调用ST-0000002801_ID-0000002387插件来查询太原市的工业焊接圆管市场价格是多少\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"2024年\", \"area_name\": \"太原市\", \"breed_name\": \"工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
        },
        {
            "from": "user",
            "value": "是的,1981年7月5日的价格"
        },
        {
            "from": "assistant",
            "value": "好的,您需要调用ST-0000002801_ID-0000002387插件来查询1981年7月5日太原维太新材科技有限公司444毛坯不锈钢工业焊接圆管,Φ19*1,在太原市的市场价格。调用请求如下:需要调用ST-0000002801_ID-0000002387插件来查询1981年7月5日的价格\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"1981年7月5日\", \"breed_name\": \"不锈钢工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
        }
    ]
}

sxk000 commented 4 months ago

@LZHgrla The complete log:

NPROC_PER_NODE=8 xtuner train internlm2_20b_qlora_msagent_react_e3_gpu8.py --deepspeed deepspeed_zero3 > log_sft/425sft-329pt-.log
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING]
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING] *****************************************
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-04-25 14:52:22,496] torch.distributed.run: [WARNING] *****************************************
/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/mmengine/utils/dl_utils/setup_env.py:56: UserWarning: Setting MKL_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
  warnings.warn(
(the warning above repeats once per rank, 8 times in total)
2024-04-25 14:52:30,789 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-04-25 14:52:30,790 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-04-25 14:52:30,815 - modelscope - INFO - Loading done! Current index file version is 1.13.1, with md5 c03f3a22f842d6202e8e0127a31e0e2b and a total number of 972 components indexed
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 21/21 [00:32<00:00,  1.54s/it]
(the line above repeats once per rank, 8 times in total; the slowest rank took [00:45<00:00,  2.17s/it])
/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
(the warning above repeats once per rank, 8 times in total)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
(the four lines above repeat 23 times in total; the pasted log ends here)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable       - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISMTOKENIZERS_PARALLELISM=(true | false)
=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable       - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
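For future runs, one way to avoid this warning entirely is to set the variable before anything forks. A minimal sketch in Python; placing it at the very top of the entry script is an assumption, adjust for your launcher:

import os

# Must run before tokenizers is first used and before DataLoader workers fork,
# otherwise the library disables its internal parallelism and warns on every fork.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

Equivalently, prefix the launch command: TOKENIZERS_PARALLELISM=false NPROC_PER_NODE=8 xtuner train ...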
/root/miniconda3/envs/p310xtuner/lib/python3.10/site-packages/torch/nn/modules/module.py:1877: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
(The same UserWarning is emitted once per rank.)
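This deprecation warning is harmless for training; it is raised because some library code in the stack (likely DeepSpeed or mmengine here, an assumption) calls Module.state_dict with positional arguments. In user code the keyword form avoids it. A minimal sketch:

import torch.nn as nn

model = nn.Linear(4, 2)
# The positional form state_dict(destination, prefix, keep_vars) is deprecated;
# the keyword form below does not trigger the UserWarning.
sd = model.state_dict(prefix="", keep_vars=False)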
LZHgrla commented 4 months ago

@sxk000 The log should have been saved to log_sft/425sft-329pt-.log, right?

sxk000 commented 4 months ago

@sxk000 The log should have been saved to log_sft/425sft-329pt-.log, right?

Yes, but that file was later overwritten by another run, so it no longer contains the original contents. I'll run it again next week and send you the complete log. Thank you very much!

sxk000 commented 4 months ago

@LZHgrla I tried running it a few more times today. It turns out the loss simply wasn't being printed to the console promptly; it does show up in the background log. Previously it was printed right away, which is why I mistakenly assumed the training had gone wrong. Training is still in progress and no problems have appeared so far!

Thanks!

sxk000 commented 3 months ago

A CoT training sample I prepared myself:

{
  "conversations": [
      {
          "from": "system",
          "value": "你有多种能力,可以通过插件集成的模型api来回复用户的问题,还能解答用户使用模型遇到的问题和模型知识相关问答。\n目前支持的插件信息如下,请自行判断是否需要调用插件来解决当前用户问题。\n若需要调用插件,则需要将插件调用请求按照json格式给出,必须包含plugin_name、parameters字段,对于api的调用过程中,在思考及调用之前使用作为标志,在思考和调用结束后使用,对于调用过程中用于向api传递参数的json字段,需要使用```json作为开始的标志,```作为结束的标志。\n1. {\"plugin_name\":\"ST-0000002801_ID-0000002387\",\"description\":\"针对用户输入的不锈钢工业焊接圆管市场价格查询信息,识别里面的元素,包括 品种、标准、材质、规格、指标、地区等\",\"parameters\":[{\"name\":\"date\",\"description\":\"时间或日期\"},{\"name\":\"breed_name\",\"description\":\"品种\",\"required\":true},{\"name\":\"standard_name\",\"description\":\"标准\"},{\"name\":\"mq_name\",\"description\":\"材质\"},{\"name\":\"sc_name\",\"description\":\"规格\"},{\"name\":\"metric_name\",\"description\":\"价格类型\",\"required\":true},{\"name\":\"area_name\",\"description\":\"地区\"}]}"
      },
      {
          "from": "user",
          "value": "我想了解一下不锈钢工业焊接圆管的价格"
      },
      {
          "from": "assistant",
          "value": "您好,您想了解不锈钢工业焊接圆管的价格,请问您需要哪种规格的圆管呢?需要调用ST-0000002801_ID-0000002387插件来查询不锈钢工业焊接圆管的价格\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"2024年\", \"breed_name\": \"不锈钢工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
      },
      {
          "from": "user",
          "value": "Φ19*1的"
      },
      {
          "from": "assistant",
          "value": "明白了。您需要了解的是哪个地区的市场价格呢?需要调用ST-0000002801_ID-0000002387插件来查询Φ19*1的不锈钢工业焊接圆管的市场价格是多少\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"2024年\", \"sc_name\": \"Φ19*1\", \"breed_name\": \"不锈钢工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
      },
      {
          "from": "user",
          "value": "太原市的"
      },
      {
          "from": "assistant",
          "value": "好的,您想了解的是太原维太新材科技有限公司的444毛坯不锈钢工业焊接圆管,Φ19*1在太原市的市场价格吗?需要调用ST-0000002801_ID-0000002387插件来查询太原市的工业焊接圆管市场价格是多少\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"2024年\", \"area_name\": \"太原市\", \"breed_name\": \"工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
      },
      {
          "from": "user",
          "value": "是的,1981年7月5日的价格"
      },
      {
          "from": "assistant",
          "value": "好的,您需要调用ST-0000002801_ID-0000002387插件来查询1981年7月5日太原维太新材科技有限公司444毛坯不锈钢工业焊接圆管,Φ19*1,在太原市的市场价格。调用请求如下:需要调用ST-0000002801_ID-0000002387插件来查询1981年7月5日的价格\\n\\n调用请求如下:\\n{\"plugin_name\": \"ST-0000002801_ID-0000002387\", \"parameters\": {\"date\": \"1981年7月5日\", \"breed_name\": \"不锈钢工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
      }
  ]
}

@LZHgrla Hi! A few questions:

1. Is this the right format for multi-turn context corpora like this, or should they follow some other format?
2. With this format, does every assistant turn contribute to the training loss, or only the final assistant turn, with the earlier turns serving purely as context?
3. With this format, if the plugin name (plugin_name) changes in a middle turn, the corresponding system prompt changes with it. Do I then need to write multiple system turns? As in the example below: the first round queries the plugin_name_1 plugin, and the second round queries the plugin_name_2 plugin.

{
    "conversations": [
        {
            "from": "system",
            "value": "你有多种能力1. {\"plugin_name\":\"plugin_name_1\",\"description\":\"针对用户输入的不锈钢工业焊接圆管市场价格查询信息,识别里面的元素,包括 品种、标准、材质、规格、指标、地区等\",\"parameters\":[{\"name\":\"date\",\"description\":\"时间或日期\"},{\"name\":\"breed_name\",\"description\":\"品种\",\"required\":true},{\"name\":\"standard_name\",\"description\":\"标准\"},{\"name\":\"mq_name\",\"description\":\"材质\"},{\"name\":\"sc_name\",\"description\":\"规格\"},{\"name\":\"metric_name\",\"description\":\"价格类型\",\"required\":true},{\"name\":\"area_name\",\"description\":\"地区\"}]}"
        },
        {
            "from": "user",
            "value": "我想了解一下不锈钢工业焊接圆管的价格"
        },
        {
            "from": "assistant",
            "value": "您好,您想了解不锈钢工业焊接圆管的价格,请问您需要哪种规格的圆管呢?需要调用plugin_name_1插件来查询不锈钢工业焊接圆管的价格\\n\\n调用请求如下:\\n{\"plugin_name\": \"plugin_name_1\", \"parameters\": {\"date\": \"2024年\", \"breed_name\": \"不锈钢工业焊接圆管\", \"metric_name\": \"市场价格\"}}"
        },
        {
            "from": "system",
            "value": "你有多种能力1. {\"plugin_name\":\"plugin_name_2\",\"description\":\"针对用户输入信息,识别地区、日期\",\"parameters\":[{\"name\":\"area_name\",\"description\":\"地区\"},{\"name\":\"date\",\"description\":\"时间\"}]}"
        },
        {
            "from": "user",
            "value": "今天上海天气怎么样?"
        },
        {
            "from": "assistant",
            "value": "明白了。您需要了解的是哪个地区的市场价格呢?需要调用plugin_name_2插件来查询Φ19*1的不锈钢工业焊接圆管的市场价格是多少\\n\\n调用请求如下:\\n{\"plugin_name\": \"plugin_name_2\", \"parameters\": {\"date\": \"今天\", \"area_name\": \"上海\"}}"
        }
    ]
}
HIT-cwh commented 3 months ago

  1. We recommend converting the dataset to the OpenAI SFT dataset format (see the conversion sketch after this list).
  2. Every assistant turn contributes to the loss.
  3. In that case you can write multiple system turns, or split the multi-turn dialogue above into two single-turn dialogues; try both and keep whichever works better in practice.
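For reference, a minimal conversion sketch, assuming the samples use the from/value conversation layout shown above and the target is the common OpenAI-style messages layout; the file names here are hypothetical and this is not xtuner's own converter:

import json

ROLE_MAP = {"system": "system", "user": "user", "assistant": "assistant"}

def to_openai_format(sample: dict) -> dict:
    # Each from/value turn becomes a role/content message.
    return {
        "messages": [
            {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in sample["conversations"]
        ]
    }

# cot_samples.json is a hypothetical file holding a list of samples like the ones above.
with open("cot_samples.json", encoding="utf-8") as f:
    samples = json.load(f)

converted = [to_openai_format(s) for s in samples]

with open("cot_samples_openai.json", "w", encoding="utf-8") as f:
    json.dump(converted, f, ensure_ascii=False, indent=2)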
sxk000 commented 3 months ago

@HIT-cwh Got it, I'll give it a try. Thanks for the answers!