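Tested with the latest xtuner, 0.1.15dev0, and also with xtuner 0.1.13.
I was not sure which fine-tuning config to pick for the 1.8B model, so I reused the config from before: xtuner copy-cfg internlm2_chat_7b_qlora_oasst1_e3 .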
The error is as follows:
···
(xtuner0305) zhanghui@zhanghui:~/shishen18$ xtuner train ./internlm2_chat_7b_qlora_oasst1_e3_copy.py --deepspeed deepspeed_zero2
[2024-03-04 23:12:07,542] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-04 23:12:10,204] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
03/04 23:12:11 - mmengine - INFO -
System environment:
sys.platform: linux
Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 478825338
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 3080 Laptop GPU
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0
PyTorch: 2.2.1+cu121
PyTorch compiling details: PyTorch built with:
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.17.1+cu121
OpenCV: 4.9.0
MMEngine: 0.10.3
Runtime environment:
launcher: none
randomness: {'seed': None, 'deterministic': False}
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: None
deterministic: False
Distributed launcher: none
Distributed training: False
GPU number: 1
03/04 23:12:11 - mmengine - INFO - Config:
SYSTEM = ''
accumulative_counts = 16
batch_size = 1
betas = (
    0.9,
    0.999,
)
custom_hooks = [
    dict(
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path=
            '/home/zhanghui/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.DatasetInfoHook'),
    dict(
        evaluation_inputs=[
            '请给我介绍五个上海的景点',
            'Please tell me five scenic spots in Shanghai',
        ],
        every_n_iters=500,
        prompt_template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
        system='',
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path=
            '/home/zhanghui/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.engine.EvaluateChatHook'),
]
data_path = './dataset/tran_dataset_0.json'
dataloader_num_workers = 0
default_hooks = dict(
    checkpoint=dict(interval=1, type='mmengine.hooks.CheckpointHook'),
    logger=dict(interval=10, type='mmengine.hooks.LoggerHook'),
    param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
    sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
    timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 500
evaluation_inputs = [
    '请给我介绍五个上海的景点',
    'Please tell me five scenic spots in Shanghai',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
lr = 0.0002
max_epochs = 1
max_length = 2048
max_norm = 1
model = dict(
    llm=dict(
        pretrained_model_name_or_path=
        '/home/zhanghui/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
        quantization_config=dict(
            bnb_4bit_compute_dtype='torch.float16',
            bnb_4bit_quant_type='nf4',
            bnb_4bit_use_double_quant=True,
            llm_int8_has_fp16_weight=False,
            llm_int8_threshold=6.0,
            load_in_4bit=True,
            load_in_8bit=False,
            type='transformers.BitsAndBytesConfig'),
        torch_dtype='torch.float16',
        trust_remote_code=True,
        type='transformers.AutoModelForCausalLM.from_pretrained'),
    lora=dict(
        bias='none',
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        task_type='CAUSAL_LM',
        type='peft.LoraConfig'),
    type='xtuner.model.SupervisedFinetune')
optim_type = 'torch.optim.AdamW'
optim_wrapper = dict(
    optimizer=dict(
        betas=(
            0.9,
            0.999,
        ),
        lr=0.0002,
        type='torch.optim.AdamW',
        weight_decay=0),
    type='DeepSpeedOptimWrapper')
pack_to_max_length = True
param_scheduler = [
    dict(
        begin=0,
        by_epoch=True,
        convert_to_iter_based=True,
        end=0.03,
        start_factor=1e-05,
        type='mmengine.optim.LinearLR'),
    dict(
        T_max=1,
        begin=0.03,
        by_epoch=True,
        convert_to_iter_based=True,
        eta_min=0.0,
        type='mmengine.optim.CosineAnnealingLR'),
]
pretrained_model_name_or_path = '/home/zhanghui/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.internlm2_chat'
randomness = dict(deterministic=False, seed=None)
resume = False
runner_type = 'FlexibleRunner'
strategy = dict(
    config=dict(
        bf16=dict(enabled=True),
        fp16=dict(enabled=False, initial_scale_power=16),
        gradient_accumulation_steps='auto',
        gradient_clipping='auto',
        train_micro_batch_size_per_gpu='auto',
        zero_allow_untested_optimizer=True,
        zero_force_ds_cpu_optimizer=False,
        zero_optimization=dict(overlap_comm=True, stage=2)),
    exclude_frozen_parameters=True,
    gradient_accumulation_steps=16,
    gradient_clipping=1,
    train_micro_batch_size_per_gpu=1,
    type='xtuner.engine.DeepSpeedStrategy')
tokenizer = dict(
    padding_side='right',
    pretrained_model_name_or_path=
    '/home/zhanghui/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
    trust_remote_code=True,
    type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(by_epoch=True, max_epochs=1, val_interval=1)
train_dataloader = dict(
    batch_size=1,
    collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
    dataset=dict(
        dataset=dict(
            data_files=dict(train='./dataset/tran_dataset_0.json'),
            path='json',
            type='datasets.load_dataset'),
        dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
        max_length=2048,
        pack_to_max_length=True,
        remove_unused_columns=True,
        shuffle_before_pack=True,
        template_map_fn=dict(
            template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
            type='xtuner.dataset.map_fns.template_map_fn_factory'),
        tokenizer=dict(
            padding_side='right',
            pretrained_model_name_or_path=
            '/home/zhanghui/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
            trust_remote_code=True,
            type='transformers.AutoTokenizer.from_pretrained'),
        type='xtuner.dataset.process_hf_dataset'),
    num_workers=0,
    sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
train_dataset = dict(
    dataset=dict(
        data_files=dict(train='./dataset/tran_dataset_0.json'),
        path='json',
        type='datasets.load_dataset'),
    dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
    max_length=2048,
    pack_to_max_length=True,
    remove_unused_columns=True,
    shuffle_before_pack=True,
    template_map_fn=dict(
        template='xtuner.utils.PROMPT_TEMPLATE.internlm2_chat',
        type='xtuner.dataset.map_fns.template_map_fn_factory'),
    tokenizer=dict(
        padding_side='right',
        pretrained_model_name_or_path=
        '/home/zhanghui/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b',
        trust_remote_code=True,
        type='transformers.AutoTokenizer.from_pretrained'),
    type='xtuner.dataset.process_hf_dataset')
visualizer = None
warmup_ratio = 0.03
weight_decay = 0
work_dir = './work_dirs/internlm2_chat_7b_qlora_oasst1_e3_copy'
03/04 23:12:11 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
03/04 23:12:12 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DatasetInfoHook
(LOW ) EvaluateChatHook
(VERY_LOW ) CheckpointHook
before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook
before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(LOW ) EvaluateChatHook
(VERY_LOW ) CheckpointHook
after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
before_val:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook
before_val_epoch:
(NORMAL ) IterTimerHook
before_val_iter:
(NORMAL ) IterTimerHook
after_val_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook
after_val:
(VERY_HIGH ) RuntimeInfoHook
(LOW ) EvaluateChatHook
after_train:
(VERY_HIGH ) RuntimeInfoHook
(LOW ) EvaluateChatHook
(VERY_LOW ) CheckpointHook
before_test:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook
before_test_epoch:
(NORMAL ) IterTimerHook
before_test_iter:
(NORMAL ) IterTimerHook
after_test_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
after_test:
(VERY_HIGH ) RuntimeInfoHook
after_run:
(BELOW_NORMAL) LoggerHook
Generating train split: 100000 examples [00:01, 89208.28 examples/s]
Map (num_proc=32): 0%| | 0/100000 [00:00<?, ? examples/s]
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 623, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3458, in _map_single
    example = apply_function_on_filtered_inputs(example, i, offset=offset)
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3361, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/zhanghui/xtuner0305/xtuner/xtuner/dataset/map_fns/dataset_map_fns/oasst1_map_fn.py", line 22, in oasst1_map_fn
    for sentence in example['text'].strip().split('###'):
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 270, in __getitem__
    value = self.data[key]
KeyError: 'text'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/zhanghui/xtuner0305/xtuner/xtuner/tools/train.py", line 307, in <module>
    main()
  File "/home/zhanghui/xtuner0305/xtuner/xtuner/tools/train.py", line 303, in main
    runner.train()
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1160, in train
    self._train_loop = self.build_train_loop(
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 965, in build_train_loop
    loop = EpochBasedTrainLoop(
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/mmengine/runner/loops.py", line 44, in __init__
    super().__init__(runner, dataloader)
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/mmengine/runner/base_loop.py", line 26, in __init__
    self.dataloader = runner.build_dataloader(
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 824, in build_dataloader
    dataset = DATASETS.build(dataset_cfg)
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
    obj = obj_cls(**args)  # type: ignore
  File "/home/zhanghui/xtuner0305/xtuner/xtuner/dataset/huggingface.py", line 299, in process_hf_dataset
    return process(**kwargs)
  File "/home/zhanghui/xtuner0305/xtuner/xtuner/dataset/huggingface.py", line 179, in process
    dataset = map_dataset(dataset, dataset_map_fn, map_num_proc)
  File "/home/zhanghui/xtuner0305/xtuner/xtuner/dataset/huggingface.py", line 50, in map_dataset
    dataset = dataset.map(dataset_map_fn, num_proc=map_num_proc)
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 593, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 558, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3197, in map
    for rank, done, content in iflatmap_unordered(
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 663, in iflatmap_unordered
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 663, in <listcomp>
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/home/zhanghui/anaconda3/envs/xtuner0305/lib/python3.10/site-packages/multiprocess/pool.py", line 774, in get
    raise self._value
KeyError: 'text'
(xtuner0305) zhanghui@zhanghui:~/shishen18$
···
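For anyone reading the traceback: the KeyError is raised inside oasst1_map_fn, which reads example['text'], so the records in ./dataset/tran_dataset_0.json apparently have no 'text' field. Below is a minimal sketch for checking which fields the file really contains, plus a hypothetical replacement map function. The helper names first_record_keys and custom_map_fn, and the 'instruction' / 'output' field names, are assumptions for illustration only, since the actual layout of the dataset is not shown in this report. The {'conversation': [{'input': ..., 'output': ...}]} shape is the one oasst1_map_fn itself returns.

```python
import json
from pathlib import Path


def first_record_keys(json_path):
    """Return the sorted field names of the first record in a JSON or JSON Lines file."""
    text = Path(json_path).read_text(encoding="utf-8").strip()
    try:
        data = json.loads(text)  # the whole file is a single JSON document
    except json.JSONDecodeError:
        data = json.loads(text.splitlines()[0])  # JSON Lines: one record per line
    first = data[0] if isinstance(data, list) else data
    return sorted(first)


def custom_map_fn(example):
    """Hypothetical stand-in for oasst1_map_fn.

    Assumes each record has 'instruction' and 'output' fields (not confirmed
    by the report) and emits a single-turn conversation.
    """
    return {
        "conversation": [
            {"input": example["instruction"], "output": example["output"]}
        ]
    }
```

If first_record_keys('./dataset/tran_dataset_0.json') does not list 'text', the dataset_map_fn entry in the copied config (currently xtuner.dataset.map_fns.oasst1_map_fn) would need to point at a function matching the real field names instead.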