Open · xkjcf opened this issue 1 year ago
Same question here. It doesn't work with Python 3.9 either, and switching to a different machine didn't help.
Same here, I ran into the same problem. A second problem is a version conflict in the requirements:
The conflict is caused by:
    The user requested torch==2.0.0
    deepspeed 0.9.2 depends on torch
    xformers 0.0.20 depends on torch==2.0.1
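pip reports this because one package, torch, is pinned to two different exact versions by different requirers. The sketch below is a minimal illustration of that check, not pip's resolver. `find_exact_pin_conflicts` is a hypothetical helper, and it only understands `name==version` pins, whereas pip handles full version specifiers.

```python
from collections import defaultdict


def find_exact_pin_conflicts(requirements):
    """Hypothetical helper: report packages pinned to different exact versions.

    requirements: list of (requirer, requirement_string) pairs.
    Only 'name==version' pins are considered; an unpinned requirement such
    as 'torch' accepts any version, so it can never conflict here.
    """
    pins = defaultdict(dict)  # package name -> {version: [requirers]}
    for requirer, req in requirements:
        if "==" not in req:
            continue
        name, version = (part.strip() for part in req.split("==", 1))
        pins[name.lower()].setdefault(version, []).append(requirer)
    # A conflict is any package with more than one distinct pinned version.
    return {name: versions for name, versions in pins.items() if len(versions) > 1}


# The three requirements from the pip message above.
reqs = [
    ("user", "torch==2.0.0"),
    ("deepspeed 0.9.2", "torch"),
    ("xformers 0.0.20", "torch==2.0.1"),
]
print(find_exact_pin_conflicts(reqs))
# {'torch': {'2.0.0': ['user'], '2.0.1': ['xformers 0.0.20']}}
```

Pinning torch==2.0.1 instead satisfies both xformers 0.0.20 and deepspeed's unpinned torch requirement. As a later comment notes, that alone did not make the KeyError go away.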
+1
Same question.
I saw this in other issues too. I also have torch==2.0.1 installed, but the problem above still occurs. How did everyone solve it?
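I ran into the same problem and found an explanation in the DeepSpeed issues: https://github.com/microsoft/DeepSpeed/issues/3234. ZeRO stage 3 supports zero.init, but stages 1 and 2 do not. Changing the stage in deepspeed.json to 3 fixed the problem for me.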
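The change described above amounts to setting `zero_optimization.stage` in the JSON file passed via `--deepspeed_config`. The sketch below assumes config/deepspeed.json follows DeepSpeed's standard layout, since its actual contents are not shown in this thread. `set_zero_stage` is a hypothetical helper, and editing the file by hand works just as well.

```python
import json
from pathlib import Path


def set_zero_stage(config_path, stage=3):
    """Hypothetical helper: set zero_optimization.stage in a DeepSpeed JSON config, in place."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # "zero_optimization" is the DeepSpeed config section holding ZeRO settings;
    # create it if the file does not have one yet, and keep every other key.
    config.setdefault("zero_optimization", {})["stage"] = stage
    path.write_text(json.dumps(config, indent=2))
    return config
```

With the path from the launch script in this issue, the call would be `set_zero_stage("config/deepspeed.json", stage=3)`. Stage 3 partitions the parameters as well as the gradients and optimizer states, so other settings may need adjusting. A reply below reports a new error after making this change.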
After making the change you suggested I get a new error. Did you run into it too?
I ran into this problem too. Is there a fix?
Required prerequisites
Questions
I downloaded the model, created the data_dir directory, and wrote a new script, script/train2.sh:
#!/bin/bash
deepspeed train.py \
    --deepspeed \
    --deepspeed_config config/deepspeed.json
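The launch command passes `--deepspeed` and `--deepspeed_config` to train.py. By default the deepspeed launcher also hands each worker a `--local_rank` argument. The parsed namespace is what ends up as `args` in `deepspeed.initialize(args=args, ...)`. The sketch below is a minimal argparse parser that assumes train.py declares these flags itself, since its parser is not shown in the issue. `build_parser` is hypothetical, and real DeepSpeed scripts usually register the two config flags with `deepspeed.add_config_arguments(parser)`.

```python
import argparse


def build_parser():
    """Hypothetical parser for train.py covering the flags used in train2.sh."""
    parser = argparse.ArgumentParser(description="Baichuan-7B training (sketch)")
    # The deepspeed launcher passes --local_rank=<n> to each worker by default.
    parser.add_argument("--local_rank", type=int, default=-1)
    # Declared by hand here; deepspeed.add_config_arguments(parser) registers
    # --deepspeed and --deepspeed_config (among others) in real scripts.
    parser.add_argument("--deepspeed", action="store_true")
    parser.add_argument("--deepspeed_config", type=str, default=None)
    return parser


# The arguments written in train2.sh (the launcher-added --local_rank is omitted).
args = build_parser().parse_args(
    ["--deepspeed", "--deepspeed_config", "config/deepspeed.json"]
)
print(args.deepspeed, args.deepspeed_config)  # True config/deepspeed.json
```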
Running the script produces the following error:
Traceback (most recent call last):
  File "/root/code/Baichuan-7B/train.py", line 138, in
    model_engine = prepare_model()
  File "/root/code/Baichuan-7B/train.py", line 117, in prepare_model
    model_engine, _, _, _ = deepspeed.initialize(args=args,
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/__init__.py", line 165, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 308, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1173, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1409, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 468, in __init__
    self.initialize_gradient_partitioning_data_structures()
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 691, in initialize_gradient_partitioning_data_structures
    self.first_param_index_in_partition[i][partition_id] = self.get_first_param_index(
  File "/root/miniconda3/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 666, in get_first_param_index
    if partition_id in self.param_to_partition_ids[group_id][param_id]:
KeyError: 0
The training documents in data_dir are ordinary multi-line plain text.
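One comment attributes this KeyError, via the linked DeepSpeed issue, to building the model inside `deepspeed.zero.Init` while the config selects ZeRO stage 1 or 2. If that is the cause, an alternative to switching to stage 3 is to enable `zero.Init` only when the config asks for stage 3. This is a sketch under that assumption, because the traceback does not show whether train.py uses `zero.Init` at all. `zero_stage` and `model_init_context` are hypothetical helpers.

```python
import contextlib
import json


def zero_stage(config_path):
    """Return the ZeRO stage in a DeepSpeed JSON config (0 if there is no ZeRO section)."""
    with open(config_path) as f:
        config = json.load(f)
    return config.get("zero_optimization", {}).get("stage", 0)


def model_init_context(config_path, zero_init_factory):
    """Return zero_init_factory() (e.g. a deepspeed.zero.Init(...) context) only for
    ZeRO stage 3; for stages 0-2 return a no-op context so parameters are
    created normally."""
    if zero_stage(config_path) == 3:
        return zero_init_factory()
    return contextlib.nullcontext()
```

In train.py this might be used as `with model_init_context(args.deepspeed_config, lambda: deepspeed.zero.Init(config_dict_or_path=args.deepspeed_config)):` followed by the model-loading code, which is not shown in the issue. Passing a factory keeps the helper free of a deepspeed import and avoids constructing `zero.Init` when it will not be used.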
Checklist