Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Wandb import failed
using world size: 4, data-parallel-size: 4, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
WARNING: overriding default arguments for ffn_hidden_size:None with ffn_hidden_size:2048
WARNING: overriding default arguments for swiglu:True with swiglu:True
WARNING: overriding default arguments for use_cpu_initialization:True with use_cpu_initialization:True
WARNING: overriding default arguments for recompute_granularity:selective with recompute_granularity:selective
using torch.float16 for parameters ...
setting number of micro-batches to constant 12
building SentencePieceTokenizer tokenizer ...
padded vocab (size: 32005) with 123 dummy tokens (new size: 32128)
setting tensorboard ...
initializing torch distributed ...
initialized tensor model parallel with size 1
initialized pipeline model parallel with size 1
setting random seeds to 1234 ...
compiling dataset index builder ...
make: Entering directory '/mnt/workspace/binxian.zb/llma-megatron/Megatron-LM/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/mnt/workspace/binxian.zb/llma-megatron/Megatron-LM/megatron/data'
Hello, as the title says, the error is raised at line 243 of megatron/core/tensor_parallel/layers.py. In my experience this kind of error is usually caused by a dimension mismatch, but since this is my first time using LLaMA with Megatron, I wanted to ask whether you have run into this problem before. I wonder whether it was caused by using GPT-2's vocab-file and merge-file during data preprocessing, or whether there is some other reason. I also printed the shapes and devices of total_input and weight.t(), and they do match. Thanks!
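For reference, here is a minimal, self-contained sketch (plain PyTorch, no Megatron dependency) of the two things worth checking at that call site: the padded-vocab arithmetic visible in the log above (32005 → 32128 with 123 dummy tokens, i.e. rounding up to a multiple of 128), and the shape contract of the `torch.matmul(total_input, weight.t())` pattern used in the tensor-parallel linear layer. The function name `pad_vocab` and the sample dimensions are mine, chosen for illustration — they are not Megatron's actual identifiers or your model's sizes.

```python
import torch

def pad_vocab(vocab_size: int, divisor: int = 128) -> int:
    # Round the tokenizer vocab size up to a multiple of `divisor`,
    # as in the log line "padded vocab (size: 32005) ... (new size: 32128)".
    return ((vocab_size + divisor - 1) // divisor) * divisor

assert pad_vocab(32005) == 32128  # 123 dummy tokens added

# Shape contract of `output = torch.matmul(total_input, weight.t())`:
# the last dim of total_input must equal the in-features dim of weight.
tokens, hidden, out_features = 16, 2048, 5504  # illustrative sizes only
total_input = torch.randn(tokens, hidden)
weight = torch.randn(out_features, hidden)

assert total_input.shape[-1] == weight.t().shape[0]
output = torch.matmul(total_input, weight.t())
print(output.shape)  # -> torch.Size([16, 5504])
```

If the shapes and devices of `total_input` and `weight.t()` really do match, another common culprit at this line is a dtype mismatch (e.g. one tensor in fp16 and the other in fp32), so printing `.dtype` for both may also be worthwhile.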