PYTHONPATH:
which python: /opt/conda/envs/umt/bin/python
PYTHONPATH: :/opt/conda/envs/umt/bin/python:.
torchrun.sh: line 2: scontrol: command not found
torchrun.sh: line 3: scontrol: command not found
All nodes used:
Master node:
Args:
--nnodes=1 --nproc_per_node=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12881 tasks/pretrain.py scripts/evaluation/stage2/zero_shot/1B/config_msrvtt.py output_dir scripts/evaluation/stage2/zero_shot/1B/eval_msrvtt.sh_20240616_135110 evaluate True pretrained_path ckpt/InternVideo2-stage2_1b-224p-f4.pt
[2024-06-16 13:51:12,301] torch.distributed.run: [WARNING] master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
No module named 'deepspeed'
deepspeed is not installed!!!
Traceback (most recent call last):
File "/data2/dy/code/InternVideo/InternVideo2/multi_modality/tasks/pretrain.py", line 14, in <module>
from models import *
File "/data2/dy/code/InternVideo/InternVideo2/multi_modality/models/__init__.py", line 1, in <module>
from .internvideo2_clip import InternVideo2_CLIP
File "/data2/dy/code/InternVideo/InternVideo2/multi_modality/models/internvideo2_clip.py", line 10, in <module>
from .backbones.internvideo2 import InternVideo2, LLaMA, Tokenizer
File "/data2/dy/code/InternVideo/InternVideo2/multi_modality/models/backbones/internvideo2/__init__.py", line 1, in <module>
from .internvl_clip_vision import internvl_clip_6b
File "/data2/dy/code/InternVideo/InternVideo2/multi_modality/models/backbones/internvideo2/internvl_clip_vision.py", line 17, in <module>
from flash_attn.ops.rms_norm import DropoutAddRMSNorm
File "/opt/conda/envs/umt/lib/python3.10/site-packages/flash_attn/ops/rms_norm.py", line 7, in <module>
from flash_attn.ops.layer_norm import (
File "/opt/conda/envs/umt/lib/python3.10/site-packages/flash_attn/ops/layer_norm.py", line 4, in <module>
import dropout_layer_norm
ModuleNotFoundError: No module named 'dropout_layer_norm'
I looked into this: flash_attn upstream changed the layernorm implementation ( https://github.com/Dao-AILab/flash-attention/issues/587#issuecomment-2027853183 ), so current flash_attn installs no longer ship the dropout_layer_norm extension. Is there any way to fix this?
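For context: `dropout_layer_norm` is a separate CUDA extension that lives in the flash-attention source tree under `csrc/layer_norm` and is not bundled with the PyPI wheels; it was dropped entirely in later releases. One workaround reported in that issue thread is to build the extension manually from a release that still includes it. A sketch, assuming your `nvcc` matches the CUDA version your torch build was compiled against, and that the pinned tag below (an assumption, not verified) still contains `csrc/layer_norm`:

```shell
# Build the dropout_layer_norm CUDA extension manually from source.
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention

# Check out a tag that still ships csrc/layer_norm; v2.3.6 is an
# assumption here -- verify against the repo's tags before using it.
git checkout v2.3.6

# Compile and install the extension into the current environment.
# Requires nvcc and a CUDA toolkit compatible with your torch build.
cd csrc/layer_norm
pip install .
```

Alternatively, you can patch `internvl_clip_vision.py` to wrap the `DropoutAddRMSNorm` import in a `try/except ImportError` and fall back to an unfused RMSNorm implementation, at some cost in speed.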