Closed RockYuan closed 1 year ago
Bug Report Checklist
from autogluon.multimodal import MultiModalPredictor
import uuid

train_en_df_downsample = train_en_df.sample(200, random_state=123)
new_model_path = f"./tmp/{uuid.uuid4().hex}-multilingual_ia3_gradient_checkpoint"
predictor = MultiModalPredictor(label="label", path=new_model_path)
predictor.fit(
    train_en_df_downsample,
    presets="multilingual",
    hyperparameters={
        "model.hf_text.checkpoint_name": "google/flan-t5-xl",
        "model.hf_text.gradient_checkpointing": True,
        "model.hf_text.low_cpu_mem_usage": True,
        "optimization.efficient_finetune": "ia3_bias",
        "optimization.lr_decay": 0.9,
        "optimization.learning_rate": 3e-03,
        "optimization.end_lr": 3e-03,
        "optimization.max_epochs": 1,
        "optimization.warmup_steps": 0,
        "env.batch_size": 1,
        "env.eval_batch_size_ratio": 1,
    },
)
Describe the bug
ImportError: /usr/local/lib/python3.8/dist-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
Expected behavior
Model training completes successfully.
To Reproduce
With AutoGluon 0.7.1, 0.7.0, 0.6.2, or 0.6.0, follow the tutorial at https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/efficient_finetuning_basic.html#training-flan-t5-xl-on-single-gpu
Screenshots / Logs
/usr/local/lib/python3.8/dist-packages/apex/normalization/fused_layer_norm.py:364 in __init__

  361         super().__init__()
  362
  363         global fused_layer_norm_cuda
❱ 364         fused_layer_norm_cuda = importlib.import_module("fused_layer_norm_cuda")
  365
  366         if isinstance(normalized_shape, numbers.Integral):
  367             normalized_shape = (normalized_shape,)

/usr/lib/python3.8/importlib/__init__.py:127 in import_module

  124             if character != '.':
  125                 break
  126             level += 1
❱ 127     return _bootstrap._gcd_import(name[level:], package, level)
  128
  129
  130 _RELOADING = {}

in _gcd_import:1014
in _find_and_load:991
in _find_and_load_unlocked:975
in _load_unlocked:657
in module_from_spec:556
in create_module:1166
in _call_with_frames_removed:219

ImportError: /usr/local/lib/python3.8/dist-packages/fused_layer_norm_cuda.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl
Installed Versions
I resolved this issue. With CUDA 11.6.2 and AutoGluon 0.7.0, installed the normal way, everything works.
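For anyone hitting the same error: the missing symbol demangles to `c10::SymBool::guard_bool(char const*, long) const`, which belongs to PyTorch's c10 library. That suggests (though the thread does not confirm it) that the Apex extension was compiled against a different PyTorch build than the one installed. A minimal diagnostic sketch follows; the helper names `probe_module` and `report` are made up for illustration and are not part of AutoGluon, Apex, or PyTorch.

```python
import importlib
import importlib.util


def probe_module(name):
    """Return 'missing', 'ok', or 'broken: <error>' for a top-level module."""
    if importlib.util.find_spec(name) is None:
        return "missing"
    try:
        importlib.import_module(name)
    except (ImportError, OSError) as exc:
        # An "undefined symbol" message here usually points to a compiled
        # extension built against a different version of its dependency.
        return f"broken: {exc}"
    return "ok"


def report():
    """Collect the PyTorch build info and the status of the Apex extension."""
    lines = []
    torch_status = probe_module("torch")
    lines.append(f"torch: {torch_status}")
    if torch_status == "ok":
        import torch

        lines.append(f"torch version: {torch.__version__}")
        # CUDA version PyTorch was built for (None on CPU-only builds)
        lines.append(f"torch built for CUDA: {torch.version.cuda}")
    # fused_layer_norm_cuda is the Apex extension named in the traceback;
    # torch is imported first so its shared libraries are already loaded.
    lines.append(f"fused_layer_norm_cuda: {probe_module('fused_layer_norm_cuda')}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

If the last line reports `broken` with an undefined symbol while `torch` itself is `ok`, reinstalling or rebuilding Apex against the installed PyTorch, or switching to a matching CUDA and AutoGluon combination as described above, is the likely direction to try.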