wangleiofficial opened this issue 2 years ago.
Hi @wangleiofficial, I ran into the same problem. Did you fix it?
@Line290 Not yet. I suspect some of the parameters (those from the pretrained model) are not being handled correctly.
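One way to test that hypothesis is to list every parameter whose dtype differs from the one mixed precision should have produced. The sketch below is not from this thread; the helper names `group_params_by_dtype` and `find_dtype_mismatches` are hypothetical, and they only assume that each parameter exposes a `.dtype` attribute, as the tensors yielded by `torch.nn.Module.named_parameters()` do.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_params_by_dtype(named_params: Iterable[Tuple[str, Any]]) -> Dict[str, List[str]]:
    """Group parameter names by the string form of their dtype.

    More than one key in the result means the model holds mixed dtypes.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, param in named_params:
        groups[str(param.dtype)].append(name)
    return dict(groups)


def find_dtype_mismatches(named_params: Iterable[Tuple[str, Any]], expected: Any) -> List[str]:
    """Return the names of parameters whose dtype is not `expected`."""
    return [name for name, param in named_params if param.dtype != expected]
```

For example, calling `find_dtype_mismatches(self.named_parameters(), torch.float16)` from a LightningModule hook such as `on_train_start` should list any weights (for instance, those of a pretrained submodule) still left in `torch.float32`, assuming the strategy has already cast the model to fp16 by that point.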
I've got the same problem. Any fixes yet?
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!
+1, `deepspeed_stage_2` hits the same error.
Similar problem with `deepspeed_stage_1`:
```
  File "python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2019, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "python3.9/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Found dtype Float but expected Half
```
Bug description
When using mixed precision with DeepSpeed, the model raises the following error:
`RuntimeError: expected scalar type Float but found Half`
How to reproduce the bug
Error messages and logs
Environment
Current Environment
```
* CUDA:
    - GPU:
        - GeForce RTX 3090
        - GeForce RTX 3090
        - GeForce RTX 3090
        - GeForce RTX 3090
        - GeForce RTX 3090
        - GeForce RTX 3090
        - GeForce RTX 3090
        - GeForce RTX 3090
    - available: True
    - version: 11.3
* Lightning:
    - pytorch-lightning: 1.6.5
    - torch: 1.11.0
    - torchaudio: 0.11.0
    - torchinfo: 1.7.0
    - torchmetrics: 0.10.0
    - torchvision: 0.12.0
* Packages:
    - absl-py: 1.0.0
    - aiohttp: 3.8.1
    - aiosignal: 1.2.0
    - asttokens: 2.0.5
    - async-timeout: 4.0.2
    - attrs: 21.4.0
    - backcall: 0.2.0
    - biopython: 1.79
    - brotlipy: 0.7.0
    - cached-property: 1.5.2
    - cachetools: 5.0.0
    - certifi: 2022.6.15
    - cffi: 1.14.4
    - charset-normalizer: 2.1.0
    - click: 8.1.3
    - cryptography: 37.0.2
    - cycler: 0.11.0
    - decorator: 5.1.1
    - deepspeed: 0.6.6
    - deprecated: 1.2.13
    - distlib: 0.3.4
    - docker-pycreds: 0.4.0
    - einops: 0.4.0
    - executing: 0.8.3
    - fair-esm: 0.4.2
    - fairscale: 0.4.6
    - filelock: 3.7.0
    - fonttools: 4.29.1
    - frozenlist: 1.3.0
    - fsspec: 2022.2.0
    - future: 0.18.2
    - gitdb: 4.0.9
    - gitpython: 3.1.27
    - google-auth: 2.6.0
    - google-auth-oauthlib: 0.4.6
    - grpcio: 1.44.0
    - h5py: 3.6.0
    - hjson: 3.0.2
    - huggingface-hub: 0.6.0
    - idna: 3.3
    - importlib-metadata: 4.11.2
    - infinibatch: 0.1.0
    - ipython: 8.1.0
    - jedi: 0.18.1
    - joblib: 1.1.0
    - kiwisolver: 1.3.2
    - lmdb: 1.3.0
    - lxml: 4.8.0
    - markdown: 3.3.6
    - matplotlib: 3.5.1
    - matplotlib-inline: 0.1.3
    - mkl-fft: 1.3.1
    - mkl-random: 1.2.2
    - mkl-service: 2.4.0
    - multidict: 6.0.2
    - ninja: 1.10.2.3
    - numpy: 1.22.3
    - oauthlib: 3.2.0
    - packaging: 21.3
    - pandas: 1.4.2
    - parso: 0.8.3
    - pathtools: 0.1.2
    - pexpect: 4.8.0
    - pickleshare: 0.7.5
    - pillow: 9.1.1
    - pip: 22.1.2
    - platformdirs: 2.5.2
    - plip: 2.2.2
    - promise: 2.3
    - prompt-toolkit: 3.0.28
    - protobuf: 3.19.4
    - psutil: 5.9.0
    - ptyprocess: 0.7.0
    - pure-eval: 0.2.2
    - py-cpuinfo: 8.0.0
    - pyasn1: 0.4.8
    - pyasn1-modules: 0.2.8
    - pycparser: 2.21
    - pydantic: 1.9.1
    - pydeprecate: 0.3.1
    - pygments: 2.11.2
    - pyopenssl: 22.0.0
    - pyparsing: 3.0.7
    - pysocks: 1.7.1
    - python-dateutil: 2.8.2
    - pytorch-lightning: 1.6.5
    - pytz: 2022.1
    - pyyaml: 6.0
    - redis: 4.3.1
    - regex: 2022.4.24
    - requests: 2.28.1
    - requests-oauthlib: 1.3.1
    - rsa: 4.8
    - scikit-learn: 1.1.1
    - scipy: 1.8.0
    - sentencepiece: 0.1.97
    - sentry-sdk: 1.5.12
    - setproctitle: 1.2.3
    - setuptools: 62.6.0
    - shortuuid: 1.0.9
    - six: 1.16.0
    - smmap: 5.0.0
    - stack-data: 0.2.0
    - tensorboard: 2.8.0
    - tensorboard-data-server: 0.6.1
    - tensorboard-plugin-wit: 1.8.1
    - threadpoolctl: 3.1.0
    - tokenizers: 0.12.1
    - torch: 1.11.0
    - torchaudio: 0.11.0
    - torchinfo: 1.7.0
    - torchmetrics: 0.10.0
    - torchvision: 0.12.0
    - tqdm: 4.63.0
    - traitlets: 5.1.1
    - transformers: 4.21.2
    - triton: 1.0.0
    - typing-extensions: 4.3.0
    - urllib3: 1.26.9
    - virtualenv: 20.14.1
    - wandb: 0.12.16
    - wcwidth: 0.2.5
    - werkzeug: 2.0.3
    - wheel: 0.37.1
    - wrapt: 1.14.1
    - yarl: 1.7.2
    - zipp: 3.7.0
* System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor: x86_64
    - python: 3.8.0
    - version: #1 SMP Thu Nov 8 23:39:32 UTC 2018
```
More info
No response
cc @awaelchli