dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Segmentation fault (core dumped) while using nano_llm:r36.2 container on NVIDIA Jetson Orin 64GB #690

Closed wyhanz closed 1 month ago

wyhanz commented 1 month ago

First of all, I want to express my gratitude to the author for their contributions to the NVIDIA Jetson platform. It has greatly facilitated our exploration of NVIDIA Jetson applications.

Below are the hardware details of my device: [image]

I am attempting to deploy a relatively new vision-language model (VLM), Florence-2-large (https://huggingface.co/microsoft/Florence-2-large), on an NVIDIA Jetson Orin 64GB. I followed the example code provided on that page. While running it in the nano_llm:r36.2 container, I found that some dependencies, such as flash_attn, were missing. After installing them, I hit the following error during execution:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Fatal Python error: Segmentation fault

Current thread 0x0000ffff97cccca0 (most recent call first):
  Garbage-collecting
  File "/usr/lib/python3.10/dataclasses.py", line 432 in _create_fn
  File "/usr/lib/python3.10/dataclasses.py", line 588 in _repr_fn
  File "/usr/lib/python3.10/dataclasses.py", line 1044 in _process_class
  File "/usr/lib/python3.10/dataclasses.py", line 1175 in wrap
  File "/usr/lib/python3.10/dataclasses.py", line 1184 in dataclass
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/dataclasses.py", line 84 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 883 in exec_module
  File "<frozen importlib._bootstrap>", line 688 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1006 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1027 in _find_and_load
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 32 in <module>
  ...
Extension modules: zstandard.backend_c, charset_normalizer.md, numpy.core._multiarray_umath, numpy.core._multiarray_tests, torch._C, torch._C._fft, torch._C._linalg, torch._C._nn, torch._C._sparse, torch._C._special, PIL._imaging, yaml._yaml, sentencepiece._sentencepiece, google._upb._message (total: 27)
Segmentation fault (core dumped)

What could be causing this segmentation fault, and how can I successfully deploy this model on my setup?
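
One way to narrow down a crash like this is to import each suspect package in its own child interpreter, so a segfault in one import does not take down the whole session. The sketch below is only an illustration under assumptions: the helper names `try_import` and `describe` are hypothetical, the package list is a guess based on the traceback and the packages mentioned above, and it relies on the POSIX convention that a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def try_import(module_name, timeout=120.0):
    """Import module_name in a fresh interpreter; return (returncode, stderr).

    Hypothetical helper: -X faulthandler makes the child print a Python-level
    traceback if it receives a fatal signal such as SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGSEGV:
        return "segmentation fault"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"python error (exit code {returncode})"


if __name__ == "__main__":
    # Assumed list of suspects; adjust to whatever was installed by hand.
    for name in ("torch", "accelerate", "flash_attn", "transformers"):
        code, _stderr = try_import(name)
        print(f"{name}: {describe(code)}")
```

If only one package reports a segmentation fault, that package (or a wheel built against a different CUDA or PyTorch version) is a reasonable first suspect. This does not prove the cause, since the traceback above shows the crash happening during garbage collection inside an import chain.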

wyhanz commented 1 month ago

It could potentially be due to an L4T version mismatch. However, I’m unsure how to downgrade to version 36.2. I could not find an option to select the L4T version in NVIDIA SDK Manager. I am currently using the latest version of SDK Manager (2.2.0). 😭
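
To check for a mismatch like this, the host's L4T release can be read and compared with the container tag. The following is a minimal sketch, assuming the host exposes `/etc/nv_tegra_release` with a first line in the usual `# R36 (release), REVISION: 4.0, ...` form. The function names `parse_l4t_version` and `same_release` are hypothetical, the sample header line in the comment is made up for illustration, and matching on major.minor is a simplification rather than the project's actual compatibility rule.

```python
import re
from pathlib import Path

# Assumed header format, e.g. (made-up sample):
#   "# R36 (release), REVISION: 4.0, GCID: 0, BOARD: generic, EABI: aarch64"
_HEADER = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")


def parse_l4t_version(text):
    """Return (major, minor, patch) from nv_tegra_release text, or None."""
    match = _HEADER.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def same_release(l4t_version, tag):
    """Check whether a tag such as 'r36.2' has the same major.minor as the host."""
    match = re.fullmatch(r"r(\d+)\.(\d+)(?:\.\d+)?", tag)
    if match is None or l4t_version is None:
        return False
    return l4t_version[:2] == (int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    release_file = Path("/etc/nv_tegra_release")  # assumed location on the host
    if release_file.exists():
        version = parse_l4t_version(release_file.read_text())
        print("host L4T:", version)
        for tag in ("r36.2", "r36.4"):
            print(tag, "matches host:", same_release(version, tag))
    else:
        print("no /etc/nv_tegra_release found (not a Jetson host?)")
```

Run on the Jetson host itself, since the file may not be visible from inside a container.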

wyhanz commented 1 month ago

nano_llm:r36.4 works well for me! Thanks a lot. 😄