Running Text Chat and Multimodal Chat tries to open shared library in wrong file path

bryanhughes commented 7 months ago

The container fails with the following error:

OSError: MLC couldn't find /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params/Llama-2-7b-chat-hf-q4f16_ft-cuda.so

In more detail:

Using path "/data/models/mlc/dist/models/Llama-2-7b-chat-hf" for model "Llama-2-7b-chat-hf"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param:   0%| 0/197 [00:00<?, ?tensors/s]
Start computing and quantizing weights... This may take a while.
| 0/327 [00:00<?, ?tensors/s]
Get old param:  98%| 194/197 [01:09<00:01,  2.55tensors/s]
Finish computing and quantizing weights.
| 326/327 [01:09<00:00,  8.45tensors/s]
Total param size: 3.1569595336914062 GB
Start storing to cache /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params
[0327/0327] saving param_326
All finished, 99 total shards committed, record saved to /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params/ndarray-cache.json
| 327/327 [01:20<00:00,  8.45tensors/s]
Finish exporting chat config to /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params/mlc-chat-config.json
Save a cached module to /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/mod_cache_before_build.pkl.
Finish exporting to /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/Llama-2-7b-chat-hf-q4f16_ft-cuda.so
20:57:54 | INFO | device=cuda(0), name=Orin, compute=8.7, max_clocks=1300000, multiprocessors=8, max_thread_dims=[1024, 1024, 64], api_version=12020, driver_version=None
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/__main__.py", line 22, in <module>
    model = LocalLM.from_pretrained(
  File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/local_llm/local_llm/models/mlc.py", line 82, in __init__
    raise IOError(f"MLC couldn't find {self.module_path}")
OSError: MLC couldn't find /data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/params/Llama-2-7b-chat-hf-q4f16_ft-cuda.so

Looking at the /data directory that is mounted in the container, I do find the shared library:

bryan@mimzy-jetson:~/git/jetson-containers$ ll data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft/
total 88572
-rwxr-xr-x 1 root root 37986272 Feb 20 12:57 Llama-2-7b-chat-hf-q4f16_ft-cuda.so*
-rw-r--r-- 1 root root 52705785 Feb 20 12:57 mod_cache_before_build.pkl
drwxr-xr-x 2 root root     4096 Feb 20 12:55 params/

The Python code seems to be looking for the shared library under params/, but the build exported it one level up, directly in the model directory. I'm not sure where self.module_path is set.
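To make the mismatch concrete, here is a minimal sketch (not the actual local_llm code; the helper `find_module_lib` and its search order are hypothetical) that checks both the model directory, where the build log says the .so was exported, and its params/ subdirectory, where the failing code looked:

```python
from pathlib import Path
from typing import Optional


def find_module_lib(model_dir: str, lib_name: str) -> Optional[Path]:
    """Return the first existing path to lib_name, or None if it is absent.

    Hypothetical helper for illustration only: it checks the model
    directory itself first, then the params/ subdirectory.
    """
    base = Path(model_dir)
    candidates = [base / lib_name, base / "params" / lib_name]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Paths taken from the log above; adjust them for your own setup.
    model_dir = "/data/models/mlc/dist/Llama-2-7b-chat-hf-q4f16_ft"
    lib = find_module_lib(model_dir, "Llama-2-7b-chat-hf-q4f16_ft-cuda.so")
    print(lib if lib else "shared library not found in either location")
```

Run inside the container, this should report which of the two locations actually holds the library.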

dusty-nv commented 7 months ago

@bryanhughes I have been migrating to the newer model builder workflow in MLC, and this issue should have been fixed last night in commit https://github.com/dusty-nv/jetson-containers/commit/8fc4f936b3057e1a8c96f0225d1ef317f5ad49d0 and in the latest dustynv/local_llm:r36.2.0 container image. When did you last pull it?

EDIT: given the discrepancy between the line numbers in your stack trace and those in the latest source, it appears you aren't running the latest container image:

https://github.com/dusty-nv/jetson-containers/blob/8fc4f936b3057e1a8c96f0225d1ef317f5ad49d0/packages/llm/local_llm/models/mlc.py#L78
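The line-number comparison can be done by hand, but as a rough sketch (the helper `parse_frames` is hypothetical and not part of local_llm), the file, line number, and function of each frame can be pulled out of a pasted Python traceback like this:

```python
import re
from typing import List, Tuple

# Matches the standard CPython frame header:
#   File "<path>", line <n>, in <function>
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)'
)


def parse_frames(traceback_text: str) -> List[Tuple[str, int, str]]:
    """Extract (path, line number, function) for every frame in the text."""
    return [
        (m["path"], int(m["line"]), m["func"])
        for m in FRAME_RE.finditer(traceback_text)
    ]


if __name__ == "__main__":
    # Last frame from the traceback reported earlier in this issue.
    sample = (
        '  File "/opt/local_llm/local_llm/models/mlc.py", line 82, in __init__\n'
        '    raise IOError(f"MLC couldn\'t find {self.module_path}")\n'
    )
    for path, line, func in parse_frames(sample):
        print(f"{path}:{line} ({func})")
```

The reported traceback raises at line 82 of mlc.py, while the link above points at line 78 of the same file, which is the kind of mismatch that suggests an older image.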

bryanhughes commented 7 months ago

@dusty-nv Thanks! Yeah, I just pulled a new container and now it works. However, the demo now appears to be missing the file path.jpg in the data/images directory:

21:49:00 | INFO | resetting chat history
>> PROMPT: /data/images/path.jpg

>> PROMPT: what does the sign say?

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/__main__.py", line 92, in <module>
    embedding, position = chat_history.embed_chat()
  File "/opt/local_llm/local_llm/history.py", line 304, in embed_chat
    entry[embed_key] = self.embed(entry[key], type=key, template=role_template)
  File "/opt/local_llm/local_llm/history.py", line 190, in embed
    return self.embedding_functions[type].func(input, template)
  File "/opt/local_llm/local_llm/history.py", line 243, in embed_image
    embeddings.append(self.model.embed_image(image, return_tensors='np'))
  File "/opt/local_llm/local_llm/local_llm.py", line 124, in embed_image
    embedding = self.vision(image, crop=crop, hidden_state=self.config.mm_vision_select_layer)
  File "/opt/local_llm/local_llm/vision/clip_hf.py", line 58, in __call__
    image = load_image(image)
  File "/opt/local_llm/local_llm/utils/image.py", line 45, in load_image
    image = PIL.Image.open(path).convert('RGB')
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3247, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/data/images/path.jpg'
bryan@mimzy-jetson:~/git/jetson-containers$ ll data/images/
dogs.jpg          fruit.jpg         .gitkeep          hoover.jpg        lake.jpg          stable-diffusion/
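As a small sketch of the same check done from Python (the helper `describe_missing_file` is hypothetical and not part of local_llm), the following reports whether a file exists and, if it does not, lists what its directory does contain:

```python
import os


def describe_missing_file(path: str) -> str:
    """Say whether path exists; if not, list the contents of its directory."""
    if os.path.isfile(path):
        return f"{path} exists"
    parent = os.path.dirname(path) or "."
    if not os.path.isdir(parent):
        return f"{path} is missing and directory {parent} does not exist"
    siblings = sorted(os.listdir(parent))
    listing = ", ".join(siblings) if siblings else "(nothing)"
    return f"{path} is missing; {parent} contains: {listing}"


if __name__ == "__main__":
    # Path taken from the FileNotFoundError above.
    print(describe_missing_file("/data/images/path.jpg"))
```

Running it inside the container checks the mounted /data directory rather than the host checkout.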

dusty-nv commented 7 months ago

Ahh thanks @bryanhughes, I added that image to the repo in https://github.com/dusty-nv/jetson-containers/commit/d8992335108db11b4e003db0d4cf03cf2a1cb5b6 and merged these recent changes into master. You don't need to rebuild the container to get path.jpg (just run git pull on your jetson-containers repo), because it's under the jetson-containers/data dir, which gets mounted into the containers (for storing your models, test images, etc.)

bryanhughes commented 7 months ago

I did a git pull origin master and did not see the file path.jpg. Here is the directory listing of jetson-containers/data/images:

bryan@mimzy-jetson:~/git/jetson-containers$ ll data/images/
dogs.jpg          fruit.jpg         .gitkeep          hoover.jpg        lake.jpg          stable-diffusion/

dusty-nv commented 7 months ago

https://github.com/dusty-nv/jetson-containers/blob/master/data/images/path.jpg

bryanhughes commented 7 months ago

DOH. Gotta love technology. I just did another pull and got several changes.

Thanks! I will close this one now.

dusty-nv / jetson-containers

Running Text Chat and Multimodal Chat tries to open shared library in wrong file path #396