FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Something wrong in the Google Colab #26

Open azoth07 opened 1 year ago

azoth07 commented 1 year ago

```
!cd ./FlexGen && python3 -m flexgen.flex_opt --model facebook/opt-1.3b

2023-02-21 15:25:58.653992: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-21 15:25:59.475428: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-21 15:25:59.475530: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-21 15:25:59.475548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Downloading (…)okenizer_config.json: 100% 685/685 [00:00<00:00, 108kB/s]
Downloading (…)lve/main/config.json: 100% 651/651 [00:00<00:00, 111kB/s]
Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:00<00:00, 6.53MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 3.32MB/s]
Downloading (…)cial_tokens_map.json: 100% 221/221 [00:00<00:00, 77.4kB/s]
model size: 2.443 GB, cache size: 0.398 GB, hidden size (prefill): 0.008 GB
warmup - init weights
Load the pre-trained pytorch weights of opt-1.3b from huggingface. The downloading and cpu loading can take dozens of minutes. If it seems to get stuck, you can monitor the progress by checking the memory usage of this process.
Downloading (…)lve/main/config.json: 100% 653/653 [00:00<00:00, 81.5kB/s]
Downloading (…)"pytorch_model.bin";: 100% 2.63G/2.63G [00:29<00:00, 88.2MB/s]
^C
```