!cd ./FlexGen && python3 -m flexgen.flex_opt --model facebook/opt-1.3b
2023-02-21 15:25:58.653992: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-21 15:25:59.475428: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-21 15:25:59.475530: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-21 15:25:59.475548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Downloading (…)okenizer_config.json: 100% 685/685 [00:00<00:00, 108kB/s]
Downloading (…)lve/main/config.json: 100% 651/651 [00:00<00:00, 111kB/s]
Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:00<00:00, 6.53MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 3.32MB/s]
Downloading (…)cial_tokens_map.json: 100% 221/221 [00:00<00:00, 77.4kB/s]
model size: 2.443 GB, cache size: 0.398 GB, hidden size (prefill): 0.008 GB
warmup - init weights
Load the pre-trained pytorch weights of opt-1.3b from huggingface. The downloading and cpu loading can take dozens of minutes. If it seems to get stuck, you can monitor the progress by checking the memory usage of this process.
Downloading (…)lve/main/config.json: 100% 653/653 [00:00<00:00, 81.5kB/s]
Downloading (…)"pytorch_model.bin";: 100% 2.63G/2.63G [00:29<00:00, 88.2MB/s]
^C
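The `cache size` and `hidden size (prefill)` figures in the log above can be reproduced with a little arithmetic. The sketch below is an illustration, not FlexGen's own code. It assumes fp16 storage (2 bytes per element), 24 decoder layers and a hidden dimension of 2048 for opt-1.3b, and a run with 4 prompts, a 512-token prompt and 32 generated tokens. None of these values appear in the log; they are assumptions chosen because they match the reported numbers. The function names are hypothetical, and "GB" is taken to mean 2^30 bytes.

```python
# Hypothetical helpers; all shapes and run settings below are assumptions,
# not values read from the log.
BYTES_PER_ELEMENT = 2  # assumed fp16


def cache_bytes(batch_size, seq_len, num_layers, hidden_dim):
    """KV cache: one key and one value tensor per layer, per token."""
    return 2 * batch_size * seq_len * num_layers * hidden_dim * BYTES_PER_ELEMENT


def hidden_bytes(batch_size, seq_len, hidden_dim):
    """Hidden states for one layer over the whole sequence."""
    return batch_size * seq_len * hidden_dim * BYTES_PER_ELEMENT


def to_gib(num_bytes):
    """Convert bytes to units of 2**30 bytes."""
    return num_bytes / 2**30


if __name__ == "__main__":
    batch_size = 4        # assumed number of prompts
    seq_len = 512 + 32    # assumed prompt length + generation length
    num_layers = 24       # assumed opt-1.3b decoder layers
    hidden_dim = 2048     # assumed opt-1.3b hidden dimension

    cache = cache_bytes(batch_size, seq_len, num_layers, hidden_dim)
    hidden = hidden_bytes(batch_size, seq_len, hidden_dim)
    print(f"cache size: {to_gib(cache):.3f} GB")
    print(f"hidden size (prefill): {to_gib(hidden):.3f} GB")
```

Under these assumptions the two values round to 0.398 and 0.008, consistent with the log. The `model size` figure depends on the full parameter count and is not reproduced here.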