Load tests

jvwong commented 2 years ago

I have a branch, load_test, with a simple script, load.py, that times predict on different numbers of articles. Below is a matrix of elapsed time and memory by (a) number of texts and (b) platform:

OS/Platform	GPU	# Records	Elapsed time (s)	Memory (GB)
MacOS, 8-Core i9		100	31	2.7
		1000	298	2.7
*Ubuntu 18.04.5 LTS, Intel(R) Xeon(R) CPU E5-2687W, 24 Core	NVIDIA GP102 [TITAN Xp]	100	28	2.6
		1000	250	2.6

* This is the workstation running semantic-search (X.X.X.150)
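For reference, a timing loop of this kind can be sketched as follows. This is not the actual load.py: `time_predict`, `run_load_test`, and the stand-in `predict` and `make_texts` callables are hypothetical names, and the sketch measures elapsed time only, not memory.

```python
import time


def time_predict(predict, texts):
    """Return the wall-clock seconds taken by one predict(texts) call."""
    start = time.perf_counter()
    predict(texts)
    return time.perf_counter() - start


def run_load_test(predict, make_texts, sizes=(100, 1000)):
    """Time predict once per batch size; returns {number of records: seconds}."""
    return {n: time_predict(predict, make_texts(n)) for n in sizes}


if __name__ == "__main__":
    # Stand-ins for the real classifier and article loader (assumptions).
    def predict(texts):
        return [len(t) for t in texts]

    def make_texts(n):
        return ["example abstract text"] * n

    for n, elapsed in run_load_test(predict, make_texts).items():
        print(f"{n} records: {elapsed:.3f} s")
```

Swapping the stand-ins for the real model's predict and a real article loader should give numbers comparable to the table.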

jvwong commented 2 years ago

@JohnGiorgi I don't think it's picking up the GPU, at least with the simple script load.py

My sense is the Colab notebook example runs ~1000 texts in about 30 s, which is an order of magnitude faster

JohnGiorgi commented 2 years ago

Weird, if you call nvidia-smi does it pick up the Titan XP?

JohnGiorgi commented 2 years ago

Try calling this in your environment:

import tensorflow as tf

assert tf.test.is_gpu_available()
assert tf.test.is_built_with_cuda()

In Colab these checks pass. I also called nvidia-smi and can see there's a GPU available.

jvwong commented 2 years ago

The nvidia-smi thing works fine. However, the assertions fail:

>>> import tensorflow as tf
>>> assert tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2022-05-12 10:51:04.181343: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-12 10:51:04.259859: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-05-12 10:51:04.259889: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
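The deprecation warning in this output points to `tf.config.list_physical_devices('GPU')`. A minimal sketch of the same check using that call, assuming TensorFlow 2.x; the helper name `list_gpus` is hypothetical, and the ImportError fallback is only there so the snippet runs in an environment without TensorFlow:

```python
def list_gpus():
    """Return the GPU devices TensorFlow can see, or [] if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption: treat a missing TensorFlow install as "no GPUs visible".
        return []
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

An empty list here would be consistent with the "Skipping registering GPU devices..." message in the log.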

JohnGiorgi commented 2 years ago

Hmm, looks like CUDA is not installed properly, if I had to guess. I believe if you do everything via conda it will install the necessary libraries for using the GPU. See: https://towardsdatascience.com/setting-up-tensorflow-gpu-with-cuda-and-anaconda-onwindows-2ee9c39b5c44

jvwong commented 2 years ago

OK, it was missing the cuDNN library (libcudnn.so.8):
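One way to confirm the library can be loaded, independent of TensorFlow, is to try opening it with ctypes. This is only a sketch: `can_load` is a hypothetical helper, and the file name `libcudnn.so.8` is taken from the warning in the log earlier in the thread.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is not found or cannot be opened.
        return False
    return True


if __name__ == "__main__":
    name = "libcudnn.so.8"  # name reported as missing in the TensorFlow log
    print(f"{name} loadable: {can_load(name)}")
```

If this prints False, the loader cannot find the library on its search path, which matches the "cannot open shared object file" message.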

jvwong commented 2 years ago

OS/Platform	GPU	# Records	Elapsed time (s)	Memory (GB)
MacOS, 8-Core i9		100	31	2.7
		1000	298	2.7
*Ubuntu 18.04.5 LTS, Intel(R) Xeon(R) CPU E5-2687W, 24 Core	NVIDIA GP102 [TITAN Xp]	1000	27	1.9

* This is the workstation running semantic-search (X.X.X.150)
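To put the two tables side by side, the throughput and speedup can be worked out from the 1000-record rows. The figures are copied from the tables in this thread; the row labels and the helper names `throughput` and `speedup` are my own, and the labels assume the first Ubuntu run was CPU-only because the GPU was not being picked up.

```python
def throughput(records, seconds):
    """Records processed per second."""
    return records / seconds


def speedup(before_seconds, after_seconds):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_seconds / after_seconds


# (label, records, elapsed seconds), taken from the 1000-record rows above.
runs = [
    ("MacOS, 8-Core i9", 1000, 298),
    ("Ubuntu workstation, GPU not picked up", 1000, 250),
    ("Ubuntu workstation, GPU in use", 1000, 27),
]

if __name__ == "__main__":
    for label, records, seconds in runs:
        print(f"{label}: {throughput(records, seconds):.1f} records/s")
    print(f"Workstation speedup with GPU: {speedup(250, 27):.1f}x")
```

The workstation comes out roughly 9x faster with the GPU in use, in line with the ~30 s per 1000 texts seen in Colab.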

PathwayCommons / pathway-abstract-classifier

Load tests #41