akucia / analog-watch-recognition

Reading time from analog clocks
MIT License

Something makes cell six die in demo notebook. #2

Open ScriptHound opened 2 years ago

ScriptHound commented 2 years ago

After this line runs, the cell dies and goes idle: https://github.com/akucia/analog-watch-recognition/blob/4321541fd0e184b5a0cf6470c1f8a42eb3cf8099/watch_recognition/watch_recognition/predictors.py#L46

There are many warnings btw: traceback.txt

akucia commented 2 years ago

This part looks like an issue with the CUDA GPU libraries:

2022-03-28 12:54:27.132131: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-28 12:54:27.132153: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Segmentation Models: using `tf.keras` framework.
extracting effnet-b3-FPN-160-tversky-hands.tar.gz
2022-03-28 12:54:30.458265: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/scripthound/analog_watch_recognition/analog-watch-recognition/venv/lib/python3.8/site-packages/cv2/../../lib64:
2022-03-28 12:54:30.458286: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-03-28 12:54:30.458298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (magi3): /proc/driver/nvidia/version does not exist

Make sure you can run any TensorFlow code on the GPU:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

This tutorial might be helpful: https://www.tensorflow.org/install/gpu

Alternatively, you can run the notebook without a GPU by setting the environment variable CUDA_VISIBLE_DEVICES=-1 before you import TensorFlow. See more here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
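A minimal sketch of the CPU-only route, assuming it runs as the first cell after a kernel restart so that TensorFlow has not been imported yet. The helper name `count_visible_gpus` is hypothetical and is not part of the repository.

```python
import os

# Hide all CUDA devices from this process. This must happen before
# TensorFlow is first imported (in a notebook: restart the kernel and
# run this cell first).
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def count_visible_gpus() -> int:
    """Return how many GPUs TensorFlow can see (hypothetical helper)."""
    # Imported lazily so the environment variable above is already set.
    import tensorflow as tf

    return len(tf.config.list_physical_devices("GPU"))
```

If the variable took effect, calling `count_visible_gpus()` in the next cell should report no GPUs, and the notebook should then run on the CPU.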

akucia commented 2 years ago

Warnings like this

WARNING:tensorflow:Unable to restore custom metric. Please ensure that the layer implements `get_config` and `from_config` when saving. In addition, please use the `custom_objects` arg when calling `load_model()`.

and similar to these

WARNING:absl:Importing a function (__inference_block4a_expand_activation_layer_call_and_return_conditional_losses_4624933) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.
WARNING:absl:Importing a function (__inference_block2a_activation_layer_call_and_return_conditional_losses_4594619) with ops with unsaved custom gradients. Will likely fail if a gradient is requested.

can be ignored if you only use the models for inference. The models might fail if you call the .fit method, though.