Closed edalquist closed 3 years ago
I was actually just working on trying to build this from the ground up for amd64. Would be nice to see
Note if I uninstall tensorflow it doesn't crash, just complains it couldn't find it:
(argos-venv) edalquist@argos:~/argos$ python stream.py --ip 0.0.0.0 --port 8080 --config configs.driveway_stream
INFO:__main__:package import START
INFO:__main__:package import END
INFO:notifier:mqtt init
INFO:__main__:flask init..
INFO:__main__:start reading video file
INFO:__main__:TFObjectDetector init START
* Serving Flask app "stream" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
INFO:werkzeug: * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
Exception in thread Thread-9:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
INFO:input.rtmpstream:rtmp capture init START
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/edalquist/argos/detection/detect_base.py", line 145, in detect_continuously
self.initialize_tf_model()
File "/home/edalquist/argos/detection/detect_base.py", line 40, in initialize_tf_model
from tflib.tflite_util import DetectorTFLite
File "/home/edalquist/argos/tflib/tflite_util.py", line 10, in <module>
from tensorflow.lite.python.interpreter import Interpreter
ModuleNotFoundError: No module named 'tensorflow'
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:input.rtmpstream:rtmp capture init END
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
^CTraceback (most recent call last):
File "stream.py", line 248, in <module>
t = sd.start()
File "stream.py", line 63, in start
self.od.wait_for_ready()
File "/home/edalquist/argos/detection/detect_base.py", line 49, in wait_for_ready
self.__cv.wait()
File "/usr/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
KeyboardInterrupt
FATAL: exception not rethrown
Aborted
Makes me think I'm not installing the right TF wheel.
@edalquist even if tensorflow isn't installed at all you'll just see a ModuleNotFoundError and argos will keep running, since the object detector is running in a separate thread. so just that thread crashes (and my bad - i haven't handled killing the whole process on the tensorflow thread crashing)
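For illustration, here is a minimal sketch of one way a failure on the detector thread could be surfaced to the main thread instead of leaving wait_for_ready blocked. This is not the argos code: `ReadyWorker`, `failing_init` and the `error` attribute are hypothetical names invented for the example.

```python
import threading


class ReadyWorker:
    """Hypothetical stand-in for a detector that initializes on its own thread."""

    def __init__(self, init_fn):
        self._init_fn = init_fn
        self._ready = threading.Event()
        self.error = None

    def _run(self):
        try:
            self._init_fn()
        except Exception as exc:  # e.g. ModuleNotFoundError from a missing package
            self.error = exc
        finally:
            # Always wake the waiter, whether initialization succeeded or failed.
            self._ready.set()

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def wait_for_ready(self, timeout=None):
        if not self._ready.wait(timeout):
            raise TimeoutError("worker did not become ready in time")
        if self.error is not None:
            # Re-raise on the calling thread so the whole process can exit.
            raise RuntimeError("worker failed to initialize") from self.error


def failing_init():
    # Simulates the import failure seen in the traceback above.
    raise ModuleNotFoundError("No module named 'tensorflow'")
```

With this shape, calling `wait_for_ready()` on a worker started with `failing_init` raises on the main thread rather than waiting forever, so the caller can log the error and exit.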
it doesn't matter that your underlying OS is ubuntu. it could have been OSX and you'd have faced the same error above. what matters is the cpu architecture (armhf, amd64, x86). pip looks for architecture-specific wheels. for mainstream architectures (amd64, x86), you don't need to install a specific wheel of tensorflow. that workaround was the only way for raspberry pi armv6/7 architectures, whose pip repositories did not contain tensorflow 2.x wheels yet.
you may just 'pip install tensorflow==2.4.0' on a mainstream architecture machine (or docker container). (e.g. that's what you need to do on your macbook pro as well)
also, the current Dockerfile in the repo is based on arm32v7/python:3.7-slim-buster and the instructions are put together specifically for that. let me put together an x86_64 docker image.
FWIW the wheel should have been tensorflow-2.4.0-cp37-cp37m-manylinux2010_x86_64.whl, since we're using python3.7
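As a rough illustration of how the wheel filename encodes the interpreter and architecture, the sketch below splits a filename into its three trailing tags and compares the Python tag with the running interpreter. `parse_wheel_tags` and `interpreter_tag` are hypothetical helpers written for this example. It is a simplification: it ignores compressed tag sets such as `cp37.cp38` and does not replicate pip's full compatibility logic.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Layout: name-version[-build]-python-abi-platform; the last three are tags.
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename layout: " + filename)
    return tuple(parts[-3:])


def interpreter_tag():
    """CPython-style tag for the running interpreter, e.g. 'cp38' for 3.8."""
    return "cp{}{}".format(*sys.version_info[:2])


wheel = "tensorflow-2.4.0-cp37-cp37m-manylinux2010_x86_64.whl"
py_tag, abi_tag, plat_tag = parse_wheel_tags(wheel)
print("python tag:", py_tag, "| abi tag:", abi_tag, "| platform tag:", plat_tag)
print("python tag matches this interpreter:", py_tag == interpreter_tag())
```

Run under Python 3.8, the last line reports a mismatch for this cp37 wheel, which is why a 3.8 venv ends up with a different wheel file than the one named above.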
Interesting, somehow I ended up on python 3.8, I guess that is the default "python3" for ubuntu 20.04.
I did pip install tensorflow==2.4.0 which ended up installing the same wheel I manually found and I get the same Illegal instruction error.
I'll try recreating the venv but explicitly using 3.7
No luck with a Python 3.7 install either:
Python 3:
Interpreter: /opt/python/cp37-cp37m/bin/python (ver 3.7.9)
Libraries: libpython3.7m.a (ver 3.7.9)
numpy: /tmp/pip-build-env-7d0lu0w8/overlay/lib/python3.7/site-packages/numpy/core/include (ver 1.14.5)
install path: python
Same Illegal Instruction error. Any tips on how to better debug where that error might be coming from?
alright, i just pushed 2 new x86_64 docker images - angadsingh/argos:x86_64 and angadsingh/argos:x86_64_gpu, which are based on ubuntu itself as the base image (based on the tensorflow docker which is based on ubuntu). it works fine on my macbook. try using them (one is a cpu version and one supports using a nvidia GPU). updated the README.
https://hub.docker.com/repository/docker/angadsingh/argos/tags
example runs:
docker run --rm -p8081:8081 -v "/Users/asingh/workspace/pi object detection/argos/configs:/configs" -v "/Users/asingh/workspace/pi object detection/argos/detections:/output_detections" -v ~/.ssh:/root/.ssh angadsingh/argos:x86_64 /usr/src/argos/stream.py --ip 0.0.0.0 --port 8081 --config configs.config_tflite_ssd
INFO:__main__:package import START
INFO:__main__:package import END
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_7.9p1)
INFO:paramiko.transport:Authentication (publickey) successful!
INFO:__main__:flask init..
INFO:__main__:start reading video file
INFO:__main__:TFObjectDetector init START
* Serving Flask app "stream" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
INFO:input.rtmpstream:rtmp capture init START
INFO:werkzeug: * Running on http://0.0.0.0:8081/ (Press CTRL+C to quit)
2021-01-25 05:46:00.521889: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-25 05:46:00.521976: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:input.rtmpstream:rtmp capture init END
INFO:__main__:TFObjectDetector init END
INFO:__main__:detect_objects init..
INFO:detection.door_detect:door state changed: DoorStates.DOOR_CLOSED
INFO:detection.door_detect:motion state changed: MotionStates.NO_MOTION
INFO:lib.ha_webhook:DoorStates.DOOR_CLOSED
INFO:__main__:od=0.00/md=0.00/st=0.00 fps
INFO:detection.door_detect:stateHistory: [DoorStates.DOOR_CLOSED[0], MotionStates.NO_MOTION[0]]
INFO:lib.ha_webhook:MotionStates.NO_MOTION
INFO:detection.door_detect:stateHistory: [DoorStates.DOOR_CLOSED[1], MotionStates.NO_MOTION[1]]
INFO:detection.door_detect:stateHistory: [DoorStates.DOOR_CLOSED[2], MotionStates.NO_MOTION[2]]
INFO:__main__:od=0.00/md=5.00/st=76.00 fps
INFO:detection.door_detect:stateHistory: [DoorStates.DOOR_CLOSED[3], MotionStates.NO_MOTION[3]]
Pinsights-MacBook-Pro:argos asingh$ docker run --rm -p8081:8081 -v "/Users/asingh/workspace/pi object detection/argos/configs:/configs" -v "/Users/asingh/workspace/pi object detection/argos/detections:/output_detections" -v ~/.ssh:/root/.ssh angadsingh/argos:x86_64_gpu /usr/src/argos/stream.py --ip 0.0.0.0 --port 8081 --config configs.config_tflite_ssd
INFO:__main__:package import START
INFO:__main__:package import END
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_7.9p1)
INFO:paramiko.transport:Authentication (publickey) successful!
INFO:__main__:flask init..
INFO:__main__:start reading video file
INFO:__main__:TFObjectDetector init START
* Serving Flask app "stream" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
INFO:input.rtmpstream:rtmp capture init START
INFO:werkzeug: * Running on http://0.0.0.0:8081/ (Press CTRL+C to quit)
2021-01-25 05:52:31.608788: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:detection.door_detect:stateHistory: []
INFO:input.rtmpstream:rtmp capture init END
INFO:__main__:TFObjectDetector init END
INFO:__main__:detect_objects init..
INFO:detection.door_detect:door state changed: DoorStates.DOOR_CLOSED
INFO:detection.door_detect:motion state changed: MotionStates.NO_MOTION
INFO:lib.ha_webhook:DoorStates.DOOR_CLOSED
INFO:__main__:od=0.00/md=0.00/st=0.00 fps
INFO:lib.ha_webhook:MotionStates.NO_MOTION
INFO:detection.door_detect:stateHistory: [DoorStates.DOOR_CLOSED[0], MotionStates.NO_MOTION[0]]
Thanks! I'll get a docker vm setup tomorrow and give it a try.
regarding your issue. i think its this: https://github.com/tensorflow/tensorflow/issues/17411 https://stackoverflow.com/questions/49094597/illegal-instruction-core-dumped-after-running-import-tensorflow
what machine are you running this on? is it a NAS? your CPU might not have AVX instructions. tensorflow apparently uses them since 1.6 (but we need 2.x so cant downgrade to 1.5) and in that case you'll have to build tensorflow from source.
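A quick way to see whether AVX is visible is to look at the flags line of /proc/cpuinfo. The sketch below is Linux-only and x86-specific; `cpu_flags` and `has_avx` are hypothetical helper names. It only reports what the kernel exposes to the environment it runs in, so the result inside a VM or container may differ from the bare host.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" entry is enough for a quick check.
            return set(value.split())
    return set()


def has_avx(path="/proc/cpuinfo"):
    """True/False if the flags can be read, None if the file is unavailable."""
    try:
        with open(path) as f:
            return "avx" in cpu_flags(f.read())
    except OSError:
        return None  # not Linux, or the file is not readable


if __name__ == "__main__":
    print("AVX visible:", has_avx())
```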
try out the different tensorflow docker versions from here and see which one works on your machine: https://hub.docker.com/r/tensorflow/tensorflow/tags (install an image and then just run python and do import tensorflow)
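Because an illegal instruction kills the interpreter outright, it can help to run the import in a child process and inspect how it exited. The sketch below assumes a POSIX system, where `subprocess` reports death by signal as a negative return code; `probe_import` is a hypothetical helper. It passes `-X faulthandler` so that Python prints the Python-level traceback to stderr when a fatal signal such as SIGILL arrives.

```python
import signal
import subprocess
import sys


def probe_import(module, python=sys.executable):
    """Import `module` in a child interpreter; return (status, stderr)."""
    proc = subprocess.run(
        [python, "-X", "faulthandler", "-c", "import " + module],
        capture_output=True,
        text=True,
    )
    if proc.returncode == 0:
        return "ok", proc.stderr
    if proc.returncode < 0:
        # On POSIX a negative code means the child was killed by that signal.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = "signal {}".format(-proc.returncode)
        return "killed by " + name, proc.stderr
    return "import failed", proc.stderr


if __name__ == "__main__":
    status, err = probe_import("tensorflow")
    print(status)
    print(err)
```

A status of "killed by SIGILL" would point at a CPU instruction problem, while "import failed" with a ModuleNotFoundError in stderr points at the install itself.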
Ah, this is an old Xeon E5 server. I can try moving the container over to a Ryzen machine and see if it is happier.
That was it! Moved from a Xeon E5-2670 to a Ryzen 9 3900X and it works!
I'm trying to see if I can get this running in an LXC container running ubuntu server 20.04
The one change I made from the instructions is to install TF via:
When running stream.py I get:
Here is my OpenCV build info dump: