hongkongkiwi opened 5 years ago
Of course. We have been thinking about integrating Google Edge TPU or Nvidia Jetson. From their specs, I think either of them would be able to run detection for at least one camera stream in real time, possibly more.
It hasn't been a priority since most of the users are just fine with running TSD server on a PC. But will be happy to collaborate if you want to take a stab at it.
If this were implemented, I could easily see the Jetson Nano becoming a viable upgrade over the Pi for printer hosting. Imagine, at some point in the future, an SD image for the Jetson, similar to OctoPi, that came with TSD pre-installed and OctoPrint pre-configured to integrate with it. That could be quite valuable for users who want high-reliability printing without tethering their setup to a full-size PC.
I had been doing some work on getting TSD running on a Jetson Nano. The thing currently stopping me from finishing is that model.so and model_gpu.so are not compiled for ARM64. I would appreciate some assistance with getting these libraries recompiled for ARM64.
@RaymondHimle I also tried to get TSD running on a Jetson Nano, but got some errors while starting the server, even with your instructions. Do you already have a running version?
@Shad0wjump3r You can use https://github.com/TheSpaghettiDetective/TheSpaghettiDetective/blob/master/docs/jetson_guide.md as the starting point. If you run into any issues, you can join us in the forum: https://discord.gg/NcZkQfj
It looks like I missed a step: web/Dockerfile needs to be replaced, as The Spaghetti Detective web_base image was built for AMD64. https://gist.github.com/RaymondHimle/b17676f3bec95fe00ec85a551ee3325b
@hongkongkiwi did you get anywhere with trying this? I have been experimenting with object detection on an RPi4 + Coral USB with really great results - good enough to process live webcam video and detect objects using MobileNet SSD v2 (COCO). I too would be keen to trial whether it's possible to run the TSD server on the edge using this combo!
Hey! Figured I'd throw my interest out there for this. I've worked with the Edge TPU myself and similar devices and think that they could definitely work really well for this use case, and could even help in full sized computers/servers in some cases, as there are also PCIe versions of the edge TPU available for not that much (~50 USD at this time).
There are a few issues/requirements that would need to be tackled first, as the Edge TPU has some specific requirements. First, all of the model compilation tooling requires TensorFlow (Lite). Second, there are some specific constraints, per the model requirements page:
- Tensor parameters are quantized (8-bit fixed-point numbers; int8 or uint8).
- Tensor sizes are constant at compile-time (no dynamic sizes).
- Model parameters (such as bias tensors) are constant at compile-time.
- Tensors are either 1-, 2-, or 3-dimensional. If a tensor has more than 3 dimensions, then only the 3 innermost dimensions may have a size greater than 1.
- The model uses only the operations supported by the Edge TPU (see the table on the model requirements page).
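The dimensionality rule above is the one that most often trips people up, so here's a tiny illustrative check of just that constraint (my own sketch, not part of the Edge TPU tooling):

```python
def satisfies_edgetpu_dim_rule(shape):
    """Check the rule that only a tensor's 3 innermost dimensions
    may have a size greater than 1 (any dims beyond those must be 1)."""
    return all(d == 1 for d in shape[:-3])

# A 4-D tensor with batch size 1 passes...
print(satisfies_edgetpu_dim_rule((1, 320, 320, 3)))  # True
# ...but a batch of 8 does not.
print(satisfies_edgetpu_dim_rule((8, 320, 320, 3)))  # False
```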
There are also some performance considerations worth knowing about: the Edge TPU has only ~8 MB of SRAM cache for model parameters, so larger models have to stream weights from host memory. Read more in the Edge TPU docs.
Overall I think this would be a super cool addition and one that I'd love to tinker with. But since we don't have access to the source of the model itself (or even know whether it's written in TF, which admittedly might be stated somewhere I didn't see), a lot of the heavy lifting (e.g. quantization and requirement verification) would likely have to be done by the TSD team. It's a +1 from me for sure, but it isn't something that can be implemented without the TSD team. (I'm open to collaboration if you guys are interested!)
Thank you for offering to help @ben-hawks ! I'll be happy to work with you on this.
TSD's ML model is not based on tensorflow. It is instead using darknet. The model file that is open source is available at the url defined in this file: https://github.com/TheSpaghettiDetective/TheSpaghettiDetective/blob/master/ml_api/model/model.weights.url
Let me know if there is anything the TSD team can do to help you get this done!
Ahhh, that's what I get for commenting at 2am! Thanks for clarifying. I haven't worked much with darknet before, but it seems like there are some tools out there for converting darknet models to Keras/TF, so I can try giving that a shot, since that'd be the first step to getting it running on the Edge TPU.
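In case it helps whoever picks this up: the darknet→Keras converters I've seen all start by parsing the .cfg file into an ordered list of layer sections, then rebuild the network in TF from that. A minimal sketch of that first step (the section names here are just standard darknet ones, not necessarily TSD's actual config):

```python
def parse_darknet_cfg(cfg_text):
    """Parse darknet .cfg text into an ordered list of
    (section_name, options_dict) pairs."""
    sections = []
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))       # new layer section
        elif "=" in line and sections:
            key, value = (s.strip() for s in line.split("=", 1))
            sections[-1][1][key] = value            # option in current section
    return sections

cfg = """
[net]
width=416
height=416

[convolutional]
filters=32
activation=leaky
"""
for name, opts in parse_darknet_cfg(cfg):
    print(name, opts)
```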
keep us in the loop @ben-hawks - happy to donate some time to test if you manage to get it working with TF
Is there any update on this? I‘m really interested in this feature too as it offers a low power solution.
If there is an update on this, it will be posted. Let's keep the comments related to the issue.
I am also very interested in using a Coral TPU with Obico. Am I correct that the detection is called in https://github.com/TheSpaghettiDetective/obico-server/blob/master/ml_api/server.py#L42? Therefore, "only" this line needs to be changed to work with pycoral?
Correct. Feel free to submit a PR if you want to take a stab at it.
@kennethjiang are we still using a darknet-based YOLO? There are many modern implementations that are a bit easier to run on other platforms. I think it should be possible to run the detector on a single core of an RPi4.
@e-fominov Feel free to convert our model to a more modern model and send a PR. :)
Our model has about 50M parameters. I can see how a much simpler model could run on an RPi CPU. But I'll have to see it to believe that it can handle any model with more than, say, 10M parameters.
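To put that in perspective against the Edge TPU's ~8 MB on-chip parameter cache mentioned earlier, here's a back-of-the-envelope sketch (assuming full 8-bit quantization, i.e. one byte per parameter):

```python
params = 50_000_000    # rough TSD model size from the comment above
bytes_per_param = 1    # int8 after full quantization
sram_cache_mib = 8     # approximate Edge TPU on-chip parameter cache

model_mib = params * bytes_per_param / 2**20
print(f"quantized model: ~{model_mib:.0f} MiB vs {sram_cache_mib} MiB cache")
# Far larger than the cache, so weights would stream from host memory
# on every inference -- slower, but possibly still workable given how
# low TSD's required frame rate is.
```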
How many frames per second do we need for a stable operation?
1 frame per 10s.
I just tried an ONNX implementation with the original model on an RPi4, and it runs at about 1.1 s inference time per frame.
I'm also interested in Google Edge TPU support. I'm currently running an Edge TPU in my server for Frigate.
Hi TSD Team,
I would like to know if this project can work with Google EDGE TPU.
As I understand it, the RPi is not powerful enough to do machine learning on its own for this kind of model. However, Google makes a custom hardware accelerator specifically for this application (e.g. running via a Pi) and specifically for TensorFlow. It's very reasonably priced at around $80, and I was thinking this project would be an excellent candidate for it.
Has this been investigated by the team? If not, would you be willing to work together to support it?