easytarget / esp32-cam-webserver

Expanded version of the Espressif ESP webcam
https://hackaday.io/project/168563-7-esp32-cam-example-expanded
GNU Lesser General Public License v2.1

support for multiclient streaming #51

Open TungstenE2 opened 4 years ago

TungstenE2 commented 4 years ago

Hi,

thx for the great project!

Any chance for having multiclient streaming enabled in your project?

See for reference: https://github.com/arkhipenko/esp32-cam-mjpeg-multiclient

br

easytarget commented 4 years ago

By coincidence I was looking at that sketch earlier today and wondering if this sketch could do something similar. Well... not coincidence, I was checking out people who follow this, and saw it listed in your recent activity ;-)

It's an excellent suggestion; but the architecture they use to serve the stream is really different to how this sketch currently works; it's not a copy/paste/modify job to replace the stream handler. And they do not use face detection, further complicating things.

I have a vague idea that the existing stream handler could be made thread-safe, with multiple handlers run via RTOS. We have 2 cores in the ESP to leverage too, so the bottleneck should be WiFi, not CPU, especially if (as I believe) face recognition can be done before the image is copied to be returned to each stream client.

I'm going to tentatively peg this to V5, which is where I intend to strip face recognition anyway, which should make this easier to implement using the code in the repo from @arkhipenko

At present V5 is pure vaporware, and I'm not sure if I will ever have time to get there, but I'm very open to PRs from anybody who can get multi-client streaming running on top of the existing code architecture.

arkhipenko commented 4 years ago

I had to move away from espressif's streaming implementation completely because their code uses "chunked" stream which VLC and Blynk video players choke on. I needed something that can play on multiple clients and be compatible with VLC/Blynk. As for use of cores - yes, and I have done some experimentation in this project: https://www.hackster.io/anatoli-arkhipenko/minecraft-interactive-do-not-enter-sword-sign-esp32-cam-cd1b07
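For readers unfamiliar with the distinction, here is a minimal sketch of MJPEG-over-HTTP framing written straight to the socket, with no `Transfer-Encoding: chunked` wrapper. It is not the code from either project; the boundary token `frame` and the function name `mjpeg_part` are assumptions made for illustration.

```python
# Sketch only: builds the bytes of an MJPEG (multipart/x-mixed-replace) stream.
# BOUNDARY and mjpeg_part are hypothetical names, not taken from either repo.
BOUNDARY = "frame"

# Sent once per connection. Note there is no "Transfer-Encoding: chunked" header:
# the parts that follow are written to the socket as-is.
STREAM_HEADERS = (
    "HTTP/1.1 200 OK\r\n"
    f"Content-Type: multipart/x-mixed-replace; boundary={BOUNDARY}\r\n"
    "\r\n"
).encode("ascii")


def mjpeg_part(jpeg: bytes, boundary: str = BOUNDARY) -> bytes:
    """Wrap one JPEG frame as a single multipart section."""
    head = (
        f"--{boundary}\r\n"
        "Content-Type: image/jpeg\r\n"
        f"Content-Length: {len(jpeg)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + jpeg + b"\r\n"
```

A server following this sketch would send `STREAM_HEADERS` once and then one `mjpeg_part(...)` per captured frame for as long as the client stays connected.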

Also, there are multiple approaches for using RTOS tasks here: https://github.com/arkhipenko/esp32-mjpeg-multiclient-espcam-drivers

TungstenE2 commented 4 years ago

For the moment I have solved the issue for myself: I just request a still picture every 1 sec and show it on my tablet UI. I can do this from several clients at the same time.
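The polling workaround described above could look roughly like the following sketch. The camera address and the `/capture` still-image path in the usage note are assumptions, and `fetch_still` and `poll_stills` are hypothetical helper names.

```python
import time
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_still(url: str, timeout: float = 5.0) -> bytes:
    """Download one still image (JPEG bytes) from the camera."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def poll_stills(
    fetch: Callable[[], bytes],
    interval: float = 1.0,
    count: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield one image per poll; run forever when count is None."""
    n = 0
    while count is None or n < count:
        yield fetch()
        n += 1
        # Only wait if another poll will follow.
        if count is None or n < count:
            sleep(interval)
```

Usage, assuming the camera answers still-image requests at a hypothetical address: `for jpeg in poll_stills(lambda: fetch_still("http://192.0.2.10/capture")): show(jpeg)`, where `show` stands for whatever the UI uses to display an image. Each client polls independently, which is why several can do this at once.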

TodWulff commented 4 years ago

> I had to move away from espressif's streaming implementation completely because their code uses "chunked" stream which VLC and Blynk video players choke on. I needed something that can play on multiple clients and be compatible with VLC/Blynk. ...

As info, just for another data point, I was able to get VLC to receive and display the stream using this build. It would not work on the example webcam sketch. https://i.imgur.com/O9JHfPL.jpeg

As such, I suspect that VLC can transcode this in a multicast construct. Granted, it is not ideal, but could be a viable workaround for those with a system topology that supports doing same.

Also, @easytarget, be advised that there are a couple of typos in the myconfig.sample.h file - seems you dropped the #define on the serial debug #def (Line ~139). And, on line 26, there seems to be an extra 'station' word in the structure definition construct (yes, it being a comment is acknowledged. :).

easytarget commented 4 years ago

Thanks, I didn't do a good job on merging myconfig up to the sample. Sigh... fixed.

I've very much focussed on this playing in a browser window, in keeping with it being an expanded example. The stream code is essentially untouched from the original. I'm kind of assuming that those who need rtsp streams etc would be looking at the more advanced projects focussing on specific security and home-automation. That said, a more solid streaming part of the code is in the plan, which is why this issue is open and active :wink:

TungstenE2 commented 3 years ago

push, any news on this topic?

TungstenE2 commented 3 years ago

@easytarget are you still working on this project? Please do not let it go down....

TungstenE2 commented 3 years ago

fyi, Tasmota added rtsp to the tasmota32-webcam.bin, so this added multiclient streaming to ESP32cam. Maybe this helps you as well?

https://github.com/arendst/Tasmota/issues/9293

Also a nice page in Tasmota for the cam, but missing some options: https://cgomesu.com/blog/Esp32cam-tasmota-webcam-server/

TungstenE2 commented 2 years ago

push ;-)

abratchik commented 1 year ago

Hi All. Maybe an old subject, but I just wanted to understand the idea of the multi-client behind the proposed solution a bit better.

I have refactored this whole project in PR #280, where the frame capture is timer-based. This works pretty well, and in theory the frames could be broadcast to all the connected clients quite easily. However, there are several questions to be understood. For example, assume we have 2 clients connected simultaneously, for the sake of simplicity. Let's consider the following scenarios:

  1. One has started the video stream while the other decides to take a still image. We could take one frame and send to the 2nd guy, not a problem. However, flash lamp cannot be triggered in this case, because this will affect the streaming for the 1st one.
  2. Should we allow 2nd guy to change any settings (camera/frame rate etc) or not? I'd say no, since the 1st client is already running the stream and any changes will affect his experience. This means that all the settings for the 2nd client should not be possible while the stream is running.
  3. If the 1st client has started the stream, the second should only be able to connect and see the same image/stream, which is already served for the 1st one. No settings changes should be allowed and no flash photography either.
  4. If the 2nd client connects to the stream started by the 1st one, then the 1st one should also be blocked from doing any changes, which may impact the experience for the 2nd guy. This means the 1st client also should not be able to change any settings until and unless he is the only one served.
  5. If one of the clients disconnects, and the disconnection message delivery fails for some reason, then there has to be some sort of client ping/auto-cleanup on timeout, otherwise the settings will remain locked for the 1st guy even after the 2nd one is dropped.
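The locking rules in these scenarios can be sketched as a small registry of stream viewers with a ping timeout. This is only an illustration of the rules as listed, not code from PR #280; the class name `ViewerRegistry`, its methods, and the 10-second timeout are all assumptions.

```python
from typing import Dict


class ViewerRegistry:
    """Tracks who is watching the stream and who may change settings."""

    def __init__(self, timeout_s: float = 10.0):
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}  # client id -> time of last ping

    def ping(self, client_id: str, now: float) -> None:
        """Register a viewer, or refresh one that is still alive."""
        self._last_seen[client_id] = now

    def disconnect(self, client_id: str) -> None:
        self._last_seen.pop(client_id, None)

    def prune(self, now: float) -> None:
        """Scenario 5: drop viewers whose pings stopped arriving."""
        stale = [c for c, t in self._last_seen.items() if now - t > self.timeout_s]
        for c in stale:
            del self._last_seen[c]

    def may_change_settings(self, client_id: str, now: float) -> bool:
        """Scenarios 2-4: allowed only if nobody else is being served."""
        self.prune(now)
        return not (set(self._last_seen) - {client_id})

    def may_use_flash(self, client_id: str, now: float) -> bool:
        """Scenario 1: the flash would disturb any other viewer's stream."""
        return self.may_change_settings(client_id, now)
```

Passing `now` in explicitly keeps the sketch independent of any particular clock; on a real board it would come from a millisecond tick counter or similar.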

Overall, I believe implementing this feature in the ESP32CAM sketch isn't worth the effort, for the following reasons:

  1. The solution will be complex and the overall user experience will not be as nice, since parallel clients will be competing for resources of the board, which cannot be shared effectively (i.e. flash lamp, camera sensor etc).
  2. Feeding the camera video stream for multiple clients can be implemented on a proxy server much more efficiently and serve as many parallel clients as required, subject to proxy server capacity. In this case, the ESP CAM will have to push the frames to one client only (the proxy server). Moreover, one could also implement a video compression on the proxy so that clients would receive an MPEG stream instead of frame series - this should help on the server bandwidth requirements quite significantly.
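The proxy idea in point 2 boils down to a fan-out: one upstream reader receives frames from the camera and copies each one to every connected viewer. A minimal sketch, assuming a single publisher thread and one bounded queue per viewer (`FrameRelay` and its method names are hypothetical, and no compression step is shown):

```python
import queue
from typing import List


class FrameRelay:
    """Fans frames from one upstream source out to many subscribers."""

    def __init__(self, max_pending: int = 2):
        self.max_pending = max_pending
        self._subscribers: List[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        """Each viewer gets its own bounded queue of pending frames."""
        q: queue.Queue = queue.Queue(maxsize=self.max_pending)
        self._subscribers.append(q)
        return q

    def unsubscribe(self, q: queue.Queue) -> None:
        if q in self._subscribers:
            self._subscribers.remove(q)

    def publish(self, frame: bytes) -> None:
        """Called by the single upstream reader for every camera frame."""
        for q in list(self._subscribers):
            if q.full():
                # Slow viewer: discard its oldest frame so it never stalls the rest.
                try:
                    q.get_nowait()
                except queue.Empty:
                    pass  # the viewer drained the queue in the meantime
            q.put_nowait(frame)
```

With this shape the camera only ever serves one connection (the relay), and the number of viewers is bounded by the proxy host rather than by the ESP32. Dropping old frames for slow viewers is a design choice of this sketch, not something the thread specifies.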

Would be great to hear your thoughts on this.

TungstenE2 commented 1 year ago

@abratchik thx for deep looking into this.

Speaking for myself I would love to see multiclient streaming from clients like tinyCam using MJPEG URL (http://:81). Currently I can only watch the stream from one client eg my tablet. A second client eg my mobile can not access the stream, unless the first client stopped the streaming. All my other webcams do support this.

I was not referring to the GUI and a second client/user is browsing the frontend and being able to change parameters in parallel.

Does this help and make sense?

abratchik commented 1 year ago

Hi @TungstenE2. Thank you for your response and clarification. It does make sense, although I think proxying the stream would make more sense, since the number of parallel clients will not be limited by the ESP32 resources. I believe the same is valid for any webcam: if it is a USB one, then the host plays the role of the proxy. In the new version, there is no second port; a web socket is used instead, on the same port. There is no MJPEG either - I dropped it in favor of timer-based frame generation and forwarding over the web socket. If the idea of "multi streaming" is only to let the other clients "see" what is being captured at the moment by the 1st client, this is possible of course.

TungstenE2 commented 1 year ago

Hi @abratchik, not sure what you mean by proxying the stream.

In my setup max 2-3 clients would call the stream in parallel. Currently one client can call the stream, a second client is not showing the stream, unless the first client stops the streaming.

To be honest, I do not fully understand how it works now in the new version done by you. How can a client like the TinyCam app on Android call the stream to display the video? Which protocol is used instead of MJPEG in the new version?

abratchik commented 1 year ago

Hi @TungstenE2

In theory, one could develop a websocket frame relay and deploy it on the proxy server. So the clients would connect to the proxy rather than to the ESP32 board directly. In this case, the number of clients could be much higher than 2-3, and there is no need to modify the ESP32 sketch for that. This requires developing the relay, which may not be trivial. Doable though.

The way the new version is working is very simple. First, the client needs to open a websocket. This is done in JS code but can be any other language supporting WebSocket API.

Once the websocket connection is established, the client sends a command to the server to start the stream. The server then starts generating frames based on timer events (for example 25 frames per second) and pushes it to the client.

Since the websocket is a stateful protocol, the client can consume the messages from the server as they arrive and update the image on the page. When the client doesn't want the frames anymore, it just sends a relevant command to the server or simply disconnects from the websocket - in this case the frame timer is stopped.

In a way, this is less cumbersome than MJPEG because there is no need for the client to request every frame. From the server side, there is also no need to handle chunked responses. All frames are generated by timer events, no delay() calls are used anywhere in the code, which is also a big plus, since the video streaming is not blocking any threads.

One can also allow multiple websocket clients and all of them will receive the same frame simultaneously, I just don't need it for my project.
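The flow described in this comment (start command, timer-driven capture, push, stop or disconnect) can be modelled without any real websocket or camera. The sketch below is a plain-Python simulation under that assumption; `TimerStream`, its method names, and the `send` callbacks are hypothetical and do not reflect the actual command set of the refactored sketch.

```python
from typing import Callable, Dict


class TimerStream:
    """Captures one frame per timer tick and pushes it to every started client."""

    def __init__(self, capture: Callable[[], bytes]):
        self._capture = capture
        self._clients: Dict[str, Callable[[bytes], None]] = {}

    @property
    def timer_running(self) -> bool:
        """The frame timer only needs to run while someone is watching."""
        return bool(self._clients)

    def start(self, client_id: str, send: Callable[[bytes], None]) -> None:
        """Client sent a 'start stream' command over its open connection."""
        self._clients[client_id] = send

    def stop(self, client_id: str) -> None:
        """Client sent a 'stop' command or simply disconnected."""
        self._clients.pop(client_id, None)

    def on_tick(self) -> int:
        """Timer callback: capture once, send the same frame to everyone."""
        if not self._clients:
            return 0
        frame = self._capture()
        for send in list(self._clients.values()):
            send(frame)
        return len(self._clients)
```

Because the frame is captured once per tick and then handed to each `send` callback, adding a second or third viewer costs extra network writes but no extra camera work.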

TungstenE2 commented 1 year ago

@abratchik thx for the explanation. Do you also plan to maintain or support the project after @easytarget has integrated your refactoring?

@easytarget are you able to proceed with the project if @abratchik decides not to support it in the future? It seems he has a specific project of his own, and multi-client support is not part of his plan.