QuantumEntangledAndy / neolink

An RTSP bridge to Reolink IP cameras
GNU Affero General Public License v3.0

Reaching limit of channel | Remaining: 0 of 100 message space for 4 (ID: 3) #143

Closed: apedance closed this issue 2 months ago

apedance commented 10 months ago

Describe the bug

The neolink camera stops working after some amount of time with the error:

[2023-08-28T22:44:50Z WARN neolink_core::bc_protocol::connection::bcconn] Reaching limit of channel
[2023-08-28T22:44:50Z WARN neolink_core::bc_protocol::connection::bcconn] Remaining: 0 of 100 message space for 4 (ID: 3)

A hard restart of neolink is needed to get the camera working again.

QuantumEntangledAndy commented 10 months ago

Duplicate of #117 I think

apedance commented 10 months ago

Indeed. But not the media part; I'm not having that issue, just the "Reaching limit of channel" one. Any idea how to solve it?

edit: a refactor is in the making.

QuantumEntangledAndy commented 10 months ago

Right, but the (ID: 3) is a media message, so it should be the same thing.

QuantumEntangledAndy commented 10 months ago

As far as I can tell it is because the media part of the channel is not pulling the messages off of the stream fast enough. The media code is quite complex, so I am working on a refactor that will hopefully remove this complexity and make it easier to pull messages off of the queue.
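
As a rough illustration of that backpressure problem, here is a minimal standalone sketch, not neolink's actual code: the channel capacity, packet type, and timings are assumptions chosen only to mirror the warning in the log above. A bounded channel runs out of space as soon as the consumer is slower than the producer.

```rust
use tokio::sync::mpsc;
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    // Hypothetical capacity of 100, mirroring the "0 of 100 message space" warning.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(100);

    // Producer: packets arrive from the camera faster than they are consumed.
    tokio::spawn(async move {
        loop {
            if tx.capacity() == 0 {
                eprintln!("Reaching limit of channel");
                eprintln!("Remaining: 0 of 100 message space");
            }
            if tx.send(vec![0u8; 1024]).await.is_err() {
                break; // receiver dropped
            }
            sleep(Duration::from_millis(1)).await;
        }
    });

    // Slow consumer: simulates media processing that cannot keep up.
    while let Some(_packet) = rx.recv().await {
        sleep(Duration::from_millis(10)).await;
    }
}
```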

jamesahendry commented 10 months ago

I have 6 cams on Neolink. I've tried Windows and Linux environments. The consistent theme I've come to is that they work much better if you only have one cam per instance. I get far fewer drop-outs on Frigate and the Frigate logs are clear. I've now separated out a Debian LXC on Proxmox for each cam. Once you have the first one established, just copy it, edit the TOML, and away you go.

They've only been up a day on their LXCs, but they seem to behave very differently in terms of their resource consumption. Some will gradually fill the memory on the LXC throughout the day; some are quicker. Are there any recommended specs? I'm running each container with 512 MB of memory and 2 vCPUs. The vCPUs generally sit at next to nothing (under 3%).

[two screenshots attached]

QuantumEntangledAndy commented 10 months ago

This should now be fixed on latest; can anyone confirm?

MicheleCardamone commented 10 months ago

Hi. Same error for me, with the build you sent me to fix the RTSP 404 error. After a few hours of correct functioning I get this error.

It seems very similar to the previous timed-out error, with the difference that some cams present the RTSP 503 error. [screenshot attached]

Then, trying to close and reopen neolink, a series of gstreamer errors appears. [screenshot attached]

QuantumEntangledAndy commented 10 months ago

So this is quite odd. This warning message occurs when 100 messages are in the queue, meaning we got 100 messages from the camera but the app has not processed them.

The odd thing is that there is no real processing going on. The app pulls a message, converts it to bcmedia, then instantly puts it in a broadcast stream with no waiting around.

The only part I can think of that might be causing this is if the conversion from bc->bcmedia is somehow stalling the process. So in that regard I made this build that uses a different (and hopefully less error-prone) way of filtering and converting bc packets into bcmedia packets. Could you test it?
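
For reference, a hedged sketch of the pull-convert-broadcast pipeline described above, using hypothetical placeholder types and functions rather than neolink's real API:

```rust
use tokio::sync::{broadcast, mpsc};

// Placeholder packet types standing in for neolink's bc/bcmedia packets.
struct Bc(Vec<u8>);

#[derive(Clone)]
struct BcMedia(Vec<u8>);

// Stand-in for the real parser; here it just passes the bytes through.
fn bc_to_bcmedia(packet: Bc) -> Option<BcMedia> {
    Some(BcMedia(packet.0))
}

// Pull each message off the bounded channel as soon as it arrives,
// convert it, and push it straight into the broadcast stream.
async fn forward(mut rx: mpsc::Receiver<Bc>, tx: broadcast::Sender<BcMedia>) {
    while let Some(packet) = rx.recv().await {
        if let Some(media) = bc_to_bcmedia(packet) {
            // broadcast::Sender::send never blocks; an Err only means
            // there are currently no subscribers, which we can ignore.
            let _ = tx.send(media);
        }
    }
}
```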

MicheleCardamone commented 10 months ago

> So this is quite odd. This warning message occurs when 100 messages are in the queue, meaning we got 100 messages from the camera but the app has not processed them.
>
> The odd thing is that there is no real processing going on. The app pulls a message, converts it to bcmedia, then instantly puts it in a broadcast stream with no waiting around.
>
> The only part I can think of that might be causing this is if the conversion from bc->bcmedia is somehow stalling the process. So in that regard I made this build that uses a different (and hopefully less error-prone) way of filtering and converting bc packets into bcmedia packets. Could you test it?

Sure, as soon as I have time today I will do it and I'll update you!

MicheleCardamone commented 10 months ago

> So this is quite odd. This warning message occurs when 100 messages are in the queue, meaning we got 100 messages from the camera but the app has not processed them.
>
> The odd thing is that there is no real processing going on. The app pulls a message, converts it to bcmedia, then instantly puts it in a broadcast stream with no waiting around.
>
> The only part I can think of that might be causing this is if the conversion from bc->bcmedia is somehow stalling the process. So in that regard I made this build that uses a different (and hopefully less error-prone) way of filtering and converting bc packets into bcmedia packets. Could you test it?

I can't download it, it says it doesn't exist

QuantumEntangledAndy commented 10 months ago

Seems I missed the 0 at the end of the link. Here is the right one

MicheleCardamone commented 10 months ago

> Seems I missed the 0 at the end of the link. Here is the right one

Tested. All cams seem to work, but right after starting it gives errors in gstreamer. [screenshot: Screenshot 2023-09-20 143729]

QuantumEntangledAndy commented 10 months ago

Ah, that error was part of my experiment with buffering in gstreamer; it is already fixed in latest.

QuantumEntangledAndy commented 10 months ago

I think you can use this build to confirm it's fixed.

MicheleCardamone commented 10 months ago

> I think you can use this build to confirm it's fixed.

This build is being tested and hasn't given any errors so far, but it seems (particularly on the main stream) to give this saturation-type effect. I don't know if it could depend on the FPS, which, as I was saying, often drops dramatically to 10 or less.

[screenshot: Screenshot 2023-09-20 181601]

QuantumEntangledAndy commented 10 months ago

This isn't saturation; it's an incomplete video frame. These usually happen across reconnect boundaries or at the start of a stream.

QuantumEntangledAndy commented 10 months ago

What I can do is make neolink wait for a keyframe before sending anything onwards between resets, but this will mean that some frames will be skipped while waiting for a keyframe.
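
Something like the following sketch illustrates that gating idea (hypothetical types, not the actual implementation): drop frames after a reset until the first keyframe arrives, then forward everything.

```rust
// Hypothetical frame type; the real one carries codec-specific data.
struct Frame {
    is_keyframe: bool,
    data: Vec<u8>,
}

struct KeyframeGate {
    seen_keyframe: bool,
}

impl KeyframeGate {
    fn new() -> Self {
        Self { seen_keyframe: false }
    }

    /// Call on every reconnect/reset so we wait for a fresh keyframe again.
    fn reset(&mut self) {
        self.seen_keyframe = false;
    }

    /// Returns the frame only once a keyframe has been seen since the last reset.
    fn filter(&mut self, frame: Frame) -> Option<Frame> {
        if frame.is_keyframe {
            self.seen_keyframe = true;
        }
        if self.seen_keyframe {
            Some(frame)
        } else {
            None // skipped while waiting for a keyframe
        }
    }
}
```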

QuantumEntangledAndy commented 10 months ago

So the build with the filter for the first keyframe is here. At this point, though, perhaps we should test whether these frequent reconnects are happening because of network congestion. Have you tried things like reducing the bitrate and FPS on the cameras?

MicheleCardamone commented 10 months ago

> So the build with the filter for the first keyframe is here. At this point, though, perhaps we should test whether these frequent reconnects are happening because of network congestion. Have you tried things like reducing the bitrate and FPS on the cameras?

Good morning. The latest build I was testing this morning seems to have failed. I didn't find any errors; the neolink process simply closed. A bit strange. However, the bitrate is 4096 on the main stream with 20 FPS.

QuantumEntangledAndy commented 10 months ago

I am not observing this close-without-errors issue that you have. Also, perhaps we should discuss this in another thread, since it seems not to be related to this issue.

MicheleCardamone commented 10 months ago

> I am not observing this close-without-errors issue that you have. Also, perhaps we should discuss this in another thread, since it seems not to be related to this issue.

Ok, I will open another thread

MicheleCardamone commented 10 months ago

Hi! Same problem with the latest build you posted too, @QuantumEntangledAndy. [screenshot: Screenshot 2023-09-21 171917]

QuantumEntangledAndy commented 9 months ago

That's a shame. How long does it take to get to this state? Do you still get to it even with 1 camera? I ask because if I want to replicate and trace this issue I might need to make a Wireshark dump, but I only have 2 cameras and Wireshark won't play nicely over hours of connectivity.

QuantumEntangledAndy commented 9 months ago

Does this error correct itself? I have a watchdog that looks for 3 s of no video and then restarts the stream. This error should look like 3 s of no video to the watchdog, which should mean that the stream will restart and you will get 100 message space again.
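
A simplified sketch of that watchdog idea, with assumed types and a hypothetical restart_stream callback rather than the real code:

```rust
use tokio::sync::mpsc;
use tokio::time::{timeout, Duration};

// If no video frame arrives for 3 seconds, kick the stream so it restarts.
async fn watchdog(mut frames: mpsc::Receiver<Vec<u8>>, restart_stream: impl Fn()) {
    loop {
        match timeout(Duration::from_secs(3), frames.recv()).await {
            // Video is flowing; nothing to do.
            Ok(Some(_frame)) => {}
            // Channel closed: the stream is gone, stop watching.
            Ok(None) => break,
            // 3 s elapsed with no video: restart the stream, which also
            // gives the channel its full 100-message capacity back.
            Err(_elapsed) => restart_stream(),
        }
    }
}
```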

QuantumEntangledAndy commented 9 months ago

If you turn on debug logging you can see when the watchdog kicks the stream. Set the env var RUST_LOG to debug.

MicheleCardamone commented 9 months ago

> That's a shame. How long does it take to get to this state? Do you still get to it even with 1 camera? I ask because if I want to replicate and trace this issue I might need to make a Wireshark dump, but I only have 2 cameras and Wireshark won't play nicely over hours of connectivity.

I didn't count it exactly, but the time is certainly more than 6 hours of correct functioning. There seems to have been a slightly different error, still of this same type. [screenshot attached]

However, no, it doesn't seem to resolve itself. I always have to restart neolink.

Could I try creating 2 separate instances with different ports, one with 8 cams and one with just one cam, to see what happens? @QuantumEntangledAndy

QuantumEntangledAndy commented 9 months ago

It looks like you ran out of RAM; could you check that? Maybe show a graph of RAM over time.

MicheleCardamone commented 9 months ago

> It looks like you ran out of RAM; could you check that? Maybe show a graph of RAM over time.

I haven't had a chance to check these days. I think I'll be able to let you know tomorrow. Has 0.6.1 fixed any of my problems, from what you know? Thanks! Mc @QuantumEntangledAndy

QuantumEntangledAndy commented 9 months ago

0.6.1 is mostly an MQTT fix, but it does include the other fixes we did with regard to BI start-ups. You can see a summary of what's included on the release page.

MicheleCardamone commented 9 months ago

> It looks like you ran out of RAM; could you check that? Maybe show a graph of RAM over time.

Testing with 0.6.2: everything seems to work fine for the first 9 hours. However, I notice in Task Manager that neolink seems to consume more and more RAM; to restore it I just restart the process. It is currently consuming around 5 GB and still increasing. [screenshot: Screenshot 2023-09-26 181235]

QuantumEntangledAndy commented 9 months ago

Yes, I got rid of the large memory growth in 0.6.2, but there is a little bit left. I have some more fixes in the works that take it down to zero, at least on my macOS and Linux (I can't test Windows); hopefully that will fix it.

MicheleCardamone commented 9 months ago

> Yes, I got rid of the large memory growth in 0.6.2, but there is a little bit left. I have some more fixes in the works that take it down to zero, at least on my macOS and Linux (I can't test Windows); hopefully that will fix it.

Hi @QuantumEntangledAndy! Unfortunately, the previous problem seems to have recurred with 0.6.2 master. I only notice that the error starts appearing after many more hours than in previous versions, and that a greater quantity of errors appears. The symptom is a general lowering of the cam FPS, unlike other builds which completely stopped working. I don't know if this could depend on the RAM, which is gradually being occupied more and more. [two screenshots attached]

kniteli commented 8 months ago

I think this might be partially on the hardware side too. Neolink drops the camera when this starts happening, yes, but I even see a lot of frame dropping and general instability on the stream through Reolink too, only on the one device so far. It's an E1, if that helps.

alenzurfluh commented 6 months ago

Hi, are you guys still facing this issue?

I'm using a Reolink Lumus and seeing similar issues here:

[2024-01-09T23:35:52Z WARN fcm_push_listener::listener] Connection failed. Retrying.
[2024-01-09T23:39:32Z WARN neolink_core::bc_protocol::connection::bcconn] Reaching limit of channel
[2024-01-09T23:39:32Z WARN neolink_core::bc_protocol::connection::bcconn] Remaining: 100 of 100 message space for 5 (ID: 3)
[2024-01-09T23:39:39Z WARN neolink_core::bc_protocol::connection::bcconn] Reaching limit of channel
[2024-01-09T23:39:39Z WARN neolink_core::bc_protocol::connection::bcconn] Remaining: 69 of 100 message space for 5 (ID: 3)
Killed

QuantumEntangledAndy commented 2 months ago

I haven't had any reports of this error in quite some time, so I will be closing it. Let me know if this is still happening on latest.