LX3JL / xlxd

HAM radio multiprotocol dstar reflector server
GNU General Public License v3.0
163 stars 111 forks source link

fix frame sequence for streams initiated on different mode #213

Closed narspt closed 1 year ago

narspt commented 2 years ago

This small patch fixes frame sequence applied by UpdatePids() (i.e. for streams initiated on different mode), m_uiPacketCntr should not be incremented by dv header packet else sequence for dv frames will start at 1 instead of 0 as it should.

This caused even further issues for DMR/YSF: with m_uiDmrPacketSubid and m_uiYsfPacketSubId also being calculated from the above, actually if we have for example a stream initiated on D-Star, 1st and 2nd dv frames would be transcoded to 2nd and 3rd AMBE frames of the DMR "triplet", the 1st AMBE frame would be the one stored at m_StreamsCache from the previous stream! and... frame sequence on EncodeDvPacket() being picked from that DvFrame0 (the one from previous stream), then something random...

The wrong frame sequence caused all DMR streams (when initiated on different mode, else original sequence is preserved) to show up on MMDVMHost as having packet loss.

(m_uiYsfPacketFrameId and m_uiImrsPacketFrameId are adjusted to match the change, as these should remain 0 for header packet and 1, 2, ... for dv packets)

K2IE commented 2 years ago

Trying on XLX020, seems to be working nicely so far.

F4FXL commented 1 year ago

Does this fix the error where ambed shows following errors Feb 7 06:44:49 xlxtranscode ambed[7539]: 0 of 0 packets lost and xlxd this error ? Feb 6 19:17:49 vps-46ac4de0 xlxd: ambed 71 of 38 packets timed out

narspt commented 1 year ago

No, not related at all to the fix in this PR... but... concerning "your" problem, I did check it before and actually the code to count ambed timedout packets does that in a very wrong way, if you look at ccodecstream.cpp, at end of CCodecStream::Task() you can see what it does is just: if not getting packets from ambed after timeout (400ms) then it just starts incrementing the supposed number of timedout packets like crazy every 5ms (due to the 5ms timeout of m_Socket.Receive above), without taking in account the actual number of packets sent to ambed and not even dropping (or sending them back as-is without transcoding) these "waiting for transcoding" packets from queue, but just counting again and again and again... then you get a very inflated and false number of timedout packets... Latency stats are also fully false values as all it does is calc time difference from last packet sent to ambed to some packet being received from ambed, not meaning at all that both are same packet... Also if I remember correctly if you have considerable latency between xlxd and ambed then for very short transmissions (just quick keying ptt) you will always get timedout packets count and the infamous -1000 on stats, it's "normal" due to stream being closed even before getting anything back from ambed. It's not easy to make it fully perfect because we can't easily rely on a packet id preserved after transcoding but I was working on an improvement, I did never submit it as a PR as I got no much time to fully test it and there was still something I wanted to improve but if you want to look/test check this branch: https://github.com/narspt/xlxd/commits/patch-21 (note that changes there will also bypass transcoding for timedout packets instead of freezing stream, so that users on the same mode can still communicate if ambed stops replying)

F4FXL commented 1 year ago

Thanks for your feedback. I would at least be happy if xlxd would recover from a timeout situation. Once a timeout situation happened it is completely broken. Most of our DStar clients connect using DCS protocol. As soon as the timeout happens DCS to DCS is broken.

To test the timeout fix I actually need to cherry pick these 2 commits, correct? https://github.com/narspt/xlxd/commit/d204bb61e570670aa9238ba0364d76849dfb50ce https://github.com/narspt/xlxd/commit/252ebe868b5d8eda1cc702d9e69002522b279578

I will try to find time over the next days to test it on xlx208.

narspt commented 1 year ago

Correct. Btw, I'm not fully sure what you mean exactly when you say it gets broken (just the actual stream or need xlxd restart?), if you have xlxd and ambed on different places I would suggest you also apply this fix https://github.com/LX3JL/xlxd/pull/222 and... if you eventually suspect of any issues on ambed side (as root cause for the timeouts) maybe also consider this one https://github.com/LX3JL/xlxd/pull/226

F4FXL commented 1 year ago

What I meant with "broken" is that once a transcoder timeout occured, xlxd is in some sort of half crashed state. The most obvious symptom is DCS clients no longer communicating together.

LX1IQ commented 1 year ago

Hi Geoffrey,

The XLX is not "brocken" simply restart the ambed service or server and all is working back as expected. A restart of the XLX is not necessary.

73, Luc

F4FXL commented 1 year ago

Hi Luc,

I've run into this issue countless times, restarted ambed but this is not enough. xlxd is still running but DCS clients are unable to communicate i.e. when 2 repeaters are connected through DCS a station transmitting on repeater A will not be forwarded to repeater B. I think I'll open a separate issue for that.

73,Geoffrey


Geoffrey Merck - F4FXL / KC3FRA http://www.f4fxl.org http://www.f5kav.fr

Le dim. 12 févr. 2023 à 08:32, LX1IQ @.***> a écrit :

Hi Geoffrey,

The XLX is not "brocken" simply restart the ambed service or server and all is working back as expected. A restart of the XLX is not necessary.

73, Luc

— Reply to this email directly, view it on GitHub https://github.com/LX3JL/xlxd/pull/213#issuecomment-1426962530, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIFHXTQ7FQ2IN36QLSVCHTWXCGYRANCNFSM5NV5TUDQ . You are receiving this because you commented.Message ID: @.***>

narspt commented 1 year ago

If xlxd gets in that half crashed state until restart then I bet you need just this fix: https://github.com/LX3JL/xlxd/pull/222

F4FXL commented 1 year ago

Applied a couple of your patches on XLX208, including that one.

narspt commented 1 year ago

@F4FXL: I'm curious if it did help with your problem? Btw, some of my patchs (incl. this one) are already merged on master.

F4FXL commented 1 year ago

It helped in some extent. Yet from time to time I still have the DCS to DCS issue.

narspt commented 1 year ago

No idea what is your problem then, never experienced it, sorry :(