DJ2LS / FreeDATA

A free, open-source, multi-platform application for sending files and messages, using the codec2 HF modems
https://wiki.freedata.app/
GNU General Public License v3.0
141 stars 17 forks source link

Erasure coding for short message broadcast #418

Open g0hww opened 1 year ago

g0hww commented 1 year ago

Inspired the approach use within the Nack Oriented Reliable Multicast (NORM) protocol, I would like to propose the adoption of an erasure coding strategy, applied to a sequence of codec2 frames and operating at the frame layer, with erasures applying to entire missing frames.. This will use a shortened Reed-Solomon block code in GF8. I will attach proof-of-concept python code to show what I mean, in a series of follow up posts.

Assumptions

Use codec2 DATAC4 with 56 bytes of payload per frame.

Allow FreeDATA TNC 22 bytes to identify this message, based on

This is the BlockHeader BH.

Assume that the sender commits to storing each block for some duration, perhaps 24 hours, or until the Message Number wraps.

The Erasure Coding Layer needs to identify, within each Erasure Coded Segment,

This is the EC Header EH. It is 2 bytes in size.

So, 56 -22 -2 = 32. The ECDU is divided into five 32 byte segments.

We will assume that, by default, we are going to send a maximum number of 5 frames of DATAC4. Of that, up to 4 frames will be client service data segments and 1 frame will contain a proactive parity data segment. We will generate more parity data than this, so we could increase the number of parity blocks we send proactively. This should take 55.17 = 25.850 seconds of air time and should be able to convey client service traffic of up to 128 bytes. Let us choose to generate 4 blocks of parity data, 432=128 bytes.

If we were to send the whole thing in one go, we would have a sequence like

but we plan only to send ( at most)

initially, retaining P6,P7,P8 for reactive EC responses. These could be transmitted in response to EC Repair Requests, issued by receivers that want to fix errors in received data. It is preferable to respond with parity segments, as any station receiving an additional parity segment will be able to repair any missing data segment with it.

g0hww commented 1 year ago

In this first POC, I get the basic stuff working, for a disappointingly trivial example, involving a short message, such that only a single data segment and a single parity segment are sent, only one of which needs to be received to decode the message.

https://gist.github.com/g0hww/9bf247e51ae68ccf5f131c94db07a3ab

This does not demonstrate the utility of the reactive parity approach, which will be done next.

g0hww commented 1 year ago

A bit about Reactive Parity

We have been chatting a bit about the way reactive parity might work.

Firstly, to be clear, the primary purpose of sending reactive parity segments is to fill as many block holes as possible in as many receivers as possible, with the fewest retransmission. Hopefully, proactive parity in the initial transmission has filled most of the holes already.

From the initial block source's perspective, it wants to respond to repair requests by sending parity segments that have not been previously transmitted. This ensures that any station receiving one can fill any hole in their block. If it has eventually transmitted all prepared parity segments, then it should respond specifically with the segment identified in the repair request. This means that the source should begin sending reactive parity backwards, from the last parity segment to the first, and that repair requests should identify the highest segment of the block that they have not yet received.

For the initially proposed idea of having up to 4 client service data unit (CSDU) segments with 4 parity segments, and a maximum client service data unit of 128 bytes (controversial, I know) , this means that we would only ever need to send parity segments in response to repair requests, as having only all the parity blocks completely replicates the maximum sized CSDU, and this gives most bang for the buck. Trading more CSDU size for less parity takes away some of this bang. We can look at use of longer blocks later, there is some room in the proposed EC segment header to do this.

g0hww commented 1 year ago

A bit about Repair Requests.

The best thing about repair requests is that you might not have to send them, if somebody else does it first and the response fills a hole in your block. So the strategy is to wait and hope that some other station sends a repair request. the source responds with a previously unseen parity segment and you have fewer holes, maybe none, with no effort, or energy expended.

An important aspect of this is that it facilitates running a silent receiver. Maybe you have battery power constraints and don't want to expend energy on transmitting repair requests, or have other reasons for not wanting to transmit. Could think more about how to handle switching back to active mode from silent. Maybe a manual option to send repair requests.

We need a way to minimise a pile up of repair requests after an initial transmission by the source, and so the proposed approach is Best Forward Path First. In BFPF, stations who need to send a repair request delay the transmission of that request by an interval based on the estimated SNR of the block source. E.g. X + (MAX_SNR - snr) seconds of delay where X is a constant backoff, starting after reception of the last block segment. With this, we assume that the forward path SNR is a good approximation for the reverse path SNR, of the station sending the repair request, and therefore gives the best chance of getting a repair response from the source.

Repair requests are sent to the Broadcast ID. Basic behaviour is that they solicit a response from the block source, who transmits that response according to the policy previously identified.

The repair requests may be received by other block receivers, who may also have repair requests pending on backoff timers. These receivers suppress their pending repair requests, hoping that the repair request they just heard gets a response to fill in their hole, delaying them by another backoff round. If these receivers actually get to send their well delayed repair requests, they can reduce their backoff interval such that if they need to send another, it will go out sooner.

All backoff timers in block receivers get delayed again if a repair response segment is received, unless it is not needed.

All this is done on a per block basis, until it is completely received and delivered to the client service, or some circumstances cause efforts to expire. One possible outcome is that the source exhausts its willingness to continue supporting that block. Format needs to define error responses.

Next topic: Secondary repair requests to the wider group.