Closed Aang23 closed 1 year ago
Hi, sorry for the huge delay in replying.
Can you share a more detailed example of what would be needed as inputs and outputs? My understanding is that DVB-S2 uses some kind of repeat-accumulate codes that lend themselves to compact representations for the encoder. See the formulas in Section 5.3.2 in EN 302 307-1. From these formulas you could easily get an ALIST, which is what ldpc-toolbox currently does.
A general ALIST won't correspond to one of these repeat-accumulate codes, and won't lend itself to such a compact encoder representation. Even if the code is actually repeat-accumulate, extracting the repeat-accumulate representation from the ALIST could be a hard problem (computationally), I imagine.
So I'm not even sure I've understood the statement of the problem. Maybe I'm confused by what you mean by "DVB-S2 style matrices".
I'm not promising that I'll implement something like this, but I would like to capture exactly what would be needed, just in case me or someone else finds the time and motivation to do it.
Hi, looks like I also missed your reply.
So, first of all I was too vague in my initial comment, sorry about that!
I am attempting to get the CCSDS 131 codes running in this LDPC decoder implementation https://github.com/xdsopl/LDPC. I am already using it for DVB-S2 decoding, and it can achieve very high throughput on a single core easily (even at lower SNR), so for example I can live-demodulate a 10MBaud 32-APSK 9/10 signal without any issues.
While I already have implementations working for CCSDS codes (either AFF3CT or https://gitlab.com/librespacefoundation/gr-ccsds), both of them are horribly slow in comparison, even making heavy use of SIMD and multithreading, all that while adding a rather annoying dependency load (boost / aff3ct's machinery).
Hence, I have for now quite a while been attempting to get those codes working. And as obviously the CCSDS circulants provided in the specification do not match the DVB-S2's representations, this has proven rather difficult. I've concluded converting them to the same compact representation is the best approach. I did find some code (linked above) to convert from a PCM matrix to such representation, but I haven't had much success in having it complete in any reasonable amount of time. It's likely better to rewrite it in a faster language first.
Then, the reason I kept the initial issue more generic is that a way to convert even custom codes to something @xdsopl's decoder could work with would be extremely handy for implementations that require fast LDPC decoding.
Hi, I think this issue is now more relevant, given the work I've doing lately with Artemis I.
Does gr-ccsds support LDPC? I've taken a look at the code they have, and I haven't seen it.
Before choosing to use AFF3CT, I briefly considered the possibility of using Ahmet's LDPC decoder. However, it was not immediately clear to me how to adapt it for working with the CCSDS AR4JA.
I think we've both probably seen how the DVB-S2 codes are defined in Ahmet's decoder. As I told you before, the construction of the DVB-S2 codes and the AR4JA is very different. If I'm not mistaken (I was thinking about this hastily in the morning), the structure of the DVB-S2 parity check matrices is the following: The matrix $I{n-k}$ is the $n-k \times n-k$ identity matrix, the matrix $U{n-k}$ has ones in the diagonal below the main diagonal, and the matrices $A_{j, 360\times k}$ are $360 \times k$ matrices where each row is a circular shift of the previous row by a certain step $q$ (so pretty much like a circulant matrix, though the step $q$ is not one).
Note that in particular the structure of this parity check matrix gives a relatively sparse construction for a generator matrix that uses the first $k$ coordinates as systematic.
You can see the structure of CCSDS AR4JA parity check matrices in the blue and green books. It has nothing to do with this. In particular, the generator matrix using the first $k$ coordinates as systematic is dense.
The DVB-S2 matrices can be described by the positions where ones appear in the first row of each matrix $A_j$. This is how Ahmet's decoder represents them. However, I don't know whether the belief propagation algorithm in that code exploits this structure (in which case it wouldn't be trivial to adapt it to the CCSDS codes) or whether the parity check matrix is constructed at some point and used by the belief propagation (in which case I guess we could feed our own parity check matrix at some point downstream). Maybe @xdsopl can clarify this.
the only project capable of this is https://github.com/Lab-STICC-UBS/DVB-S2-matrices
I've taken a look at this. It doesn't have much documentation, so it's hard to tell what it does exactly. But I think that what it does is to find the "compact representation" given a DVB-S2 parity check matrix. I don't think that it will work with any other kind of parity check matrix. That the code is slow is not a problem. You only have to do this transformation once.
and it can achieve very high throughput on a single core easily (even at lower SNR), so for example I can live-demodulate a 10MBaud 32-APSK 9/10 signal without any issues.
I'm surprised by this. I'm not going to say whether AFF3CT or Ahmet's implementation is faster, because I haven't done enough benchmarks. However, AFF3CT claims to be pretty fast, so I would be surprised if Ahmet's code is orders of magnitude faster (and would want to learn the tricks). Is this a fair comparison (i.e., are you using the same arithmetic data types, number of iterations, etc.,, with both libraries)?
Here I have some results about simulating the 8/9 DVB-S2 LDPC codes with AFF3CT. The throughput was about 4 Mbps (highly dependent on the Eb/N0). I think this is with a Ryzen 7 5800X using all the cores (the only thing I'm not certain about is whether the throughput that AFF3CT measures is normalized per thread or the overall sum of all threads). I'm surprised that you're able to do 10 Mbaud 32-APSK 9/10 (which is 45 Mbps) in real time using a single core.
Does gr-ccsds support LDPC? I've taken a look at the code they have, and I haven't seen it. Yes, it does. It is on the dev branch. The decoders work (I've tested 2 of them on actual downlinks), but the performance and reliance on boost::ublas makes it a no-go to use in SatDump currently. I've been considering porting it to a simpler matrice implementation so that I could at least have a LDPC decoder published.
As for Ahmet's LDPC, it does exploit the specific structure of the DVB-S2 (and T2/S2X) LDPC codes. The full parity check does not appear to be generated. I guess that will make it unusable for other codes without major modifications.
I've taken a look at this. It doesn't have much documentation, so it's hard to tell what it does exactly. But I think that what it does is to find the "compact representation" given a DVB-S2 parity check matrix. I don't think that it will work with any other kind of parity check matrix. That the code is slow is not a problem. You only have to do this transformation once.
I've ended up being able to use it. But it, as expected, does not work on any of CCSDS codes unfortunately. I was hopeful at least about the C2 r=7/8 code (which is actually a higher priority for me than AR4JA codes at the moment) but that did not work either.
I'm surprised by this. I'm not going to say whether AFF3CT or Ahmet's implementation is faster, because I haven't done enough benchmarks. However, AFF3CT claims to be pretty fast, so I would be surprised if Ahmet's code is orders of magnitude faster (and would want to learn the tricks). Is this a fair comparison (i.e., are you using the same arithmetic data types, number of iterations, etc.,, with both libraries)?
It wasn't a fair comparison in that regard. I wasn't comparing the throughput of each algorithm 1:1 but what I could achieve within my requirements. So, Ahmet's was using 16-way SSE with int8_t, while I used the smallest aff3ct would let me, so floats. Number of iterations and such were the same otherwise. With my current DVB-S2 implementation I did not notice any major increase in soft decoding performance with higher bit depth, already being very close to the standard's theoretical limits. As for 1:1 performance, I'm honestly not sure. I'd assume Ahmet's will be a bit faster than aff3ct due to exploiting the code's structures.
I was also quite surprised by the performance I could get with Ahmet's (@ 8-bits). On older hardware (I believe a 2nd Gen i5, 4 threads), it was possible to do live demodulation of GOES-R GRB (~9M QPSK 9/10) live despite the DSP taking up most resources. The constraints ended up being BCH decoding, and more on the rest of the DSP (such as the SOF correlator I would guess, but I haven't tried to optimize further as of now). LDPC itself was using much less than a single core in that scenario.
Perhaps if AFF3CT was able to use integer types instead like Ahmet's the performance would be more acceptable. I've also been looking into other experimental projects for some ideas. Some of them, such as https://github.com/stippeng/ldpc do look pretty promising.
So, after looking around for a few more projects I found this one : https://github.com/blegal/Fast_LDPC_decoder_for_x86
It took quite a bit of work to get it to a state It'd even build without relying on the Intel compiler, but I managed that and the results are pretty promising. All on single core using SIMD, at 30 iterations and 8-bits LLRs (On a Ryzen 7 1700X) : ~62Mbps on the DVB-S2 1/2 code ~80Mbps with the 1024 1/2 CCSDS AR4JA code ~52Mbps with the CCSDS C2 7/8 code ~56Mbps with the DVB-S2 9/10 code
The formats utilized are full parity check matrices, so converting an alist seems very feasible. However quite some work will be required to turn it into an usable library, but I think I'm going to be working on this.
One downside I've noticed is : The code is not set at runtime. Everything needs to be compiled with the desired code defined in a header before compilation. That's definitely part of the reason it is fast, but means turning it into a library will require some more complex build-time machinery to account for this.
Yes, it does. It is on the dev branch.
The dev branch and the master branch contain the same (both point to the same commit since ~3 years ago). I still don't see where this LDPC decoder code is. Can you share a direct link to the implementation?
Perhaps if AFF3CT was able to use integer types
Are you sure that it isn't possible? Looking at the LDPC BP flooding decoder (and other decoders are similar) I see that it's implemented as a template where you can freely specify the type of the soft symbols. I don't know to which extent then it exploits this by doing many symbols at once with SIMD if you use 8-bit symbols. This might be the limitation
This looks good. Please share your code at some point.
The dev branch and the master branch contain the same (both point to the same commit since ~3 years ago). I still don't see where this LDPC decoder code is. Can you share a direct link to the implementation?
Ah, sorry. It's in a merge request : https://gitlab.com/she-reds/gr-ccsds/-/tree/ldpc_tm_8176_7154/lib
Are you sure that it isn't possible? Looking at the LDPC BP flooding decoder (and other decoders are similar) I see that it's implemented as a template where you can freely specify the type of the soft symbols. I don't know to which extent then it exploits this by doing many symbols at once with SIMD if you use 8-bit symbols. This might be the limitation
I've attempted to switch Aff3ct's LDPCs to fixed-points, which resulted in a bunch of un-implemented functions variants and other C++ template errors. As it's not available in int8_t (or others) mode by default either (unlikely other decoders), I've concluded this is currently not supported.
This looks good. Please share your code at some point.
I sure will. The reason I'm working on this is to have it utilized in SatDump at some point for upcoming (and current) satellite utilizing LDPC, and chances are I will publish whatever I end up with as a standalone library.
Edit : Also, lowering trials to something like 10 easily achieves 200Mbps+! That's pretty great. Varies little if not at all depending on the SNR.
@daniestevez I worked on this a bit today. If you want to take a look : https://github.com/Aang23/LDPC_Tests (currently only added the CCSDS LDPC 7/8 code, but you can easily copy-paste other codes from https://github.com/blegal/Fast_LDPC_decoder_for_x86/tree/master/src/Constantes/*/constantes_sse.h). I have tested it against real data from a mission utilizing the standard successfully.
Editing the decoder to take dynamically-defined codes from a struct had a minor impact on performance, so I'll go that route as having some more generic will be more practical. The decoder takes H matrices (variable nodes positions) in the following format : https://github.com/Aang23/LDPC_Tests/blob/master/src/ldpc_code.cpp#L23.
I will need to look into loading / converting a generic Alist, should definitely be possible. Though as my knowledge of LDPC codes is not too great at the moment this may take a while.
As this issue is not really relevant, should I close it? Discussing work on a decoder is pretty far from the original intent.
Some LDPC decoder implementations, such as @xdsopl's (one of the fastest I could find, especially without relying on a load of dependencies) require DVB-S2 style matrices as an input. That makes it really tricky to use with self-designed codes or anything from a different standard.
Hence, a feature to convert from a generated ALIST directly to such a table would come in extremely handy. Currently (though I haven't dug into implementing this myself, yet), the only project capable of this is https://github.com/Lab-STICC-UBS/DVB-S2-matrices, but being written in Matlab/Octave, it is incredibly slow to the point I have not yet confirmed it is actually functional.