CCSDS decoders: Data never gets processed

errikosmes commented 3 years ago

Hi,

I’ve recently started working on a project trying to replace some satellite baseband equipment with a software radio. Firstly I wanted to say that this project, the articles and the GRCon talks from Daniel have been truly invaluable.

I’m creating this issue because I’m observing some pretty weird behavior when trying to use some of the CCSDS blocks from gr-satellites to decode the signal. It feels like I’m observing some data loss inside the flowgraph, which seems pretty strange to me.

The problem appears when I try to run multiple instances of the CCSDS blocks in parallel to account for the different types of ambiguity (BPSK bit inversion and the two-bit ambiguity at the input of the Viterbi decoder). My reasoning was that if I run 4 decoding branches in parallel to take every possibility into account, I should be able to decode as much data as possible. However, when running the 4 parallel branches together the flowgraph manages to decode hardly any frames at all (my CPU hovers around 40% while decoding). I’m attaching a screenshot of my flowgraph for reference.

[Screenshot: CCSDS_decoding_4way]

I find that extremely strange as when I run a single branch on its own (the first one) the decoding works (I identified that for the recording that I’m using I need a delay of 1 to synchronize the bit pairs). So when I only run the first one I get frames but when I run the four of them together even the one that was working breaks? What gives? Can you see anything obvious that I’m doing wrong? It feels like the data gets lost or it's not processed even though my CPU is not maxed out. I would love to send a recording to let you replicate the problem but I’m not really allowed to share the data.

I’m also bringing this up because, as far as I understand, a similar technique is used in the gr-satellites “CCSDS Concatenated Deframer” block, where the decoding procedure follows two branches, one with a delay of 1 and one with no delay, to handle the two-bit ambiguity at the input of the FEC Decoder (the decoder needs two bits from the same pair). So if there is a bug somewhere, this block could also be affected.

I’ve observed that changing the “Frame Bits” length in the CC Decoder Definition produces different amounts of data at the outputs, but still far from what I would consider to be enough (a few MB, as I get when I decode with a single branch). I’ve tried values from 32 up to several millions. A value close to 10M seemed to produce the most data, although increasing the buffer size didn’t always mean that I got more decoded frames. Maybe there is some data alignment problem? But that would be pretty weird and it doesn’t explain why it works decently with a single branch and not with multiple ones.

Changing the FEC Extended Decoder Threading Type from “capillary” to “none” (I don’t really know what that means as these gr blocks have no documentation) also produces different amounts of binary data at the outputs (but still on the order of kilobytes).

I’m not sure if this is a gr-satellites problem, a GNU Radio one, or if I’m doing something wrong. If you have any theories about what could be happening or if you have any debugging ideas please let me know!

Thanks!

errikosmes commented 3 years ago

Forgot to mention I'm using GNU Radio 3.9.1.0 and gr-satellites 4.1.0 both built from source

daniestevez commented 3 years ago

Hi Errikos,

I can't see anything that strikes me as wrong about your flowgraph. I think it should work. Regarding CPU usage, you say that this takes 40% of the CPU, but what about the usage per core or per thread? Is there a particular thread that maxes out a CPU core at 100%?

Also, where does your data come from? If you're trying to process samples from an SDR receiver in real time, then definitely you need your CPU to be able to keep up with the data, for otherwise you'll get lost samples and all sorts of problems. If your data is a recording of some sort, then real time processing is not necessary. GNU Radio will go through the data as fast as your CPU allows (unless you've placed throttle blocks), and the results should be correct regardless of how fast your CPU is.

You can try to decompose your problem by recording the symbols to a file on a first pass, and then working with that file to debug your problems. Bugs aside, you should get repeatable and correct results when processing such a file.
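A minimal sketch of that two-pass idea, assuming the symbols are stored as raw native-endian float32 (the layout a GNU Radio File Sink uses for float streams). The helper names `save_symbols`, `load_symbols` and `file_digest` are hypothetical and only serve the illustration; comparing digests of the decoder output from two runs over the same symbol file is one simple way to check repeatability.

```python
import hashlib
from array import array


def save_symbols(path, symbols):
    """Write soft symbols to disk as raw native-endian float32."""
    with open(path, "wb") as f:
        array("f", symbols).tofile(f)


def load_symbols(path):
    """Read back a raw float32 symbol file as a list of floats."""
    with open(path, "rb") as f:
        data = f.read()
    symbols = array("f")
    symbols.frombytes(data)  # raises ValueError if the file is truncated
    return list(symbols)


def file_digest(path):
    """SHA-256 of a file, read in chunks so large outputs are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()
```

If two runs of the same flowgraph over the same symbol file give output files with different digests, the problem is in the decoding chain rather than in the demodulator.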

You're right that your flowgraph is quite similar to the "CCSDS Concatenated Deframer" block. The main difference is that you handle a 180° phase ambiguity in your flowgraph, while the CCSDS Concatenated Deframer block doesn't (it seems that none of my use cases was BPSK without differential encoding). Adding the 180° phase ambiguity handling to this block should be relatively easy.

A word about optimization: the usual CCSDS convolutional code has an odd number of taps in both branches. The consequence is that if you put inverted data in, you get inverted data out. The same happens for the Viterbi decoder. This means that you don't need to handle the 180° phase ambiguity at the Viterbi decoder level. You can do with just two branches (one for zero delay, the other for a delay of one sample), and then send the output of each of the two Viterbi decoders into one Sync and Create PDU block without modifications, and into another Sync and Create PDU block through a Not block that inverts the data.
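The inversion property can be checked with a few lines of plain Python. This is only a sketch: it assumes the usual k=7 polynomials 171 and 133 (octal), picks one tap ordering arbitrarily (the ordering does not affect the property), and skips the first k-1 outputs so that the shift register holds only real data. The function name `conv_encode` is made up for the example.

```python
import random

K = 7        # constraint length
G1 = 0o171   # 0b1111001, five taps (odd)
G2 = 0o133   # 0b1011011, five taps (odd)


def conv_encode(bits):
    """Rate-1/2 convolutional encoder, output as a flat list c1, c2, c1, c2, ...

    Outputs are produced only once the register is full of input bits,
    so the all-zero start-up state does not affect the comparison.
    """
    out = []
    reg = 0
    for n, b in enumerate(bits):
        reg = ((reg << 1) | b) & ((1 << K) - 1)
        if n >= K - 1:
            out.append(bin(reg & G1).count("1") & 1)  # parity over G1 taps
            out.append(bin(reg & G2).count("1") & 1)  # parity over G2 taps
    return out


rng = random.Random(0)
data = [rng.randint(0, 1) for _ in range(200)]
inverted_data = [b ^ 1 for b in data]

encoded = conv_encode(data)
encoded_from_inverted = conv_encode(inverted_data)

# Each output is the XOR of an odd number of input bits, so flipping
# every input bit flips every output bit.
print(encoded_from_inverted == [c ^ 1 for c in encoded])
```

With an even number of taps in a branch, that branch's output would be unchanged by the inversion instead, and the shortcut would no longer apply.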

How the Viterbi decoder in streaming mode works regarding the frame size and threading is tricky and I always need to study the source code carefully when I have questions about it. I think that it should produce the same output (and the output should be correct) regardless of how these parameters are set. The difference should be only in how the calculations are organized, and perhaps in the performance.

I hope this helps.

daniestevez commented 3 years ago

Just as I sent my comment I've seen a potential problem: you're using the same CC Decoder Definition in all the Viterbi decoders. This has given me problems in the past. I don't know if this is intended behaviour or if it's a bug, but I recommend you try to use a different CC Decoder Definition for each Viterbi decoder. They should all have the same parameters, so you can copy and paste the block three times.
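One plausible explanation, stated here as an assumption rather than something confirmed in this thread, is that the decoder definition object keeps internal state, so several decoders calling into the same object disturb each other. The toy below is not GNU Radio code: `ToyDecoderDefinition` and `run_branches` are hypothetical stand-ins that only show the general hazard of sharing a stateful object between branches.

```python
class ToyDecoderDefinition:
    """Hypothetical stand-in for a decoder object that keeps internal state."""

    def __init__(self):
        self.previous = 0

    def decode(self, bit):
        # The output depends on the previous input, so the object is stateful.
        out = bit ^ self.previous
        self.previous = bit
        return out


def run_branches(streams, definitions):
    """Feed the streams item by item, round-robin, each through its definition."""
    outputs = [[] for _ in streams]
    for items in zip(*streams):
        for i, bit in enumerate(items):
            outputs[i].append(definitions[i].decode(bit))
    return outputs


branch_a = [1, 0, 1, 1, 0]
branch_b = [0, 0, 1, 0, 1]

# What branch A produces when it runs on its own.
reference_a = run_branches([branch_a], [ToyDecoderDefinition()])[0]

# Both branches share one object: branch B overwrites the state A relies on.
shared = ToyDecoderDefinition()
shared_out = run_branches([branch_a, branch_b], [shared, shared])

# One object per branch: branch A matches its stand-alone result again.
separate_out = run_branches(
    [branch_a, branch_b], [ToyDecoderDefinition(), ToyDecoderDefinition()]
)

print(shared_out[0] == reference_a, separate_out[0] == reference_a)
```

In a Python flowgraph the equivalent of "one object per branch" would be a separate `fec.cc_decoder.make(...)` call with identical parameters for each Viterbi decoder, which is what copying the CC Decoder Definition block in GRC amounts to.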

If you look at the Python code of the CCSDS Concatenated Deframer block, it's not obvious that it uses a different CC Decoder Definition for each Viterbi decoder, because both are wrapped into the ccsds_decoder hierarchical block. However, if you look for example at the K2SAT Deframer, you'll see that I have two identical CC Decoder Definition objects precisely because of this.

errikosmes commented 3 years ago

Hi Daniel,

Thanks a lot for taking the time to take a look at my problem.

Unfortunately no, at least as far as htop reports, no thread ever goes near 100% (I think the maximum I’ve seen is about 60%).

For the final design, yes, the goal is to process the data in real time. I’m just working with an IQ recording at this stage to make development easier, and I’ve placed a throttle block to run the flowgraph at my desired sample rate. I’m aware that having such a 4-way processing path might not be really efficient. I just wanted to give it a go to see if it works and to help verify that the demodulating part of my flowgraph works correctly. In the future, I might try to develop my own block that handles the ambiguity problem more efficiently if I find a more elegant idea to rectify it (I just made my first C++ block today, so I’ll see how this goes).

I’ll make sure to keep an eye on any data loss problems if I ever test this method directly with an SDR, thanks for pointing that out!

Thanks for the optimization idea! Doing the bit inversion after the Viterbi decoder with a xor block should be more efficient. (Would a not block work? It doesn’t matter if the lsb is not the only bit that’s inverted?)

At this point I think I should try using different CC Decoder Definition blocks. If that’s the problem it would explain quite a lot. I never thought of that. Thanks for the idea!

errikosmes commented 3 years ago

That totally worked! It was the CC Decoder Definition block.

Thank you so much!

I'm closing the issue.

daniestevez commented 3 years ago

Good to see that worked!

Regarding your question

> Would a not block work? It doesn’t matter if the lsb is not the only bit that’s inverted?

I'm pretty confident that you can use a "Not" block to invert a stream of unpacked bits. With unpacked bits only the value of the LSB matters.
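A small sketch of why this works, assuming the "Not" block applies a bitwise NOT to each byte and that the downstream blocks read only the LSB of each unpacked byte. The helpers `not_block` and `lsb` are hypothetical names used for illustration.

```python
def not_block(stream):
    """Bitwise NOT of every byte, kept within 8 bits."""
    return [(~b) & 0xFF for b in stream]


def lsb(stream):
    """What a consumer of unpacked bits sees: only the lowest bit."""
    return [b & 1 for b in stream]


unpacked = [0, 1, 1, 0, 1]       # one bit per byte
inverted = not_block(unpacked)   # bytes become 0xFF or 0xFE

# The upper seven bits are all set after the NOT, but the LSB is the
# logical inverse of the input bit, which is all that matters here.
print(inverted, lsb(inverted))
```

An Xor with a constant 1 would leave the upper bits at zero, but for a consumer that masks the LSB the two approaches give the same bits.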

errikosmes commented 3 years ago

OK that makes sense. I'll use a not block.

Thanks again

daniestevez / gr-satellites

CCSDS decoders: Data never gets processed #264