CFSworks / wavebird-reversing

Reverse-engineering the WaveBird protocol for the betterment of mankind

FEC code: BCH with p(x)=x^5+x^2+1, t=2 #2

Open · cbrauers opened 4 years ago

cbrauers commented 4 years ago

When I read that the FEC block code was a cyclic (31,21) code, I could not help but check the BCH code tables. n = 2^q - 1, k = n - t*q just screams BCH with q=5. Factor out a primitive polynomial and see whether the thing really is a BCH code. And lo and behold, it is.
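For the curious, here is a quick Python sketch (my own illustration, nothing lifted from the WaveBird) that brute-forces the minimal polynomials of alpha and alpha^3 and multiplies them into the generator:

```python
Q = 5
P = 0b100101                        # p(x) = x^5 + x^2 + 1
ALPHA = 0b10                        # alpha = x, a root of the primitive p(x)

def gf_mul(a, b):
    """Multiply two GF(2^5) elements (polynomial basis, reduced by p(x))."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if (a >> Q) & 1:
            a ^= P
        b >>= 1
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def poly_eval(poly, x):
    """Evaluate a GF(2)-coefficient polynomial at a GF(2^5) point."""
    r = 0
    for i in range(poly.bit_length()):
        if (poly >> i) & 1:
            r ^= gf_pow(x, i)
    return r

def minimal_poly(elem):
    """Lowest-degree monic GF(2) polynomial that has elem as a root."""
    for deg in range(1, Q + 1):
        for low in range(1 << deg):
            cand = (1 << deg) | low
            if poly_eval(cand, elem) == 0:
                return cand

def gf2_polymul(a, b):
    """Carry-less multiply = GF(2) polynomial multiplication."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

m1 = minimal_poly(ALPHA)             # comes out as p(x) itself
m3 = minimal_poly(gf_pow(ALPHA, 3))  # minimal polynomial of alpha^3
g = gf2_polymul(m1, m3)              # t=2: g(x) = lcm(m1, m3) = m1 * m3
print(f"g(x) = {g:b}")               # 11101101001, degree 10
```

The printed generator has degree 10, which is exactly the n - k = 31 - 21 check bits a (31, 21) code calls for.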

Then I figured I could have saved myself the effort by looking up the reference you posted on the POCSAG protocol. It does not say that it is BCH, but it does display the polynomial factored into irreducible polynomials, so there is that.

Anyway, all this to say that there is absolutely nothing POCSAG-specific about that code. It is your bog-standard binary BCH code, using the most common primitive polynomial for GF(2^5). And, as you outlined in your article, it differs in significant ways from the BCH code used in POCSAG - it is non-systematic and lacks the additional parity bit.

Just wanted to give you a heads up, since that also means that the algorithm to correct errors should be in reach. The BCH decoding algorithm is typically taught for systematic codes, but since there is nothing in the decoding algorithm about how codeword bits are mapped to payload bits, it should be trivial to adapt to the non-systematic code, especially considering that you had already figured out that you can get a sensible payload if you assume the naive method of non-systematic cyclic encoding.
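To make the "trivial to adapt" point concrete, here is a minimal Python sketch of the naive non-systematic encode/decode pair (illustrative only; the generator value is the one derived in the sketch above): encoding is multiplication by g(x), and payload recovery after error correction is plain division by g(x).

```python
# Minimal sketch of the naive non-systematic scheme (illustrative only).
G = 0b11101101001                     # g(x) from the derivation above

def gf2_polymul(a, b):
    """Encode: c(x) = m(x) * g(x) is just a carry-less multiply."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def gf2_divmod(num, den):
    """GF(2) polynomial long division: returns (quotient, remainder)."""
    q = 0
    while num.bit_length() >= den.bit_length():
        shift = num.bit_length() - den.bit_length()
        q |= 1 << shift
        num ^= den << shift
    return q, num

codeword = gf2_polymul(0b100110111010001011010, G)  # some 21-bit payload
payload, rem = gf2_divmod(codeword, G)              # decode after FEC
assert rem == 0                     # nonzero remainder = errors remain
```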

CFSworks commented 4 years ago

I'll definitely have to amend that chapter accordingly. I must admit that while I considered whether this could be a BCH code, the Wikipedia article actually doesn't explain how to encode BCH and I couldn't confirm for sure that the definition of BCH matched the algorithm I was seeing. The article on POCSAG used the same polynomial and called it merely a cyclic code (31, 21) (as you pointed out), so I played it safe and called the WaveBird's code "cyclic" rather than the more precise "BCH." I should probably also make it clearer that I mentioned POCSAG only because it uses the same polynomial, so as not to imply that the code is somehow POCSAG-specific.

Either way, the underlying Wikipedia articles that I used for my background research should really be shored up too, since the missing information there was the root cause of this whole confusion. (And this was also my first encounter with FEC in general - in a way I'm glad the WaveBird code was non-systematic or I'd have been tempted to ignore the FEC!) I'd make the edits right now if not for the fact that I come from a computer engineering background and would probably explain the encoding process as "just CLMUL the polynomial against the message, no big deal" which I suspect isn't "mathematical" enough to fit with the rest of the BCH article. :)

Also: I was under the impression that POCSAG was also non-systematic - is it actually systematic after all? I guess there's no reason BCH can't be systematic, but I was under the impression that the "pure" BCH encoding process was non-systematic (i.e. WaveBird-style) after reading this section in the article on Reed-Solomon. If I do work up the courage to fill in the Wikipedia article on BCH encoding, I'll have to make that distinction clearer as well (i.e. provide an algorithm for both the systematic and non-systematic encodings).

P.S. Could you also share the BCH code table that you used as reference? That might be something I'd like to link to as well.

CFSworks commented 4 years ago

Okay, I decided to suck it up and fix Wikipedia.

Any obvious overgeneralizations or (worse) mistakes/misinformation in my edits? Figured I'd ask as you seem much more intimately/intuitively familiar with the math behind these things than I.

cbrauers commented 4 years ago

In this case, the table I used was from Wolfram MathWorld, but similar tables are found in many a book on coding. Once you have the primitive polynomial, you can build the rest of the BCH code - i.e. the generator polynomial for any given error-correcting capability - from what are known as the cyclotomic cosets (and glean its properties even before doing so). That is probably a bit involved for the discussions in this guide, since it involves arithmetic in GF(2^5) in this case, not the simple GF(2).
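For illustration, the cosets themselves take only a few lines of Python (a hypothetical helper, not from any reference implementation), and their sizes already give you k before any polynomial is constructed:

```python
# Cyclotomic cosets mod 31 predict the code dimensions up front,
# since deg(m_i) equals the size of the coset containing i.
def coset(s, n=31):
    c, e = [], s
    while e not in c:
        c.append(e)
        e = 2 * e % n
    return sorted(c)

print(coset(1))   # [1, 2, 4, 8, 16]   -> deg m1 = 5
print(coset(3))   # [3, 6, 12, 17, 24] -> deg m3 = 5
# t=2 needs alpha^1..alpha^4 as roots; the cosets of 1 and 3 cover them,
# so deg g = 5 + 5 = 10 and k = 31 - 10 = 21 -- no GF(2^5) math needed.
```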

The line that pointed me to the POCSAG code being systematic is "Each codeword carries 21 bits of information (bits 31 through 11), 10 bits of error-correcting code (bits 10 through 1), and an even parity bit (bit 0)." - for a non-systematic code, it would not be possible to divide the codeword into payload information and error-correcting information. I would say the systematic encoding is more common if you are building a standard, as there is next to no implementation overhead for a hardware transmitter and it makes it somewhat easier to build a receiver, especially if you half-ass it and do not actually correct any errors. In proprietary systems, who knows.
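For contrast with the non-systematic scheme, systematic cyclic encoding looks like this (an illustrative Python sketch under the same assumptions as before, with the long-division helper repeated; POCSAG would then append its even parity bit on top):

```python
def gf2_divmod(num, den):
    """GF(2) polynomial long division: returns (quotient, remainder)."""
    q = 0
    while num.bit_length() >= den.bit_length():
        shift = num.bit_length() - den.bit_length()
        q |= 1 << shift
        num ^= den << shift
    return q, num

def encode_systematic(msg, g, n=31, k=21):
    """Payload in the top k bits; m(x)*x^(n-k) mod g(x) fills the rest."""
    shifted = msg << (n - k)          # make room for the check bits
    _, rem = gf2_divmod(shifted, g)
    return shifted | rem              # payload bits stay readable as-is
```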

The Wikipedia additions look quite good (though there might be a wee case of [citation needed]). One point of note is the degree of the polynomials - the remainder should be of lower degree than the divisor; otherwise, the quotient's x^0 term got skipped. And likewise, if the parity polynomial includes an x^(n-k) term, it would interfere with the x^0 term of the payload (which should be the x^(n-k) term of the codeword). The remainder is of degree at most n-k-1, which gives it n-k coefficients.

Ignoring the FEC would probably have been difficult if you want to emulate a controller. Since there are more FEC bits than corrigible errors (there have to be, for obvious reasons), the result would have been either the FEC decoder rejecting your message or miscorrecting it to something else and thus breaking the CRC - again rejecting your message. Speaking of CRC, why the hell would the designers of this protocol put it outside the FEC? The error correction is supposed to make the transmission more robust, and then they go ahead and put some vital bits in the non-robust part.

But great work on that guide - really walks you through a lot of digital communications with a practical example. There are a few potentially interesting points left out in the modulation chapter, such as phase modulation and baseband filtering, but that is no topic for this issue.

CFSworks commented 4 years ago

> I would say the systematic encoding is more common if you are building a standard, as there is next to no implementation overhead for a hardware transmitter and it makes it somewhat easier to build a receiver, especially if you half-ass it and do not actually correct any errors.

Mm! I also find them tons more eyeball-friendly when in the debugging stage and staring at a logic analyzer. Glad to hear that they're common then.

> The Wikipedia additions look quite good (though there might be a wee case of [citation needed]). One point of note is the degree of the polynomials - the remainder should be of lower degree than the divisor; otherwise, the quotient's x^0 term got skipped. And likewise, if the parity polynomial includes an x^(n-k) term, it would interfere with the x^0 term of the payload (which should be the x^(n-k) term of the codeword). The remainder is of degree at most n-k-1, which gives it n-k coefficients.

Accursed off-by-one errors! They follow me even if I'm not doing programming work.

I fixed it by phrasing it instead as "As r(x) is always of degree less than n-k (which is the degree of g(x)), ..." Good catch, thanks!

> Ignoring the FEC would probably have been difficult if you want to emulate a controller.

My original interest was only to decode the message so I could use my WaveBird as a generic gamepad. I probably would have done something like a "best 3 out of 5" majority to achieve error tolerance (at the expense of latency) and ignored the CRC/FEC completely. Had I wanted to emulate a controller, I might have ended up building a lookup table of FEC+CRC codewords for each bit (essentially using a generator matrix and treating the FEC+CRC bits as a single combined linear code) without truly understanding what was going on.
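That lookup-table idea is sound, by the way - any linear encoder (BCH, a zero-initialized CRC, or both chained together) is fully described by the codewords of the one-hot messages. A hypothetical sketch:

```python
# Rows of a generator matrix: encode each one-hot message once, then
# linearity (XOR) reproduces the encoder for every message.
def build_rows(encode, k):
    return [encode(1 << i) for i in range(k)]

def encode_via_rows(rows, msg):
    out = 0
    for i, row in enumerate(rows):
        if (msg >> i) & 1:
            out ^= row                # XOR the rows of the set bits
    return out
```

(The zero-init caveat matters: a CRC with a nonzero initial register is affine rather than linear, so it needs an extra constant XOR'd on top.)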

> Speaking of CRC, why the hell would the designers of this protocol put it outside the FEC? The error correction is supposed to make the transmission more robust, and then they go ahead and put some vital bits in the non-robust part.

I've wondered "what the FEC" is up with that myself. :) On reflection, I suspect that the CRC was probably a late addition as a hack to fix some occasional errors the Nintendo engineers were seeing and meet their project deadlines in time, rather than something carefully thought out and discussed.

But why wasn't the CRC on the inside of the code? My first thought was "oh, maybe they ran out of space in the FEC-protected bits" but they have that static 16-bit 0BD5 prefix in there, which would be an obvious choice for removing to make room for a CRC-16.

The 0BD5 was probably put there first - before they had CRC - as a check against overzealous correction by the BCH decoder in cases of 3+ errors per block, but it didn't actually work because their utterly nonsensical interleaving scheme broke the payload apart into 4 contiguous blocks rather than alternating/striping the bits across code blocks, so it ended up putting all 16 bits of that check in the same 21-bit block, leaving the other 3 independent BCH blocks totally unchecked against 3-bit errors. They probably didn't pay enough attention to their interleaving to realize the problems this would cause: cue occasional (and infuriating) misreads on all axes, as well as on the L/R/Z/D-Pad buttons, in the product testing phase.
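To make the striping point concrete, a toy Python illustration (the 84 = 4 x 21 layout is taken from this discussion; bit indices stand in for bits):

```python
bits = list(range(84))

contiguous = [bits[21 * i : 21 * (i + 1)] for i in range(4)]  # what they did
striped = [bits[i::4] for i in range(4)]                      # what striping gives

print(contiguous[0])  # [0, 1, ..., 20]: a 16-bit prefix sits in one block
print(striped[0])     # [0, 4, 8, ...]: the prefix spreads ~4 bits per block
```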

I'm betting the engineer who added the CRC had a pretty strong mistrust of BCH at that point (not realizing that the real problem lay in the interleaving), and feared (irrationally) that BCH might even misbehave and also "fix" the CRC. So perhaps this fellow felt more comfortable putting the CRC on the outside, out of BCH's reach.

Note they also have that 1234 "sync word," which causes the whole packet to be rejected if a single bit is flipped there, so they didn't seem too worried about making WaveBird work against any arbitrary bit flip anyway - they only wanted to cover the majority. (Remember that the gamepad retransmits at a rate of 250 Hz but games only sample input at up to 60 Hz, so the receiver gets 4-5 chances to decode a packet before losing a frame's worth of input.)

But yeah - if it were up to me, I'd have taken the 0BD5 out of there and put the CRC in its place instead. I'd have also done the interleaving properly, and would've accepted sync words that were Hamming distance 1 to 1234, which would have resulted in a protocol tolerant of any single-bit error at all.

> There are a few potentially interesting points left out in the modulation chapter, such as phase modulation and baseband filtering, but that is no topic for this issue.

Phase modulation is really cool, and is definitely good to learn about (since it serves as a stepping stone into QAM) but I couldn't justify including it when the purpose of my guide is to give people a taste of what this kind of (reverse) engineering is like. A pity because PSK (or, rather, DSSS) is my personal favorite.

The baseband filtering is something I may need to illustrate better, because that's a more vital skill, being important to ensure that no aliasing occurs when the baseband is sampled. I did briefly mention it in my hackrf_transfer command in that chapter - do you think that's too much brevity? Maybe I should include a note about that in my explanation on SDR units.

cbrauers commented 4 years ago

> Mm! I also find them tons more eyeball-friendly when in the debugging stage and staring at a logic analyzer. Glad to hear that they're common then.

They are less common in convolutional codes, where systematic encoding unfortunately affects the reliability of non-recursive codes. Since block codes pretty much do not care how messages are mapped to code words, there is practically no drawback to systematic encoding.

> My first thought was "oh, maybe they ran out of space in the FEC-protected bits" but they have that static 16-bit 0BD5 prefix in there, which would be an obvious choice for removing to make room for a CRC-16. The 0BD5 was probably put there first - before they had CRC - as a check against overzealous correction by the BCH decoder in cases of 3+ errors per block, but it didn't actually work [...]

They might also be a controller identification or a sort of opcode, just in case they were to design more wireless controllers in the future. Putting them there for error detection is downright useless, since a defective BCH codeword will just result in the decoder placing t extra bit errors into the block at practically random places. So two extra errors in 21 bits - yes, the chances of getting at least one of them into a 16-bit opcode are not that bad (especially when you factor in that the 3 original errors are also still there), but the more canonical way would be the extra-parity-bit route POCSAG went, or exploiting the fact that BCH is not a perfect code ("perfect" being a specific term here, not a judgment of usefulness), so that the miscorrected word is probably not a valid codeword anymore. Actually, I do not know how much a non-systematic encoding would affect the influence an extra bit error in the codeword might have on the message; it might actually completely distort those bits if the errors happen to break the first block.
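A quick back-of-the-envelope check on that, using the same numbers (2 decoder-added errors in a 21-bit message, 16 magic/opcode bits):

```python
from math import comb

# P(at least one of 2 random errors lands among the 16 magic bits)
p_all_miss = comb(21 - 16, 2) / comb(21, 2)  # both land in the other 5 bits
print(1 - p_all_miss)                        # ~0.95
```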

> [...] but it didn't actually work because their utterly nonsensical interleaving scheme broke the payload apart into 4 contiguous blocks rather than alternating/striping the bits across code blocks [...]

That is actually a standard block interleaver, the stuff anyone uses if all you exchange is small packets of data (continuous streams tend to use Forney convolutional interleavers, which have slightly lower latency and require half the memory, but instead need run-in and run-out). For most applications, an interleaver before the first FEC encoder is completely unneeded, since it does not matter where in the payload surviving errors end up. Typically, interleavers are between the FEC and the channel, or between two FECs if you use concatenated codes.

> Phase modulation is really cool, and is definitely good to learn about (since it serves as a stepping stone into QAM) but I couldn't justify including it when the purpose of my guide is to give people a taste of what this kind of (reverse) engineering is like. A pity because PSK (or, rather, DSSS) is my personal favorite.

Good old BPSK and QPSK. One being an ASK suited for synchronous demodulation, the other the simplest kind of QAM. There are actually modulations that can be decoded as either FSK or PSK, notably MSK. The modulation index (frequency range / symbol or chip rate) seems quite far from that, though. Unless there is something in the spreading sequence (which acts like a form of DSSS) that fixes it, that FSK is nowhere near orthogonal, which has a negative effect on its noise robustness.

> The baseband filtering is something I may need to illustrate better, because that's a more vital skill, being important to ensure that no aliasing occurs when the baseband is sampled. I did briefly mention it in my hackrf_transfer command in that chapter - do you think that's too much brevity? Maybe I should include a note about that in my explanation on SDR units.

I am also talking about transmitter-side filtering (pulse shaping). FSKs typically use either square or Gaussian pulses, and you usually find that (along with the bandwidth, as a multiple of the symbol rate) in a protocol's spec sheet. Just something that could be added to turn this into a rather comprehensive intro to digital communications. Of course, there are lots of topics that a course on digital communications might cover that go far beyond the scope here, including stuff like OFDM (which is super common, but just not relevant here) or GF(2^q) arithmetic, which might actually be useful in this context (BCH error correction) but is just next to impossible to present in popular-science form.

CFSworks commented 4 years ago

> They might also be a controller identification or a sort of opcode, just in case they were to design more wireless controllers in the future.

Entirely possible and I hadn't thought of that. It's not likely that it's an opcode since the "commands" are sent from the GameCube to the controller, and the controller responds with a "naked" status message. An identification might be useful if they wanted to have games identify the appearance/color of the controller, though any changes to the protocol would require a different receiver altogether and that wouldn't benefit them. A "check magic number" is still my prime hypothesis, given the "arbitrariness" of that number, and the fact that there are 8 bits left over - used just for padding - which could do double duty as an identifier in the future if they really wanted.

> Putting them there for error detection is downright useless, since a defective BCH codeword will just result in the decoder placing t extra bit errors into the block at practically random places.

While I agree that what I've described is a poor design, a note of caution: in reverse-engineering (and indeed any investigative pursuit), the "uselessness" of a hypothesis bears little weight on its validity, since people do useless stuff all the time. (Case in point: putting the CRC16 on the outside of the code)

I still maintain that (what I am guessing is) their first-draft design would work - albeit not as well as putting a CRC16 in there instead of a static 16-bit "magic number" - if it were implemented correctly:

> Actually, I do not know how much a non-systematic encoding would affect the influence an extra bit error in the codeword might have on the message; it might actually completely distort those bits if the errors happen to break the first block.

Correct - their non-systematic decoding is polynomial division over GF(2), so a bit error would manifest in the low bits as something like x^bad_bit (mod g(x)) - or put another way, imagine setting an LFSR to the generator polynomial and setting its state to 1 at the point the bit error(s) occur. The LFSR output would give the bit error mask that results after division. In either case, the error gets manipulated quite a bit by the polynomial and swept "downstream" (or "upstream" depending on whether they start at the "left side" or "right side") where the 0BD5 resides.
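A small Python sketch of that view (illustrative; the division helper and g(x) are from the earlier sketches): dividing the single-bit error polynomial x^e by g(x) gives the payload error mask as the quotient, with the remainder being what a syndrome check would see.

```python
def gf2_divmod(num, den):
    """GF(2) polynomial long division: returns (quotient, remainder)."""
    q = 0
    while num.bit_length() >= den.bit_length():
        shift = num.bit_length() - den.bit_length()
        q |= 1 << shift
        num ^= den << shift
    return q, num

G = 0b11101101001                   # g(x) as derived earlier

def payload_error_mask(bad_bit):
    mask, syndrome = gf2_divmod(1 << bad_bit, G)
    return mask, syndrome           # mask: payload bits that get flipped

print(f"{payload_error_mask(30)[0]:021b}")  # a flip in the top codeword bit
                                            # smears across the whole payload
```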

> For most applications, an interleaver before the first FEC encoder is completely unneeded, since it does not matter where in the payload surviving errors end up.

My apologies. When I was talking about the "utterly nonsensical interleaving scheme" I was referring to such an interleaver before BCH encoding, which I'm nearly certain was intended. That interleaver runs on a 4<->21(!) scheme - exactly the dimensions they'd choose for an interleaving pass before the BCH encoding. For some reason, though, they only do that interleaving before the CRC16 (??? - more reason to suspect the CRC16 was just "tacked on", I guess) and they go back to the non-interleaved version before BCH encoding. Maybe it was a typo in the microcontroller code, where they pass the wrong buffer into their encoder? Or maybe there was some miscommunication between the guy writing the BCH encoder and the guy packing the message: they didn't know who had the responsibility to do the interleaving, so they ended up undoing each other's work. Ultimately, I have no idea.

Had this inner interleaver worked properly, they would have ended up scattering that 0BD5 sequence evenly across the 4 21-bit message blocks, which means 4 bits of static check bits per (non-systematic) BCH block. What you get is something that works about as well as a CRC-4 per block.

It's still not as good as a CRC-16, which gets all of the benefits of the above scheme and also creates a nice interdependency between the blocks to allow them to cross-check each other. Plus, the CRC-16 can run on an independent polynomial - not sure what that does for the overall distance value but it certainly can't hurt. :)

> Unless there is something in the spreading sequence (which acts like a form of DSSS) that fixes it, that FSK is nowhere near orthogonal, which has a negative effect on its noise robustness.

Indeed the spreading sequence does fix it! I've even had success by tuning only to the upper frequency and using a ~750 kHz low-pass filter -> peak detector to treat the signal as OOK instead. (Of course the particular implementation I prefer is two such peak detectors acting in differential; it gives the nicest SNR of anything I've tried.)
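In numpy terms it is roughly the following (a rough sketch; the sample rate, tone offsets and tap count are all placeholders, not measured WaveBird parameters):

```python
import numpy as np

def envelope(iq, f_tone, fs, bw=750e3):
    """Mix one FSK tone to DC, low-pass crudely, take the magnitude."""
    n = np.arange(len(iq))
    at_dc = iq * np.exp(-2j * np.pi * f_tone / fs * n)
    taps = np.hamming(int(fs / bw) | 1)   # short odd-length window filter
    taps /= taps.sum()
    return np.abs(np.convolve(at_dc, taps, mode="same"))

def fsk_as_dual_ook(iq, f_lo, f_hi, fs):
    """Differential of the two tone envelopes; the sign gives the chip."""
    return envelope(iq, f_hi, fs) - envelope(iq, f_lo, fs)
```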

> I am also talking about transmitter-side filtering (pulse shaping).

Good point. I don't really want to talk too much about transmitter-side stuff (my belief is a healthy understanding of receiver-side communications will naturally lead to a comfortable understanding of transmitter-side as well). I don't want to get very in-depth on pulse shaping, but I could probably say something like "Low-pass filtering is a really good idea on the DAC side as well, to protect against images (which you can think of as a counterpart to aliases - they're pretty much the same thing if you run the math the other way around) outside of the bandwidth that you strictly need (per our old friend Nyquist) for your sample rate. Your transmitter's power supply and your country's radio regulatory agency will thank you as well."

I realize there's a little more nuance to pulse shaping than that, but the essence is that the Shannon-Nyquist theorem must be respected even on the DAC side or when upsampling, and that's the central idea at the heart of pulse shaping so strong faith in Nyquist should serve a newbie well regardless.


We're creeping in scope a little bit, so I'm going to start maintaining a checklist of the things I need to make sure I cover in my writing:

- Clarify that the FEC is just a standard binary BCH code, related only coincidentally to POCSAG, and hence mention POCSAG only once or twice.
- Talk about why ADCs and DACs have low-pass filters in front of them.
- Add a note to the footer of the modulation chapter emphasizing that my list isn't exhaustive, mentioning other modulation schemes that are known, and reminding the reader once again that virtually any aspect of the signal can carry information (perhaps inspiring them to come up with some modulations of their own)
- Allude to OFDM ("You can multiply your throughput if you transmit bits in parallel on different frequencies, as long as the frequencies are carefully spaced so that they don't interfere with each other. Fourier offers some clues on how to space out your frequencies as closely as possible while ensuring that they don't interact.")
- A quick note that what programmers call "shift-and-XOR" is what mathematicians call "polynomial multiplication over GF(2)" (I really want to avoid getting too mathematical, experience tells me it's the main thing that intimidates people out of continuing)
cbrauers commented 4 years ago

> Entirely possible and I hadn't thought of that. It's not likely that it's an opcode since the "commands" are sent from the GameCube to the controller, and the controller responds with a "naked" status message. An identification might be useful if they wanted to have games identify the appearance/color of the controller, though any changes to the protocol would require a different receiver altogether and that wouldn't benefit them.

Might be a matter of coexistence with other (planned or existing) devices that use a similar protocol, but that seems unlikely with the manual frequency selection. And it is true that the 'cube itself decides the commands, but a one-way wireless protocol would have to find another way anyway.

> Correct - their non-systematic decoding is polynomial division over GF(2), so a bit error would manifest in the low bits as something like x^bad_bit (mod g(x)) - or put another way, imagine setting an LFSR to the generator polynomial and setting its state to 1 at the point the bit error(s) occur. The LFSR output would give the bit error mask that results after division. In either case, the error gets manipulated quite a bit by the polynomial and swept "downstream" (or "upstream" depending on whether they start at the "left side" or "right side") where the 0BD5 resides.

I had imagined something like a shifted polynomial, but yes, an LFSR output makes more sense. By that point in the classes I took (and TAed for), we were usually only concerned with the remainder and not the quotient anymore, as most block codes (especially the cyclic ones) are systematic.

But that gives an interesting look at how things you would not expect to find in a well-designed FEC - error detection with magic numbers, pre-encoder interleaving and non-systematic cyclic codes - might actually work together to create something that does not completely suck. Or at least would not completely suck if they had not bugged up the implementation.

Looking at the bit representation, 0BD5 actually looks like a half-decent sync sequence: a long burst of ones, flanked by some alternating 0 and 1 bits. Still seems strange; by the time you have decoded your BCH, you should be synchronous already.
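For reference, the pattern in question:

```python
>>> format(0x0BD5, "016b")
'0000101111010101'
```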

> Indeed the spreading sequence does fix it! I've even had success by tuning only to the upper frequency and using a ~750 kHz low-pass filter -> peak detector to treat the signal as OOK instead. (Of course the particular implementation I prefer is two such peak detectors acting in differential; it gives the nicest SNR of anything I've tried.)

That should also be possible with a non-orthogonal FSK. But I assume that the look you took at the waveform confirms that the spreading cancels out non-orthogonality. Probably simply due to the fact that the shifted correlation is quite a bit lower than the unshifted one, so you do not get the same crosstalk from neighboring chips that you would get from neighboring bits without spreading.

"Low-pass filtering is a really good idea on the DAC side as well, to protect against images (which you can think of as a counterpart to aliases - they're pretty much the same thing if you run the math the other way around) outside of the bandwidth that you strictly need (per our old friend Nyquist) for your sample rate. Your transmitter's power supply and your country's radio regulatory agency will thank you as well."

I do not really think about spectral side lobes as images - that is stuff for sampled analog waveforms to me. The way I see it, random data is white noise until you shape the transmitter pulse to something other than a delta impulse.

Treating receiver-side and transmitter-side filtering in the same place might also be a nice segue into matched filters, the optimum way to prepare signals for sampling after white noise. That can then get us into Nyquist pulses, and then... I fear I am again far exceeding your scope here.

> Clarify that the FEC is just a standard binary BCH code, related only coincidentally to POCSAG, and hence mention POCSAG only once or twice.

Sounds like a good solution.

> Talk about why ADCs and DACs have low-pass filters in front of them.

Always useful to have, but the DAC most of all needs a filter behind it. And the simplest filter, the square pulse, is usually part of its natural aperture.

> Add a note to the footer of the modulation chapter emphasizing that my list isn't exhaustive, mentioning other modulation schemes that are known, and reminding the reader once again that virtually any aspect of the signal can carry information (perhaps inspiring them to come up with some modulations of their own)

Seems like the best compromise between mentioning this rather important modulation scheme and not talking about too much stuff irrelevant to the RE task itself.

> Allude to OFDM ("You can multiply your throughput if you transmit bits in parallel on different frequencies, as long as the frequencies are carefully spaced so that they don't interfere with each other. Fourier offers some clues on how to space out your frequencies as closely as possible while ensuring that they don't interact.")

I guess you can do that if you want to, but I do not think it offers that much benefit in this context.

> A quick note that what programmers call "shift-and-XOR" is what mathematicians call "polynomial multiplication over GF(2)" (I really want to avoid getting too mathematical, experience tells me it's the main thing that intimidates people out of continuing)

Personally, I feel the easiest way to visualize it is just long multiplication of binary numbers without carry. Including the x^ks just makes it a needless writing effort and scares off more practically-oriented people.

cbrauers commented 4 years ago

Did some simulations. Even with the non-systematic coding and an inner interleaver, 16% of incorrigible three-bit errors sneak past the magic number test. However, the unprotected CRC also causes 34% of three-bit errors to be rejected, when only 7% are incorrigible, essentially quintupling the rejection rate for the most common type of incorrigible error. You obviously get a division by zero for single and double errors, which are always corrigible except if they occur in the CRC. Which they do 11% (single) / 20% (double) of the time.

So there is just no substitute for an FEC-protected checksum. Which I have not yet simulated, as I need to create my own packets for this. Since that would drop the magic number, I think I will also kill the non-systematic decoder; the overly complex MATLAB implementation of GF(2) polynomial division makes that part take up 90% of the simulation's run time.

CFSworks commented 4 years ago

16% sneak through - wow, that's worse than the ~7% expected for a true 4-bit CRC. I guess the length of the BCH polynomial really makes it behave more like a truncated CRC-10 (and with a polynomial not meant for CRC, no less). I guess it's mitigated a little bit by the fact that under 6% of three-bit errors are actually incorrigible. Still nowhere near as good as a bona fide CRC-16 (inside the FEC or otherwise).

Nintendo's definitely not concerned with rejection rate; the ratio of transmissions to frames is sufficiently high that a rejected message won't meaningfully degrade performance. Even if the CRC16 were protected, the 11%/20% figure should still apply as the 16-bit start sequence isn't protected either (and it's higher than that in the released product because there are 32 sensitive bits per message that can cause rejection, not 16). I wondered about making a receiver that would accept CRC16s and start sequences that had 1 bit error but wasn't sure if a 17x increase in (accepted) error rate was worth it.

As long as you're doing simulations with CRC on the inside, I'm curious if systematic vs. non-systematic has any effect on the percentage of random messages that are accepted. I would intuitively say no, since the "landslide effect" from non-systematic polynomial division is just as likely to make an invalid CRC valid as it is to make a valid CRC invalid, but it's still a hypothesis worth checking. (To speed things along, maybe multiply by g(x)'s reciprocal rather than do division?)

cbrauers commented 4 years ago

I have not done the sim yet, but I am fairly certain that the false-negative rate will be immeasurably low.

I actually managed to speed up the non-systematic calculation quite a bit thanks to your comment. I figured that by doing the convolution instead of a deconvolution, I could just do it on the real numbers with an odd-or-even decision at the end. But then I remembered: they did exactly that for deconvolution in a newer MATLAB version - completely ignoring the problem of floating-point rounding error, as real deconvolutions have a tendency to diverge pretty drastically, breaking BCH codes with n > 511 pretty reliably and occasionally also n = 511. But we are dealing with n = 31 here, so I just used real deconvolution, with doubles, as MATLAB likes it (hooray, representing a finite field in floating point), and it has been quite the speedup.
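For anyone following along without MATLAB, the same trick translates to numpy like so (a sketch; it is exact because g(x) is monic, so the integer long division stays representable in doubles until the coefficients outgrow the mantissa - far beyond n = 31):

```python
import numpy as np

def gf2_deconv(c, g):
    """Real-valued deconvolution with an odd-or-even decision at the end."""
    q, r = np.polydiv(c, g)          # ordinary long division over the reals
    return (np.round(q) % 2).astype(int), (np.round(r) % 2).astype(int)

g = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1]  # x^10+x^9+x^8+x^6+x^5+x^3+1, MSB first
m = np.random.randint(0, 2, 21)
c = np.convolve(m, g) % 2              # non-systematic encode, mod 2 at the end
q, r = gf2_deconv(c, g)
assert (q == m).all() and not r.any()
```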