Safer buffers - Githubissues

henrygab commented 3 months ago

Hitag2 -- complex decoder was left alone, and instead the fix is a hack ... allocate one extra byte to allow the one-byte buffer overrun to occur safely.
HitagS -- fix potential one-byte overrun
lfsampling -- fix potential one-byte overrun
Add safer array size calculation macro (from SimpleHacks) ... which has zero net binary effect, but avoids accidental errors, such as refactoring resulting in accidentally taking the length of a pointer (rather than the length of an array).

github-actions[bot] commented 3 months ago

You are welcome to add an entry to the CHANGELOG.md as well

henrygab commented 3 months ago

I manually squashed to a single commit, to avoid having non-compiling commits in commit history (better for git bisect)

iceman1001 commented 3 months ago

OK,
the modification to use a more complex array length calculation for potential wrong calls to the old macro is unneeded. The coverity scan and cppchecker picks up on those. We don't have those mistakes.

I would say you use your code out of habit, Please revert that addition.

henrygab commented 3 months ago

Respectfully, I do not submit this change out of habit, but as a way to improve the continued stability of the codebase and the experience of contributors to the codebase.

Previously, it seemed you and I shared the desire to catch errors at the earliest possible time. Listing those times in what I believe is an agreed order of preference:

At edit (e.g., where an editor flags something as a mistake)
At compilation (e.g., causing a compile-time error)
At link time (e.g., rare, but examples exist)
Via static analysis (e.g., coverity scan / cppchecker ... not always run by developers)
At run-time

Here, if everything is fine, and all developers are perfect, the safer macro has no visible effect. If, however, someone makes a mistake, the safer macro calls it out at compilation time, and saves everyone time and effort (and frustration).

More explicit reasons

At present, you indicate "we don't have those mistakes". For the sake of argument, I **_accept_** that the code base does not contain this bug. Even so, the safer macro is a worthwhile change. This is **_especially_** true because this codebase receives contributions from many people, of varied skill in coding. (it's an awesome community!) It's clear we also both agree that static analysis is an excellent tool, and I applaud the use of Coverity and CppChecker. Of course, I also think we both know most users don't run Coverity or CppChecker. Therefore, at best, any error caught by these tools gets cauth **_after_** the code has been checked into the depot, and the static analysis is run, and the results analyzed. **The safer array length macro _improves this situation_**. This error is **_guaranteed_** to not be in the code base, not because an optional static analysis tool was run, but because it would not even compile if the error was there. If this only got to parity with the static analysis tools, maybe it's not worth it. **The safer array length macro is _better for contributors_**.

Long example, but based on true fact patterns

As an example, let's say someone realizes the same code block exists in a bunch of places. Maybe it looks something like: ```C uint8_t results[16]; memset(results, 0, ARRAYLEN(results)) uint8_t buffer[16+2]; // e.g., some header memset(buffer, 0, ARRAYLEN(buffer)); result = call_some_function(buffer, ARRAYLEN(buffer)); /* ... lots of common error handling, retry, etc. code ... */ if (SUCCESSFUL(result)) { memcpy(results, buffer+2, ARRAYLEN(results)); } ``` Rather than fixing a bug in that error handling code in each of the four places this code was replicated, the user does a "smart" thing and refactors it so the common code is shared. Maybe it looks something like: ```C bool common_stuff(uint8_t results[16]) { memset(results, 0, ARRAYLEN(results)); uint8_t buffer[16+2]; // e.g., some header memset(buffer, 0, ARRAYLEN(buffer)); result = call_some_function(); /* ... lots of common error handlings, retry, etc. code ... now with bugfix */ if (SUCCESSFUL(result)) { memcpy(results, buffer, ARRAYLEN(results)); } return SUCCESSFUL(result); } ``` and the caller spot would look like: ```C uint8_t buffer[MAX_BUFFER_SIZE]; if (common_stuff(buffer)) { /* ... does things with data in buffer...*/ } ``` Nice! This type of refactoring looks reasonable. The code compiles, the diff is tiny because most of the lines are identical between old code and the new function. It often **_even runs successfully_**. Unfortunately, this type of bug has caused multiple CVEs. (_If you don't catch the problem, you're not alone. Many code reviewers do not unless exposed to this before...._) Information disclosure is only the most obvious type. As you likely caught, an array parameter is a pointer. Therefore, the user suddenly was only getting four (or eight) bytes of "real" data, while the remaining twelve bytes are "bad data" ... typically, whatever happened to be on the stack. The bad data shows up in the output somewhere. A non-expert developer would review (and re-review) the code, and not see where this bad data comes from, eventually wondering how the above function fails to copy all 16 bytes of data. It becomes a painful debugging experience, all because an error that could have been caught at compilation time was not. In a best case scenario, this all happens **_before_** the code enters upstream / main.

My comments come from decades of experience developing software, and watching problems that even experts make, and the debugging pain, lost time, and wasted effort it has caused. It's still legal in some states in the USA to ride a motorcycle without a helmet, or a car without using seatbelts. That doesn't make doing those things the best choice.

Static analysis is AWESOME, but does not catch problems as early. Let's save the contributors some headache, and leave the safer `ARRAYLEN()` macro in the depot.

iceman1001 commented 3 months ago

When it comes this particular project, it has three major parts.

FPGA source Code
ARM Source Code
CLIENT Source Code

When it comes to doing changes and adding things, here is the rules I have set down.

The client , we can do anything, as long as it compiles without any warnings and errors on all the multiple platforms and that of course doesn't break anything. No problem adding your modified ARRAYLEN.

ARM source code, here its timing critical and we don't have the luxury to add nice checks everywhere just because its nice coding pattern. Usually pre-setup code and post-execution code can be freely modified. The biggie is everything inside main loop of a function. It brews always down to "don't mess with things without very much testing" With testing, its physical cards, genuine readers, simulated devices, you name it.

You can argue as much as you want but anything in the ARM source code will be scrutinized and rejected.

For the fun of it , I tested this PR in my recent research into hitag2. I now have a harder time sniffing.
This means this PR is not going to be merged at all as is. It also very well describes this issue with fiddling in the ARM code.

Did you do any real tests with this PR?

With that said I have no issue with modifying the client. Go ahead, no problem at all.

henrygab commented 3 months ago

Thank you for the additional clarity on how you see the split. I don't know that I agree with the characterization of preventing buffer overruns as "nice coding pattern", but I will respect your decision to prefer to leave the risky code "as-is". After all, the checks to prevent buffer overrun are inherently within the critical timing loops you refer to, so it will be impossible to prevent 2-4 CPU cycles for a test/jmp/increment. Moreover, I do not have many of the physical cards, nor any genuine readers, nor test with simulated devices. According, I will close this PR.

RfidResearchGroup / proxmark3

Safer buffers #2339