aws / s2n-tls

An implementation of the TLS/SSL protocols
https://aws.github.io/s2n-tls/usage-guide/
Apache License 2.0
4.53k stars 709 forks source link

Use of uninitialized memory in s2n_aead_aes_test #150

Closed pascal-cuoq closed 9 years ago

pascal-cuoq commented 9 years ago

Here are the contents, in symbolic form, of the char array pointed to by conn->in.blob.data at the time when the line https://github.com/awslabs/s2n/blob/09261380262fca884567b941897ae3f095f55aad/tests/unit/s2n_aead_aes_test.c#L154 is reached:

[0..7] ∈ {0}
[bits 64 to 127] ∈ {4533304577153303563}
[bits 128 to 191] ∈ {1703996142823241576}
[24] ∈ {62}
[25] ∈ {31}
[26] ∈ {98}
[27] ∈ {216}
[bits 224 to 287] ∈ {10729224154281305815}
[bits 288 to 351] ∈ {3412835101196934063}
[44..1434] ∈ UNINITIALIZED

In the symbolic representation of the contents above, some values are integers wider than 8-bit. These values are represented as bit ranges, e.g. “bits 64 to 127”. When “bits” does not appear, it means the indices can be interpreted as indices into a char array. In other words, the first 44 bytes of that dynamically allocated memory zone are initialized to some numbers, and the rest is uninitialized, up to the index 1434.

conn->in.blob.size contains 1435, which is consistent with the size of the zone pointed to by conn->in.blob.data.

The line 154 attempts to access the last byte (and then the 15 bytes before that) with respect to the end of the character array. These characters are uninitialized. I am surprised that the test even works as you wanted.

Perhaps you mean to do something to the last bytes of the message, which in this case seems to be 44-byte long?

One word of warning: the tool we use stops at the first error (it is theoretically impossible to predict what happens in a C program after the first undefined behavior), so the same code pattern leading to using uninitialized memory may happen again later in the test.

Off-topic: in https://github.com/awslabs/s2n/issues/146 there was the question:

I wonder why -Wuninitialized doesn't catch it.

I wouldn't expect a compiler to warn about the following program, where the local variable being left uninitialized and the function where it is used are different.

void g(int *p) {
  static int s;
  s = 1 + *p;
}

int main(){
  int l;
  g(&l);
}

There are analyses that can find this use of uninitialized memory across function calls, but these are more expensive than a compiler can afford to run in the few seconds the user allows it when the user launches compilation.

In the issue 146, the two functions were in different files, so because of separate compilation, even an ideal compiler that somehow could afford this kind of analysis would not be able to warn for either file.

ariccio commented 9 years ago

One word of warning: the tool we use stops at the first error

What tool is this? A custom tool? I'm quite curious. (is the "word of warning" supposed to be a pun of some sort?)

There are analyses that can find this use of uninitialized memory across function calls, but these are more expensive than a compiler can afford to run in the few seconds the user allows it when the user launches compilation.

A bit of a sidenote: This is a active area of research in static analysis tools.

Microsoft built SAL for precisely this purpose, as annotations turn global analysis into local analysis.

Your example, aside from the fact that g's argument should be const int*, with annotations:

void g(_In_ int *p) {
  static int s;
  s = 1 + *p;
}

int main(){
  int l;
  g(&l);
}

...the _In_ signifies that p should be valid when g is called - the pointer can't be NULL, and it must point to initialized memory.

I've (personally) seen SAL pick up specifically that kind of bug hundreds of times. I'm not kidding, it's soon to be "thousands"!

SAL really shines in tricky APIs that handle buffers, where buffer size is not formally coupled to the buffer itself (the classic void fill_buffer(char* buffer, rsize_t bufferSize )), or where the buffer is null-terminated (no formal way of differentiating a char* from a null-terminated char*), as it doesn't need to read documentation or see the source to detect misuse. I've not yet seen it be confused by separate compilation units, calls to external libraries, or system calls, so long as the declarations are properly annotated.

OSR, known for kernel-mode development, has a great article on the use of SAL in their file encryption library, which has a classically tricky C API.

/analyze with SAL is extremely good at detecting misuse of annotated functions. It's not without overhead, but still faster and far smarter than raw global analysis.

Several years ago, Clang Devs thought about adding support for SAL, but a software patent scared them off. Despite that, Clang now has a set of annotations for thread safety analysis, which are nearly identical to their SAL counterparts. It's a shame, because fragmentation will prevent the widespread adoption that the industry deserves.

Self promoting plug: "Preventing bugs, and improving code quality with Microsoft SAL"

pascal-cuoq commented 9 years ago

Self promoting plug

Yes, it seems to me.

I mean seriously dude, you realize that the example you decided to riff on was simplified in the context of a discussion of compiler warnings, right? The real issues were https://github.com/awslabs/s2n/issues/146 and the 150 one described above, which simplistic annotations do not help find even if you assume that the programmer is willing to write them. That's a lot of URLs and promotion, for a system that does not find the issues being discussed.

ariccio commented 9 years ago

I mean seriously dude, you realize that the example you decided to riff on was simplified in the context of a discussion of compiler warnings, right?

Yup. I think it's pretty cool, and you're in no way obliged to agree.

The real issues were #146 and the 150 one described above, which simplistic annotations do not help find even if you assume that the programmer is willing to write them.

Yeah, they can be annoying, ugly too.

That's a lot of URLs and promotion, for a system that does not find the issues being discussed.

Four links, one of which is to self-published material? I had no intention of irritating, and apologize that I did.

My intent was to contribute to to mutually enriching intellectual environment that is open source software development.