antirez / sds

Simple Dynamic Strings library for C
BSD 2-Clause "Simplified" License

Question: Violation of Strict Aliasing Rule ? #130

Open Ursescu opened 4 years ago

Ursescu commented 4 years ago

When a new sds is created (as a char array), it is cast to a struct pointer type, e.g. SDS_HDR(8,s);. Isn't this violating the strict aliasing rule? (Dereferencing a pointer that aliases an object that is not of a compatible type, or one of the other types allowed by C 2011 6.5 paragraph 7, is undefined behavior.)

Example:

sh = s_malloc(hdrlen+initlen+1);
...
s = (char*)sh+hdrlen;
...
SDS_HDR_VAR(8,s);
sh->len = initlen;
sh->alloc = initlen;

#define SDS_HDR_VAR(T,s) struct sdshdr##T *sh = (void*)((s)-(sizeof(struct sdshdr##T)));
#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T))))
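
For readers unfamiliar with the layout, the pattern in the excerpt above can be reduced to a minimal, self-contained sketch. The names `hdr8`, `HDR8`, `str_new`, `str_len`, and `str_free` are hypothetical stand-ins, not the actual sds definitions, and the header is simplified to two fields:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an 8-bit sds header; not the real definition. */
struct hdr8 {
    uint8_t len;    /* bytes currently in use */
    uint8_t alloc;  /* bytes available, excluding header and terminator */
    char buf[];     /* string bytes start right after the header */
};

/* Step back from the string pointer to the header that precedes it. */
#define HDR8(s) ((struct hdr8 *)((s) - sizeof(struct hdr8)))

/* Allocate header + string + NUL and return a pointer to the string part. */
char *str_new(const char *init, size_t initlen) {
    assert(initlen <= UINT8_MAX);
    void *sh = malloc(sizeof(struct hdr8) + initlen + 1);
    if (sh == NULL) return NULL;
    char *s = (char *)sh + sizeof(struct hdr8);
    HDR8(s)->len = (uint8_t)initlen;    /* write through the struct pointer */
    HDR8(s)->alloc = (uint8_t)initlen;
    if (initlen) memcpy(s, init, initlen);
    s[initlen] = '\0';
    return s;
}

/* Read the stored length back through the header. */
size_t str_len(const char *s) {
    const struct hdr8 *h = (const struct hdr8 *)(s - sizeof(struct hdr8));
    return h->len;
}

/* Release the whole allocation, which begins at the header. */
void str_free(char *s) {
    if (s) free(HDR8(s));
}
```

In this sketch the block comes from `malloc`, so it has no declared type of its own. The question raised in this issue is whether reaching the header through the `char *` in this way is covered by the rule.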

oranagra commented 4 years ago

@Ursescu thanks for reaching out. I'm no expert in standards, but can you help me understand what exactly you're referring to? As far as I understand, char* casting is allowed. We do not violate memory alignment concerns. And we don't access the same portion of memory with different types of casting (which would expose us to endianness or padding issues).

Ursescu commented 4 years ago

One pointer is said to alias another pointer when both refer to the same location or object.

In the code, s is a char pointer, and SDS_HDR(8,s); casts it to a pointer of another type (struct sdshdr8 * in this example). The memory referred to by the first sh is aliased by the second sh (from the macro) because they refer to the same address in memory. In C99, accessing an object through an lvalue of a type other than the ones the standard permits is undefined behavior. This is referred to as the strict aliasing rule. Take a look at gcc -fstrict-aliasing from the Optimize Options (4.9.1)

Further digging:

C99 - 6.5 Expressions:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.

The intent of this list is to specify those circumstances in which an object may or may not be aliased.

So, in other words, it is allowed to alias a struct pointer with a char * pointer, but this doesn't work the other way around: the compiler makes no assumption that your struct pointer aliases a buffer of chars.
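
The two directions described above can be illustrated with a short sketch. The struct `hdr8` and the functions `sum_bytes` and `read_hdr` are hypothetical names made up for this example, not part of sds. The first function uses the "character type" entry in the list. The second shows `memcpy` as one commonly used way to go from a char buffer to a struct without reading the buffer through a struct lvalue:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-field header, used only for illustration. */
struct hdr8 {
    uint8_t len;
    uint8_t alloc;
};

/* Allowed direction: any object may be inspected through a
 * character-type lvalue, here unsigned char. */
unsigned sum_bytes(const struct hdr8 *h) {
    const unsigned char *p = (const unsigned char *)h;
    unsigned total = 0;
    for (size_t i = 0; i < sizeof *h; i++)
        total += p[i];
    return total;
}

/* Other direction, done without aliasing: copy the bytes of a char
 * buffer into a real struct object instead of dereferencing
 * (struct hdr8 *)buf. The buffer must hold at least sizeof(struct hdr8) bytes. */
struct hdr8 read_hdr(const char *buf) {
    struct hdr8 h;
    assert(buf != NULL);
    memcpy(&h, buf, sizeof h);
    return h;
}
```

Whether such a copy would be acceptable in sds is a design question for the maintainers; the sketch only shows that the access itself can be expressed without a struct lvalue pointing into a char array.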

oranagra commented 4 years ago

Ohh, so that's about compiler optimizations reordering statements or caching memory in registers, and that a write through one pointer may not be visible when reading through another pointer that points to the same memory. Right?

I'll need to look deeper when I get a chance. If you happen to have a solution, please make a PR to https://github.com/redis/redis