Open kees opened 4 years ago
What is to be done about cases where the destination buffer is length-bounded (not NUL-terminated) and its size is not known by the compiler (i.e: the macros strtomem...()
fail)
There are some pathological cases in net: dsa
such as: net/dsa/slave.c:dsa_slave_get_strings
which seems to populate a length-bounded array with string literals + padding. There is some heavy indirection (via function pointers and mismatched casts) that make it hard to determine whether or not the dest
buffer will have its size properly computed by __builtin_object_size()
.
If dest
cannot have its sized computed by the compiler then the macros strtomem...()
will fail (by design).
Take this code, for example, from the case mentioned above:
static void dsa_slave_get_strings(struct net_device *dev,
uint32_t stringset, uint8_t *data)
...
int len = 32;
strncpy(data, "tx_packets", len);
strncpy(data + len, "tx_bytes", len);
strncpy(data + 2 * len, "rx_packets", len);
strncpy(data + 3 * len, "rx_bytes", len);
...
Simply swapping these calls to the appropriate strtomem_pad
calls yields a BUILD_BUG_ON
asssertion failure. What is the next best option to rid strncpy
uses in cases like these (or do we leave them be)?
tl;dr: based on the decision tree provided by OP there exists some cases (mentioned above) where the destination has both qualities 1) length-bounded and 2) impossible for the compiler to determine the size of
┌───────────────────────┐
│Is dest NUL-terminated?│
└──────────┬────────────┘
│
│
N │ Y
┌────────┴───────┐
▼ ▼
┌──────────────────────┐ ...
│dest requires padding?│
└──────────┬───────────┘
│
N │ Y
┌────────┴───────┐
▼ ▼
┌───────────────┐ ┌──────────────────┐
│ use strtomem()│ │use strtomem_pad()│
└──────┬────────┘ └─────────────────┬┘
│ │
▼ ▼
... ┌────────────────────────────────┐
│dest size compiler-determinable?│
└──────────────┬─────────────────┘
N │ Y
┌──────────┴──────┐
▼ ▼
┌─────┐ ┌────┐
I am here ---> │Uh,Oh│ │ OK!│
└─────┘ └────┘
Somewhat Reviewed by: @nickdesaulniers
After some tinkering, I think I've found a viable solution.
If there existed some strntomem_pad
(name tentative) which accepted dest_len
as a macro parameter then use cases where the dest
buffer's length is not determinable can simply provide dest_len
.
#define strntomem_pad(dest, dest_len, src, pad) \
memcpy_and_pad(dest, dest_len, src, strnlen(src, dest_len), pad);
Here's how it would look in action tackling net/dsa/slave.c:dsa_slave_get_strings
:
static void dsa_slave_get_strings(struct net_device *dev,
uint32_t stringset, uint8_t *data)
...
int len = 32;
strncpy(data, "tx_packets", len);
strncpy(data + len, "tx_bytes", len);
strncpy(data + 2 * len, "rx_packets", len);
strncpy(data + 3 * len, "rx_bytes", len);
...
turns into
static void dsa_slave_get_strings(struct net_device *dev,
uint32_t stringset, uint8_t *data)
...
int len = 32;
strntomem_pad(data, len, "tx_packets", 0);
strntomem_pad(data + len, len, "tx_bytes", 0);
strntomem_pad(data + 2 * len, len, "rx_packets", 0);
strntomem_pad(data + 3 * len, len, "rx_bytes", 0);
...
Testing reveals these are exactly equivalent (in my isolated godbolt testing)
and thus the new "Remove all strncpy() uses" decision tree looks as follows:
┌───────────────────────┐
│Is dest NUL-terminated?│
└──────────┬────────────┘
│
│
N │ Y
┌────────┴───────┐
▼ ▼
┌──────────────────────┐ ...
│dest requires padding?│
└──────────┬───────────┘
│
N │ Y
┌────────┴───────┐
▼ ▼
┌───────────────┐ ┌──────────────────┐
│ use strtomem()│ │use strtomem_pad()│
└──────┬────────┘ └─────────────────┬┘
│ │
▼ ▼
... ┌────────────────────────────────┐
│dest size compiler-determinable?│
└──────────────┬─────────────────┘
N │ Y
┌──────────┴───────────────┐
▼ ▼
┌─────────────┐ ┌────┐
✅ I am here ---> │strntomem_pad│ │ OK!│
└─────────────┘ └────┘
Is this a viable solution?
Depending on the human to get the length argument correct is a new foot-gun that we should work very hard to avoid.
Yeah, this is a new ugly case you've found (unknown length and not %NUL terminated).
A few thoughts about the specific example:
ETH_GSTRING_LEN
-sized buffers.
This internal API looks very fragile. The get_strings
handlers all have to trust that enough memory has been allocated. 😡
Anyway, I suspect the best fix here is to teach the handlers the size of the buffers in some way.
Given that the strings in question will not be truncated and other get_strings
implementations already use strscpy
and the buffer is already zero-filled (vzalloc
), I think just swapping the strncpy
calls here with strscpy
is fine. Refactoring the entire get_strings
API to pass allocation size looks unpleasant, but would probably find real bugs.
@kees Thoughts on this article:
tl;dr: Linus doesn't want large swaths of strncpy -> strscpy
patches.
Linus was clearly afraid that the strscpy() patch could end up being a source of regressions as well. That wouldn't happen with the patch set itself, which does not convert any existing strncpy() or strlcpy() call sites. The problem happens when other, well-intentioned developers start doing those conversions. Linus described his worries in the merge commit that brought in strscpy():
So why did I waffle about this for so long? Every time we introduce a new-and-improved interface, people start doing these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the conversion patch to the improved interface actually makes things worse. Because the patch is mindnumbing and trivial, nobody has the attention span to look at it carefully, and it's usually done over large swatches of source code which means that not every conversion gets tested.
To try to head off such an outcome, Linus has made it clear that he will not be accepting patches that do mass conversions to strscpy() (note though that certain developers are already considering mass conversions anyway). It is there to be used with new code, but existing code should not be converted without some compelling reason to do so — or without a high level of attention to the possible implications of the change.
One might be tempted to think that this proclamation from Linus signals the end of the "trivial clean-up patch" era. But that would almost certainly be reading too much into what he said. Patches that do not make functional changes to the code do not, one would hope, pose the same sort of risk that API replacements do. So the flow of white-space adjustments is likely to continue unabated. But developers who want to convert a bunch of working code to a "safer" interface may want to think twice before sending in a patch.
@kees Thoughts on this article:
https://lore.kernel.org/lkml/202307121703.D2BE6DFEE@keescook/
tl;dr: we won't do treewide changes (instead we do it individually ) and we need to do conversions with a very regular process for performing and validating the transformations. (This is how we've approached the flexible array transformations as well.)
The
strncpy()
function is actively dangerous to use since it may not NUL-terminate the destination string, resulting in potential memory content exposures, unbounded reads, or crashes. Replacing uses requires some careful attention, though, sincestrncpy
gets used also for two other cases:NLA_STRING
where the length stored separately, andNLA_NUL_STRING
which uses a "traditional" NUL-terminated string. For the cases wherestrncpy()
is used to copy non-NUL-terminated strings, the destination buffer needs to be marked with the__nonstring
attribute, so that compiler diagnostics will avoid warning about cases where the character array is considered NUL-terminated.strscpy_pad()
, ormemcpy_and_pad()
when the destination is length-bounded.https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
The workflow to replace
strncpy()
, therefore, needs to be:kstrndup()
strscpy_pad()
strscpy()
__nonstring
attribute and:memcpy_and_pad(dst, sizeof(dst), src, min(sizeof(dst), strnlen(src, sizeof(dst)), pad_char)
(perhaps this needs a macro)memcpy(dst, src, min(sizeof(dst), strnlen(src, sizeof(dst))
(perhaps this needs a macro)One can find instances to replace using this search:
git grep '\bstrncpy\b' | grep -vE '^(Documentation|tools|samples|scripts|arch/.*/lib)/' | grep -vE 'fortify|include/linux/string.h|lib/string.c'