Closed GordonSmith closed 3 months ago
Note 1: There is also a bug in the removed code:
    if worst_case_size > len(encoded):
      ptr = cx.opts.realloc(ptr, worst_case_size, 1, len(encoded))
      trap_if(ptr + len(encoded) > len(cx.opts.memory))
When realloc is called on the current ptr with a smaller size, you can NOT assume that the same address is returned (at least not in C++ guests).
Note 2: This has a nice upstream benefit as it results in a bunch of code removal?
The expected native implementation (whose observable effects the Python is just describing) won't know len(encoded) (which I think is what you mean by enc_len in your suggested code) until the end of the copy. The goal of this whole routine is to avoid multiple passes over the string, so we can't depend on len(encoded) in the way that your suggested code requires. With the spec as written, the implementation can simply blindly copy UTF-8 bytes until the initial allocation is full and only then resize to the worst case (which doesn't require knowing the exact final size).
The final shrinking realloc acknowledges the fact that realloc is allowed to change the pointer location (hence the leading ptr = realloc(....)).
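To make the single-pass argument concrete, here is a minimal sketch, not the spec's definition. It assumes a toy linear memory whose realloc always moves the allocation, as a stand-in for a guest allocator that is free to relocate on shrink. The names ToyMemory and store_utf8_single_pass are hypothetical and exist only for this illustration. The copy loop never needs the final encoded length: it grows once to the worst case when the initial allocation fills up, and shrinks at the end using whatever pointer realloc hands back.

```python
class ToyMemory:
    """Hypothetical linear memory; its realloc deliberately always moves."""

    def __init__(self, size=1 << 16):
        self.bytes = bytearray(size)
        self.next_free = 8  # keep address 0 unused

    def realloc(self, old_ptr, old_size, align, new_size):
        # Bump allocator (an assumption): always return a fresh address and
        # copy over the prefix that survives the resize.
        new_ptr = self.next_free
        self.next_free += new_size
        assert self.next_free <= len(self.bytes), "out of toy memory"
        keep = min(old_size, new_size)
        self.bytes[new_ptr:new_ptr + keep] = self.bytes[old_ptr:old_ptr + keep]
        return new_ptr


def store_utf8_single_pass(mem, src):
    """Copy src into mem as UTF-8 in one pass over the string."""
    # Source length in UTF-16 code units, known up front.
    units = len(src.encode("utf-16-le")) // 2
    capacity = units  # optimistic: one byte per code unit
    ptr = mem.realloc(0, 0, 1, capacity)
    written = 0
    for ch in src:
        chunk = ch.encode("utf-8")
        if written + len(chunk) > capacity:
            # Initial allocation is full: grow once to the worst case
            # (3 bytes per UTF-16 code unit). The exact final size is
            # still unknown at this point.
            worst_case_size = units * 3
            ptr = mem.realloc(ptr, capacity, 1, worst_case_size)
            capacity = worst_case_size
        mem.bytes[ptr + written:ptr + written + len(chunk)] = chunk
        written += len(chunk)
    if capacity > written:
        # Shrinking realloc: the pointer may change, so reassign it.
        ptr = mem.realloc(ptr, capacity, 1, written)
    return ptr, written
```

Calling store_utf8_single_pass(ToyMemory(), "héllo") returns a pointer and byte length such that the bytes at that location equal "héllo".encode("utf-8"), even though this toy realloc moved the buffer on both the grow and the shrink.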
Sorry for the noise - I forgot that the existing contents get moved as part of the realloc.
Currently it is:
But could be shortened to (no need for worst case length):