So, my two cents after Victor explained to me how it's being used: it's good and helpful.
I have one question though. Is there a preference for having
PyBytesWriter_Create(Py_ssize_t size, char **str);
instead of
PyBytesWriter_Create(char **str, Py_ssize_t size);
?
Since I'm on a phone it's hard to check the files, but shouldn't we mimic the usual order of having the "output" variable on the left? (For instance, the Prepare method also puts str before its length.)
shouldn't we mimic the usual order of having the "output" variable on the left?
Ah? For me, the trend in the Python C API is more to put output parameters at the end ("right").
Ah? Er... actually, it's probably the case for the C API now that I think about it, with all the getter functions.
I think I used the mem/alloc C functions as the baseline and thought of Create as something close to the realloc() function. I don't really have a strong opinion on the matter though, so let's keep your signature.
We probably should have a public discussion on Discourse before getting to the WG decision.
AFAIK, there are two main use cases for a factory like this:
- creating bytes when the size is known in advance;
- building bytes incrementally. (As in the PyLongWriter proposal.)

For the first use case, IMO an efficient implementation should mean that Create does one memory allocation, and Finish does none (it turns the writer to bytes in place).
For the second, why not use bytearray, which already is explicitly mutable?
It seems that the advantage of a non-PyObject is that it can only have one "owner", so:
- One could expect that it's not thread-safe.
- Everything goes through a single str pointer. (The docs for the new public API should be very explicit about how long str is valid.)
It seems to me that PyBytesWriter is close to a generic "growable buffer". Is there a special reason that it's tied to PyBytes? From the API it seems that Finish could be replaced by passing str to PyBytes_FromStringAndSize + calling Discard -- and then you could do the same to make a bytearray or any other type.
The private API has an optimization that we lose here: it puts a small initial buffer on the stack. It seems that performance-minded users will want to use their own implementation anyway.
Also, C++, Rust, etc. surely have growable buffers already; they should be fine with PyBytes_FromStringAndSize.
In CPython itself, there are quite few uses of _PyBytesWriter.
Is it worth exposing at all?
@encukou:
For the second, why not use bytearray, which already is explicitly mutable?
It's slower since you have to copy the bytearray memory to create the final bytes object, and then destroy the bytearray. PyBytesWriter works in the nanoseconds range.
One could expect that it's not thread-safe.
Correct, the code is not thread-safe and is expected to have a single user.
It seems to me that PyBytesWriter is close to a generic "growable buffer". Is there a special reason that it's tied to PyBytes? From the API it seems that Finish could be replaced by passing str to PyBytes_FromStringAndSize + calling Discard -- and then you could do the same to make a bytearray or any other type.
PyBytesWriter contains a temporary bytes object. The purpose of the API is to only expose the bytes object when it's fully initialized.
The private API has an optimization that we lose here: it puts a small initial buffer on the stack.
The small buffer is also used in the proposed public API; it's just allocated on the heap instead.
Also, C++, Rust, etc. surely have a growable buffers already; they should be fine with PyBytes_FromStringAndSize
The problem of using a buffer and then calling PyBytes_FromStringAndSize() is that you have to copy the memory, and then destroy the temporary buffer. Again, PyBytesWriter uses a bytes object internally, so calling PyBytesWriter_Finish() is cheap.
PyBytesWriter works in the nanoseconds range.
A more concrete micro-benchmark shows that the public PyBytesWriter (23.4 ns) is 4x faster than bytearray (93.0 ns): https://github.com/python/cpython/pull/121726#issuecomment-2273053405.
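To make the comparison concrete, here is a rough sketch of the bytearray route (the function name is only for illustration): the final conversion has to copy the whole buffer, and the temporary bytearray then has to be destroyed, which is exactly what PyBytesWriter avoids.
static PyObject *
make_abc_via_bytearray(void)
{
    PyObject *ba = PyByteArray_FromStringAndSize("abc", 3);
    if (ba == NULL) {
        return NULL;
    }
    PyObject *result = PyBytes_FromObject(ba);  // extra copy of the buffer
    Py_DECREF(ba);                              // destroy the temporary bytearray
    return result;
}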
Thank you, I see now! Your explanations help with reading the implementation.
A hard part of making this public will be documenting what str is, when it is valid, and how much of the memory it points to is valid. It's a bit hard to explain :)
When you call PyBytesWriter_Create(size, &str), you can write size bytes into str.
When you call PyBytesWriter_Prepare(writer, &str, size), you can write size bytes into str.
str is valid until PyBytesWriter_Finish() or PyBytesWriter_Discard() is called. Would you prefer that these functions set str to NULL?
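Concretely, the rules above would play out like this (a minimal sketch; the Finish signature taking the advanced str pointer, the negative return value on error, and the function name are assumptions based on this discussion):
static PyObject *
build_hello(void)
{
    char *str;
    PyBytesWriter *writer = PyBytesWriter_Create(4, &str);
    if (writer == NULL) {
        return NULL;
    }
    memcpy(str, "hell", 4);     // the 4 bytes requested from Create()
    str += 4;

    // Grow the buffer; Prepare() may reallocate, so it updates str.
    if (PyBytesWriter_Prepare(writer, &str, 1) < 0) {
        PyBytesWriter_Discard(writer);   // str becomes invalid here
        return NULL;
    }
    *str++ = 'o';

    // str also becomes invalid here; the writer is consumed.
    return PyBytesWriter_Finish(writer, str);    // b"hello"
}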
That, and the str you pass to PyBytesWriter_Prepare or …Finish needs to be the str you got from …Create or …Prepare, incremented by (up to?) the size given to that call. Right?
Would you prefer that these functions set str to NULL?
Probably not necessary, they don't clear writer either.
That, and the str you pass to PyBytesWriter_Prepare or …Finish needs to be the str you got from …Create or …Prepare, incremented by (up to?) the size given to that call. Right?
Correct.
Except you can call Prepare() several times in succession. I'm not entirely clear on what should happen in that case.
I'd expect that each Prepare reserves size bytes after *str, overriding any previous Prepare.
Instead, AFAICS each Prepare adds size bytes to an internal "minimum size" field. IMO, that's harder to explain. But more importantly, I'm not sure what you should do if you call Prepare with a size that's too big -- is there a good way to "decrement" it if it turns out you needed fewer bytes than you Prepared? Or are users expected to provide exact sizes (which might be expensive to calculate)? The current uses seem to rely on API that's not exposed here, like writer.min_size -= 2.
I find the Prepare API confusing. From the API names in the OP, it is not entirely obvious to me how I'm supposed to use the proposed APIs.
I find the Prepare API confusing. From the API names in the OP, it is not entirely obvious to me how I'm supposed to use the proposed APIs.
Oh. Can you propose a better API in this case?
Prepare(n) adds n bytes to the internal buffer. Example:
- PyBytesWriter_Create(3) allocates a buffer of 3 bytes
- PyBytesWriter_Prepare(writer, &str, 3) reallocates the buffer to 6 bytes
- PyBytesWriter_Prepare(writer, &str, 4) reallocates the buffer to 10 bytes

For naming, why not simply …? It would be less confusing :)
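Or, written out as code (a sketch using the Prepare signature shown earlier; error checks and the actual writes are omitted):
char *str;
PyBytesWriter *writer = PyBytesWriter_Create(3, &str);  // buffer: 3 bytes
// ... write up to 3 bytes through str ...
PyBytesWriter_Prepare(writer, &str, 3);                 // buffer: 3 + 3 = 6
// ... write up to 3 more bytes through str ...
PyBytesWriter_Prepare(writer, &str, 4);                 // buffer: 6 + 4 = 10
// ... write up to 4 more bytes through str ...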
For some brainstorming -- I would expect that Prepare looks at the current position, i.e. str, like this:
Prepare(n) clears any previous reservations, and ensures n bytes can be written after the current str:
- PyBytesWriter_Create(3) allocates a buffer of 3 bytes
- str += 3
- PyBytesWriter_Prepare(writer, &str, 3) reallocates the buffer to 6 bytes
- str += 2 // str is now (<start of buffer> + 5)
- PyBytesWriter_Prepare(writer, &str, 4) reallocates the buffer to 9 bytes
- PyBytesWriter_Prepare(writer, &str, 2) could reallocate the buffer to 7 bytes (but in practice we only shrink in _Finish, so it stays at 9)

That said, this is still a weird API; PyUnicodeWriter, where the "current position" is handled entirely by the writer, is much more understandable.
The proposed API allows user code to keep track of the "current write position" using str (like str += 3); but str is also a pointer that's updated by the writer's realloc calls. I think this dual usage is confusing.
Why not let the user handle the “current write position” on their own, as an offset from “start of buffer” that's handled by the writer, and have for example:
// Create a bytes writer instance.
// Set `*str` to a writable buffer of at least the given `size`.
// `*str` is valid until the next `PyBytesWriter_*` call on the result.
PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size, char **str);
// Reallocate to `size` bytes.
// Set `*str` to a writable buffer of at least the given `size`;
// `min(<old_size>, size)` bytes are copied to this buffer.
// `*str` is valid until the next `PyBytesWriter_*` call on this `writer`.
// (Detail: this will never shrink the internal buffer, but if it needs to grow,
// it will not overallocate. The user can choose an overallocation strategy
// themself.)
int PyBytesWriter_Realloc(PyBytesWriter *writer, Py_ssize_t size, char **str);
// Return the final Python bytes object with the given size,
// and destroy the writer instance.
// `size` must not be greater than in the previous `_Create`/`_Realloc` call.
PyObject* PyBytesWriter_Finish(PyBytesWriter *writer, Py_ssize_t size);
// Discard the internal bytes buffer and destroy the writer instance.
void PyBytesWriter_Discard(PyBytesWriter *writer);
That would make the convenience helper slightly more complicated, though, as it needs to update the “current write position”:
// Write `data_size` bytes from `data` at the given `*offset`.
// Set `*str` to a writable buffer of at least `*offset + data_size` bytes;
// `*offset` bytes are copied to this buffer.
// Increment `*offset` by `data_size`.
// Equivalent to:
// - PyBytesWriter_Realloc(writer, *offset + data_size, str)
// - memcpy((*str) + *offset, data, data_size)
// - *offset += data_size
// (Detail: unlike PyBytesWriter_Realloc, this *will* overallocate, so
// creating bytes using `PyBytesWriter_Write` would be amortized `O(N)`.)
void PyBytesWriter_Write(PyBytesWriter *writer, char **str, Py_ssize_t *offset, Py_ssize_t data_size, char *data);
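For illustration, the alternative could be used roughly like this (a sketch only; error handling omitted):
char *str;
Py_ssize_t offset = 0;
PyBytesWriter *writer = PyBytesWriter_Create(2, &str);

memcpy(str, "hi", 2);                              // write at offsets 0..1
offset += 2;

PyBytesWriter_Realloc(writer, offset + 1, &str);   // ensure room for one more byte
str[offset++] = '!';

PyObject *result = PyBytesWriter_Finish(writer, offset);   // b"hi!"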
But, looking at the big picture, I see only two use cases:
- You know the final size in advance. Then all you need is:
  PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size, char **str);
  PyObject* PyBytesWriter_Finish(PyBytesWriter *writer);
  void PyBytesWriter_Discard(PyBytesWriter *writer).
- You don't know the final size in advance. Then PyBytesWriter_Finish would likely need to reallocate to the final size, and copy data. In this case, for the preparation you can use any "generic resizable buffer" implementation; Python doesn't need to provide one for you. Pass the result to PyBytes_FromStringAndSize.

Petr:
That said, this is still a weird API; PyUnicodeWriter, where the “current position” is handled entirely by the writer, is much more understandable.
I agree. I worry that the complexity of this API will lead to misuse.
Victor [regarding my complaints about the "prepare" API]:
Oh. Can you propose a better API in this case?
I don't know yet. I can try to think up something.
It seems like this API is too low-level and too error-prone. I prefer to abandon promoting this API as a public API for now. We can revisit this API later if needed.
This API is basically an efficient memory manager for functions creating Python bytes objects.
It avoids the creation of incomplete/inconsistent Python bytes objects: allocate an object with uninitialized bytes, fill the object, resize it. See issues:
API:
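For reference, the signatures as they are used elsewhere in this thread (return types and the exact Finish form are inferred from the discussion, not authoritative):
PyBytesWriter *PyBytesWriter_Create(Py_ssize_t size, char **str);
int PyBytesWriter_Prepare(PyBytesWriter *writer, char **str, Py_ssize_t size);
PyObject *PyBytesWriter_Finish(PyBytesWriter *writer, char *str);   // exact form inferred
void PyBytesWriter_Discard(PyBytesWriter *writer);
// plus the PyBytesWriter_WriteBytes() convenience helper described below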
The str parameter is what is used to write bytes into the writer.
Example creating the string "abc":
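A minimal sketch, assuming the signatures listed above (the function name is only for the example; the version in the Pull Request may differ):
static PyObject *
create_abc(void)
{
    char *str;
    PyBytesWriter *writer = PyBytesWriter_Create(3, &str);
    if (writer == NULL) {
        return NULL;
    }
    // The "hot code" only writes through str.
    memcpy(str, "abc", 3);
    str += 3;
    return PyBytesWriter_Finish(writer, str);   // b"abc"
}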
The "hot code" which writes bytes can only use str.
The PyBytesWriter_WriteBytes() function is just a small convenience helper (for PyBytesWriter_Prepare() + memcpy() calls).
You can find more documentation and examples in the Pull Request.
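Roughly, the helper amounts to something like this hypothetical write_bytes() (a sketch assuming Prepare returns a negative value on error; the real implementation may differ):
static int
write_bytes(PyBytesWriter *writer, char **str, const void *data, Py_ssize_t size)
{
    // Reserve room, copy the data, advance the caller's str pointer.
    if (PyBytesWriter_Prepare(writer, str, size) < 0) {
        return -1;
    }
    memcpy(*str, data, size);
    *str += size;
    return 0;
}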
This proposed public API is based on the existing _PyBytesWriter API, which is quite old. I wrote an article about it in 2016: https://vstinner.github.io/pybyteswriter.html