Add PyBytes_Writer() API

vstinner commented 1 month ago

This API is basically an efficient memory manager for functions creating Python bytes objects.

It avoids the creation of incomplete/inconsistent Python bytes objects: allocate an object with uninitialized bytes, fill the object, resize it. See issues:

API:

// Create a bytes writer instance.
PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size, char **str);

// Return the final Python bytes object and destroy the writer instance.
PyObject* PyBytesWriter_Finish(PyBytesWriter *writer, char *str);

// Discard the internal bytes buffer and destroy the writer instance.
void PyBytesWriter_Discard(PyBytesWriter *writer);

// Allocate *size* bytes to prepare writing *size* bytes into *writer*. 
int PyBytesWriter_Prepare(PyBytesWriter *writer, char **str, Py_ssize_t size);

// Write a the bytes string *bytes* of *size* bytes into *writer*.
int PyBytesWriter_WriteBytes(PyBytesWriter *writer, char **str, const void *bytes, Py_ssize_t size);

The str parameter is what's being used to write bytes in the writer.

Example creating the string "abc":

static PyObject* create_abc(void)
{
    char *str;
    PyBytesWriter *writer = PyBytesWriter_Create(3, &str);
    if (writer == NULL) return NULL;

    // Write "abc"
    memcpy(str, "abc", 3);
    str += 3;

    return PyBytesWriter_Finish(writer, str);
}

The "hot code" which writes bytes can only use str.

The PyBytesWriter_WriteBytes() is just a small convenient helper function (for PyBytesWriter_Prepare() + memcpy() calls).

You can find more documentation and examples in the Pull Request.

This proposed public API is based on existing _PyBytesWriter API which is quite old. I wrote an article about it in 2016: https://vstinner.github.io/pybyteswriter.html

picnixz commented 1 month ago

So, my 2 cent after Victor explained to me how it's being used: it's good and helpful..

I have one question though. Is there a preference of having

PyBytesWriter_Create(Py_ssize_t size, char **str);

instead of

PyBytesWriter_Create(char **str, Py_ssize_t size);

?

Since I'm on a phone, it's hard to check the files but shouldn't we mimic the usual order of having the "output" variable on the left? (for instance the Prepare method also puts the str before its length).

vstinner commented 1 month ago

shouldn't we mimic the usual order of having the "output" variable on the left?

Ah? For me, the trend in the Python C API is more to put output parameters at the end ("right").

picnixz commented 1 month ago

Ah? Er... actually it's probably the case for the C API now that I think about with all the Getters functions.

I think I used the mem/alloc C functions as the baseline and thought as the Create as something close to the realloc() function. I don't really have a strong opinion on the matter though so let's keep your signature.

encukou commented 1 month ago

We probably should have a public discussion on Discourse before getting to the WG decision.

AFAIK, there are two main use cases for a factory like this:

Allocate writer of known size, then immediately fill it, and convert to bytes. (As in the PyLongWriter proposal.)
Build the object incrementally, which involves over-allocation and re-allocation.

For the first use case, IMO an efficient implementation should mean that Create does one memory allocation, and Finish does none (it turns the writer to bytes in place).

For the second, why not use bytearray, which already is explicitly mutable? It seems that the advantage of a non-PyObject is that it can only have one “owner”, so:

Other code can't mutate it, which would presumably invalidate the str pointer. (The docs for the new public API should be very explicit about how long str is valid.)
One could expect that it's not thread-safe. (If that's the case, public API docs should also be explicit about that.)

It seems to me that PyBytesWriter is close to a generic “growable buffer”. Is there a special reason that it's tied to PyBytes? From the API it seems that Finish could be replaced by passing str to PyBytes_FromStringAndSize + calling Discard -- and then you could do the same to make a bytearray or any other type.

The private API has an optimization that we lose here: it puts a small initial buffer on the stack. It seems that performance-minded users will want to use their own implementation anyway. Also, C++, Rust, etc. surely have a growable buffers already; they should be fine with PyBytes_FromStringAndSize

In CPython itself, there are quite few uses of _PyBytesWriter. Is it worth exposing at all?

vstinner commented 1 month ago

@encukou:

For the second, why not use bytearray, which already is explicitly mutable?

It's slower since you have to copy bytearray memory to create a bytes object, and destroy the bytearray, to create the final bytes object. PyBytesWriter works in the nanoseconds area.

One could expect that it's not thread-safe.

Correct, the code is not thread safe, and expected to have a single user.

It seems to me that PyBytesWriter is close to a generic “growable buffer”. Is there a special reason that it's tied to PyBytes? From the API it seems that Finish could be replaced by passing str to PyBytes_FromStringAndSize + calling Discard -- and then you could do the same to make a bytearray or any other type.**

PyBytesWriter contains a temporary bytes object. The purpose of the API is to only expose the bytes object when it's fully initialized.

The private API has an optimization that we lose here: it puts a small initial buffer on the stack.

The small buffer is also used in the proposed public API. It's just that it's allocated on the heap memory.

Also, C++, Rust, etc. surely have a growable buffers already; they should be fine with PyBytes_FromStringAndSize

The problem of using a buffer and then calling PyBytes_FromStringAndSize() is that you have to copy the memory, and then destroy the temporary buffer. Again, PyBytesWriter uses a bytes object internally, so calling PyBytesWriter_Finish() is cheap.

vstinner commented 1 month ago

PyBytesWriter works in the nanoseconds area.

A more concrete micro-benchmark shows that the public PyBytesWriter (23.4 ns) is 4x faster than bytearray (93.0 ns): https://github.com/python/cpython/pull/121726#issuecomment-2273053405.

encukou commented 1 month ago

Thank you, I see now! Your explanations help with reading the implementation.

A hard part of making this public will be documenting what str is, when it is valid and how much of the memory it points to is valid. It's a bit hard to explain :)

vstinner commented 1 month ago

When you call PyBytesWriter_Create(size, &str), you can write size bytes into str.

When you call PyBytesWriter_Prepare(writer, &str, size), you can write size bytes into str.

str is valid until PyBytesWriter_Finish() or PyBytesWriter_Discard() is called. Would you prefer that these functions set str to NULL?

encukou commented 1 month ago

That, and the str you pass to PyBytesWriter_Create or …Finish needs to be the str you got from …Create or …Prepare, incremented by (up to?) the size given to that call. Right?

Would you prefer that these functions set str to NULL?

Probably not necessary, they don't clear writer either.

vstinner commented 1 month ago

That, and the str you pass to PyBytesWriter_Create or …Finish needs to be the str you got from …Create or …Prepare, incremented by (up to?) the size given to that call. Right?

Correct.

encukou commented 3 weeks ago

Except you can call Prepare() several times in succession. I'm not entirely clear on what should happen in that case.

I'd expect that each Prepare reserves size bytes after *str, overriding any previous Prepare. Instead, AFAICS each Prepare adds size bytes to an internal “minimum size” field. IMO, that's harder to explain. But more importantly, I'm not sure on what you should do if you call Prepare with a size that's too big -- is there a good way to “decrement” that if it turns out you needed fewer bytes than you Prepared? Or are users expected to provide exact sizes (which might be expensive to calculate?) The current uses seems to rely on API that's not exposed here, like writer.min_size -= 2.

erlend-aasland commented 3 weeks ago

I find the Prepare API confusing. From the API names in the OP, it is not entirely obvious to me how I'm supposed to use the proposed APIs.

vstinner commented 1 week ago

I find the Prepare API confusing. From the API names in the OP, it is not entirely obvious to me how I'm supposed to use the proposed APIs.

Oh. Can you propose a better API in this case?

Prepare(n) adds n bytes to the internal buffer. Example:

PyBytesWriter_Create(3) allocates a buffer of 3 bytes
PyBytesWriter_Prepare(&writer, &str, 3) reallocates the buffer to 6 bytes
PyBytesWriter_Prepare(&writer, &str, 4) reallocates the buffer to 10 bytes

corona10 commented 6 days ago

For naming, Why not simply

PyBytesWriter_Create -> PyBytesWriter_Init
PyBytesWriter_Prepare -> PyBytesWriter_Allocate

It will be the less confusing :)

encukou commented 6 days ago

For some brainstorming --

I would expect that Prepare looks at the current position, i.e. str, like this:

Prepare(n) clears any previous reservations, and ensures n bytes can be written after the current str:

PyBytesWriter_Create(3) allocates a buffer of 3 bytes
str += 3
PyBytesWriter_Prepare(&writer, &str, 3) reallocates the buffer to 6 bytes
str += 2 // str is now (<start of buffer> + 5)
PyBytesWriter_Prepare(&writer, &str, 4) reallocates the buffer to 9 bytes
PyBytesWriter_Prepare(&writer, &str, 2) could reallocate the buffer to 7 bytes (but in practice we only shrink in _Finish, so it stays at 9)

That said, this is still a weird API; PyUnicodeWriter, where the “current position” is handled entirely by the writer, is much more understandable.

The proposed API allows user code to keep track of the “current write position” using str (like str += 3); but str is also a pointer that's updated by the writer's realloc calls. I think this dual usage is confusing.

Why not let the user handle the “current write position” on their own, as an offset from “start of buffer” that's handled by the writer, and have for example:

// Create a bytes writer instance.
// Set `*str` to a writable buffer of at least the given `size`.
// `*str` is valid until the next `PyBytesWriter_*` call on the result.
PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size, char **str);

// Reallocate to `size` bytes.
// Set `*str` to a writable buffer of at least the given `size`;
//   `min(<old_size>, size)` bytes are copied to this buffer.
// `*str` is valid until the next `PyBytesWriter_*` call on this `writer`.
// (Detail: this will never shrink the internal buffer, but if it needs to grow,
//  it will not overallocate. The user can choose an overallocation strategy
//  themself.)
int PyBytesWriter_Realloc(PyBytesWriter *writer, Py_ssize_t size, char **str);

// Return the final Python bytes object with the given size,
//   and destroy the writer instance.
// `size` must not be greater than in the previous `_Create`/`_Realloc` call.
PyObject* PyBytesWriter_Finish(PyBytesWriter *writer, Py_ssize_t size);

// Discard the internal bytes buffer and destroy the writer instance.
void PyBytesWriter_Discard(PyBytesWriter *writer);

That would make the convenience helper slightly more complicated, though, as it needs to update the “current write position”:

// Write `data_size` bytes from `data` at the given `*offset`.
// Set `*str` to a writable buffer of at least `*offset + data_size` bytes;
//   `*offset` bytes are copied to this buffer.
// Increment `*offset` by `data_size`.
// Equivalent to:
//  - PyBytesWriter_Realloc(writer, *offset + data_size, str)
//  - memcpy((*str) + offset, data, data_size)
//  - *offset += data_size
// (Detail: unlike PyBytesWriter_Realloc, this *will* overallocate, so
//  creating bytes using `PyBytesWriter_Write` would be amortized `O(N)`.)
void PyBytesWriter_Write(PyBytesWriter *writer, char **str, Py_ssize_t *offset, Py_ssize_t data_size, char *data);

But, looking at the big picture, I see only two use cases:

You know the final size in advance. That only needs:
- PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size, char **str);
- PyObject* PyBytesWriter_Finish(PyBytesWriter *writer);
- void PyBytesWriter_Discard(PyBytesWriter *writer).
You don't know the size in advance, and so PyBytesWriter_Finish would likely need to reallocate to the final size, and copy data. In this case, for the preparation you can use any “generic resizable buffer” implementation; Python doesn't need to provide one for you. Pass the result to PyBytes_FromStringAndSize.

erlend-aasland commented 1 day ago

Petr:

That said, this is still a weird API; PyUnicodeWriter, where the “current position” is handled entirely by the writer, is much more understandable.

I agree. I worry that the complexity of this API will lead to misuse.

Victor [regarding my complaints about the "prepare" API]:

Oh. Can you propose a better API in this case?

I don't know yet. I can try to think up something.

capi-workgroup / decisions

Add PyBytes_Writer() API #39