faster-cpython / ideas

Copy on write *args and **kwargs? #445

Closed Fidget-Spinner closed 1 year ago

Fidget-Spinner commented 2 years ago

See https://github.com/python/cpython/issues/95757.

Should we have a special object to pass arguments? This special object would look like a normal dict/tuple to the end user, but would duplicate and overwrite itself on every write/modification.

It should speed up complex calls, but I don't know if the effort will be worth the complexity.
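
To make the idea concrete, here is a minimal pure-Python sketch of a copy-on-write mapping. The `CowDict` class and its `_detach` helper are hypothetical names made up for illustration; this is not how CPython passes arguments, and a real version would presumably have to live inside `dict` itself.

```python
from collections.abc import MutableMapping


class CowDict(MutableMapping):
    """Hypothetical copy-on-write view over a dict (illustration only)."""

    def __init__(self, shared):
        self._data = shared      # shared with the caller, not copied yet
        self._owned = False      # becomes True after the first write

    def _detach(self):
        # Copy lazily, the first time a write is about to happen.
        if not self._owned:
            self._data = dict(self._data)
            self._owned = True

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._detach()
        self._data[key] = value

    def __delitem__(self, key):
        self._detach()
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


caller_kwargs = {"a": 1, "b": 2}
view = CowDict(caller_kwargs)
reads_shared = view._data is caller_kwargs   # no copy while only reading
view["c"] = 3                                 # first write triggers the copy
```

After the write, `caller_kwargs` is unchanged and `view` holds its own copy. Note that `type(view) is dict` is false for a wrapper like this, so code that checks the exact type would notice the difference.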

kumaraditya303 commented 2 years ago

How would this preserve this if you use a custom object?

def foo(**kwargs):
    assert type(kwargs) is dict

foo(a=1, b=2)

da-woods commented 2 years ago

At the risk of asking a silly question: how does *args get written to?

mratmartinez commented 2 years ago

> At the risk of asking a silly question: how does *args get written to?

In the @kumaraditya303 example, you can't:

>>> def foo(**kwargs):
...     assert type(kwargs) is dict
... 
>>> foo(a=1, b=2)
>>> foo(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes 0 positional arguments but 2 were given
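
As a side note on the original question: in current Python, `*args` arrives as a tuple, so it can only be rebound, not modified in place. A small check, using a hypothetical `foo` that is separate from the one above:

```python
def foo(*args):
    # args is always a plain tuple, so item assignment is rejected.
    try:
        args[0] = 99
    except TypeError as exc:
        return type(args), str(exc)
    return type(args), None


args_type, error = foo(1, 2)
```

Here `args_type` is `tuple` and `error` holds the `TypeError` message raised by the attempted assignment.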

stuaxo commented 1 year ago

I don't know if del kwargs[someitem] counts as a write, but you sometimes see it where kwargs is accepted, then edited before being passed on to another function.

Some examples grabbed by searching GitHub: [two screenshots, not reproduced here]
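
The pattern described here looks roughly like the sketch below. The function names and the `verbose` and `path` keys are invented for illustration; they are not taken from the screenshots.

```python
def inner(**kwargs):
    return kwargs


def outer(**kwargs):
    # Drop an option this layer consumes, then forward the rest.
    del kwargs["verbose"]
    return inner(**kwargs)


options = {"verbose": True, "path": "/tmp/x"}
forwarded = outer(**options)
```

Because `outer` receives its own dict today, the `del` does not touch the caller's `options`. A copy-on-write scheme would have to preserve exactly that behaviour, which is why a delete has to count as a write.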

gvanrossum commented 1 year ago

I don't actually understand @Fidget-Spinner's proposal (not even after reading the linked issue :-( ) but deleting something from a dict is considered a write, yes, @stuaxo.

Delengowski commented 1 year ago

> I don't actually understand @Fidget-Spinner's proposal (not even after reading the linked issue :-( ) but deleting something from a dict is considered a write, yes, @stuaxo.

Certain languages (such as MATLAB) have what's called pass-by-value, copy-on-write semantics. Say you pass some data structure to a function. Within the scope of that function, you get the exact same data structure (e.g. in Python, calling id on it would return the same thing in both the calling scope and the function's scope), which is the pass-by-value portion. Only upon writing to the data structure within the called function's scope does it get copied in memory, and then the write takes place; this is the copy-on-write portion. This sort of thing is really big, and I think necessary, in functional languages, where functions are meant to be pure and free of side effects.

I'm presuming @stuaxo doesn't want the called function to be able to manipulate a mutable data structure passed with * or a dictionary passed with ** and have those manipulations affect the calling scope.

I think... I'm slightly confused by it too. Like what are the semantics if you pass in variable positional arguments or variable keyword arguments piecemeal rather than as a *iterable or **dict?
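
For reference, this is what current Python does in both cases, as far as I can tell: the function always gets a fresh dict, whether the keywords come from `**d` or are passed one by one, and the copy is shallow. The `show` function and the `items` key are just placeholders.

```python
def show(**kwargs):
    return kwargs


d = {"items": [1, 2]}
unpacked = show(**d)                 # keywords come from an existing dict
piecemeal = show(items=d["items"])   # keywords passed one by one

fresh_dict = unpacked is not d                   # the dict itself is new
shared_value = unpacked["items"] is d["items"]   # the values are not copied
```

So the copy that a copy-on-write scheme could avoid is only the outer dict; mutable values inside it are already shared with the caller.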

gvanrossum commented 1 year ago

Ah, now I think I understand @Fidget-Spinner's proposal (and why it's in this tracker). He thinks that we might save the cost of copying *args or **kwds by using a copy-on-write implementation. I'm not sure whether it would have much of an effect: those tuples and dicts are usually not that large, and the shenanigans to make it work in all corner cases and make it transparent (enough) to the user might not be worth it. @Fidget-Spinner am I right?

Fidget-Spinner commented 1 year ago

I forgot where I came to the realisation, but this won't be very useful for CPython due to us already supporting vectorcall.

IMO, the main overhead comes not from the copying, but from having to traverse the args array and INCREF everything, then traverse the array and DECREF everything subsequently. We can't (at least for now) change these semantics to use borrowed references.

Anyway, I had already thought about all the shenanigans to make it work (even with the user modifying things and doing wacky stuff). It's possible, just maybe not worth it.