IronLanguages / ironpython3

Implementation of Python 3.x for .NET Framework that is built on top of the Dynamic Language Runtime.
Apache License 2.0

Allow memoryview of .NET arrays #872

Open slozier opened 4 years ago

slozier commented 4 years ago

Would be nice to allow use of memoryview on .NET arrays, for example:

memoryview(System.Array[int]([1,2,3]))

This was supported with buffer in IronPython 2.

@BCSharp if you have any thoughts about this.

BCSharp commented 4 years ago

I agree it would be nice for interop; I'm just not sure how it could be implemented in practice. memoryview (and the Python buffer protocol in general) works under the assumption that the size/structure of the underlying buffer does not change until the memoryview object wrapping it is released.
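For comparison, CPython enforces this same assumption for its own resizable buffers. A minimal sketch in plain CPython (not IronPython-specific): a bytearray refuses to change size while a memoryview over it is alive, and accepts the resize again once the view is released.

```python
buf = bytearray(b"abc")
view = memoryview(buf)

# While the view is exported, resizing the bytearray is refused.
resize_refused = False
try:
    buf.append(ord("d"))
except BufferError:
    resize_refused = True

# Releasing the view ends the export, so the resize now succeeds.
view.release()
buf.append(ord("d"))
print(resize_refused, buf)   # True bytearray(b'abcd')
```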

Also, it immediately invites support for other cases, like ArraySegment or String (the latter read-only). So maybe, instead of supporting .NET arrays directly, we could support System.Memory? It is immutable in size, it supports strings, arrays and array segments of various types, and it is easy to obtain a Memory from an array. The limitation, of course, is the lack of support for multidimensional arrays. I don't know how to work around that.
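System.Memory, like ArraySegment, is a fixed-length window onto existing storage. As a rough CPython analogy only (this is not an IronPython API), a sliced memoryview behaves the same way: its length is fixed, nothing is copied, and writes go through to the array it came from.

```python
from array import array

data = array("i", [10, 20, 30, 40, 50])

# A slice of a memoryview is a fixed-length window; no data is copied.
window = memoryview(data)[1:4]
print(len(window), window.tolist())   # 3 [20, 30, 40]

# Writes through the window land in the underlying array.
window[0] = 99
print(data[1])                        # 99
```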


> This was supported with buffer in IronPython 2.

How did it work in IronPython 2?

IronPython 2.7.10 (2.7.10.1000)
[.NETFramework,Version=v4.5 on .NET Framework 4.8.4180.0 (64-bit)]
Type "help", "copyright", "credits" or "license" for more information.
>>> import System
>>> memoryview(System.Array[int]([1,2,3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected IBufferProtocol, got Array[int]

slozier commented 4 years ago

Isn't the size/structure of a .NET array immutable? I would be fine with using System.Memory if it solves the issue.

While String would be interesting to support, we would need to be careful not to break things in the future. For example, if we ever come up with a flexible string representation (https://github.com/IronLanguages/ironpython3/issues/252), would memoryview("abc") still be valid?

In IronPython 2 it doesn't work with memoryview, but it does with buffer (which is essentially the predecessor of memoryview):

import System
b = buffer(System.Array[int]([1,2,3]))
assert b[1] == 2
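For reference, this is what the requested behaviour looks like in CPython with array.array, which already exports the buffer protocol. Whether a memoryview over System.Array[int] would report the same format code is an assumption; 'i' (a C int) is simply the struct code that matches a 32-bit System.Int32 on mainstream platforms.

```python
import struct
from array import array

m = memoryview(array("i", [1, 2, 3]))
assert m[1] == 2                           # same check as the buffer example
print(m.format, m.tolist())                # i [1, 2, 3]
print(m.itemsize == struct.calcsize("i"))  # True: one C int per element
```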

BCSharp commented 4 years ago

> Isn't the size/structure of a .NET array immutable?

I was thinking about a scenario where a memoryview is constructed around an instance of a .NET array, and then that array is shortened with System.Array.Resize(...). Technically, a new array is created and the content copied over, but in practice all existing references get redirected to the new array, so it appears as if the array itself were shortened. This doesn't fit the memoryview usage pattern.

If a System.Memory object is created from the array, it will still represent the original, pre-resize array, so it fits the memoryview model perfectly. We could provide a convenience constructor accepting an array that would simply call AsMemory() under the hood, but again, that makes an exception to the pattern, because the object wrapped by the memoryview would not be the one given to the constructor. I think it would be better to let users decide whether this is what they intend and, if so, call AsMemory() explicitly.


If we stick with Memory support, then support for strings is implied, as anybody can create a ReadOnlyMemory over a .NET string and pass it to the memoryview constructor. With a flexible string representation, we can implement AsMemory() on those objects any way we see fit. But the question with strings is: what typecode should be used for System.Char? H, I, C...? This is, by the way, also a valid question for arrays of chars.

slozier commented 4 years ago

I'm not quite sure Array.Resize is a valid argument. It doesn't replace all existing references; it only replaces the single reference passed to it by ref:

using System;
using System.Diagnostics;

var arr = new int[] {1,2,3};
var arr2 = arr;
Array.Resize(ref arr, 0);
Debug.Assert(arr2.Length == 3); // arr2 still refers to the original 3-element array

With IronPython it's even clearer, since you have to reassign the variable explicitly:

import System
arr = System.Array[int]([1,2,3])
x = System.Array.Resize(arr, 0) # here x is the resized array, arr is unchanged
assert x.Length == 0
assert arr.Length == 3
arr = x
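The same reasoning holds for a memoryview in CPython, shown here with array.array as a stand-in for a .NET array: the view keeps a reference to the object it wraps, so rebinding the name to a new, shorter object (which is all Array.Resize amounts to from Python) leaves the view untouched.

```python
from array import array

arr = array("i", [1, 2, 3])
view = memoryview(arr)

# Slicing builds a new, empty array; rebinding the name does not
# touch the original object or the view that wraps it.
arr = arr[:0]
print(len(arr), view.tolist())   # 0 [1, 2, 3]
```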

If you're curious, here is how to call it by reference:

import clr
import System

a = System.Array[int]([1,2,3])
arr = clr.Reference[System.Array[int]](a)
System.Array.Resize(arr, 0)
assert arr.Value.Length == 0
assert a.Length == 3

For System.Char:

Personally, I would have no issue with supporting only the primitive value types and excluding char. Support for it could be added in the future if there is demand for it.
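A minimal sketch of what "primitive value types only, no char" could look like: a hypothetical lookup table (NET_PRIMITIVES and format_for are illustrative names, not IronPython APIs) from .NET element types to struct/memoryview format codes, with System.Char absent so it is rejected. The pointer-sized IntPtr/UIntPtr are left out for brevity.

```python
import struct

# Hypothetical table: .NET element type -> (buffer format code,
# size of the .NET type in bytes). System.Char is deliberately absent.
NET_PRIMITIVES = {
    "System.Boolean": ("?", 1),
    "System.SByte": ("b", 1),
    "System.Byte": ("B", 1),
    "System.Int16": ("h", 2),
    "System.UInt16": ("H", 2),
    "System.Int32": ("i", 4),
    "System.UInt32": ("I", 4),
    "System.Int64": ("q", 8),
    "System.UInt64": ("Q", 8),
    "System.Single": ("f", 4),
    "System.Double": ("d", 8),
}


def format_for(type_name: str) -> str:
    """Return the format code for a .NET element type, rejecting anything
    outside the table (including System.Char)."""
    try:
        return NET_PRIMITIVES[type_name][0]
    except KeyError:
        raise TypeError(f"unsupported element type for memoryview: {type_name}") from None


# Sanity check: each native format code has the same element size as the
# corresponding .NET type on this platform.
for name, (code, size) in NET_PRIMITIVES.items():
    assert struct.calcsize(code) == size, name
```

Checking sizes with struct.calcsize keeps the table honest: if a native format code ever disagreed with the .NET element size, the mismatch would surface immediately rather than as corrupted data.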

BCSharp commented 4 years ago

> It doesn't replace all existing references, it only replaces the single reference:

My mistake. I expected this behaviour, but the test I wrote to confirm it was wrong. Thanks for the elaborate examples (especially the last one; I didn't know how to call a method with a ref argument). I have no further objections to wrapping over an array.


C seems to be the most Pythonic, and indeed, I'd expect it to return a single-character string. But precisely because it is the most Pythonic, I have my reservations: it is not unthinkable that CPython may at some point introduce C to represent wchar_t (which may be 16 or 32 bits), a single Python character (a rune in .NET speak), which is always 32 bits wide but accepts only the Unicode range, or maybe char32_t... System.Char is always 16 bits, so making such an assumption may not be forward-compatible. So I would stay away from it.

Both H and I would work, but each carries an assumption that strings are represented as UTF-16 or UTF-32, respectively. Since the flexible string representation is not implemented yet, I would not make such an assumption at this stage. So this rules out wrapping memoryview over strings. Also, Python itself disallows wrapping a memoryview over str.
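CPython's behaviour, for reference: str does not support the buffer protocol, so memoryview only sees text after it has been encoded, and the caller picks the byte layout (UTF-16 here, an arbitrary choice for illustration).

```python
text = "abc"

# str does not export the buffer protocol, so memoryview rejects it.
str_rejected = False
try:
    memoryview(text)
except TypeError:
    str_rejected = True

# Once the text is encoded, the byte layout is an explicit choice
# (UTF-16 here) instead of an assumption about how str is stored.
view = memoryview(text.encode("utf-16-le"))
print(str_rejected, view.nbytes)   # True 6
```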

For simple char[] arrays, I think a numeric H would be OK. But I am also OK with not supporting it yet.
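A sketch of what a numeric H view over char data would expose, using CPython's array('H') as a stand-in for a .NET char[] (an analogy; utf16_units is a hypothetical helper). Each element is a UTF-16 code unit returned as an int, so a character outside the BMP shows up as two surrogate values.

```python
import sys
from array import array


def utf16_units(s: str) -> array:
    """Return the UTF-16 code units of s as an 'H' (unsigned 16-bit) array,
    roughly what a .NET char[] holds for the same text."""
    units = array("H")
    units.frombytes(s.encode("utf-16-le"))
    if sys.byteorder == "big":
        units.byteswap()  # array uses native order; the bytes above are little-endian
    return units


m = memoryview(utf16_units("h\u00e9\U0001F600"))
print(m.format, m.itemsize)   # H 2
print(m.tolist())             # [104, 233, 55357, 56832]
```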

slozier commented 4 years ago

> For simple char[] arrays, I think a numeric H would be OK. But I am also OK with not supporting it yet.

Another option might be to use u (which you just removed). Its use in array is basically the same thing as a char array, and it returns single-character strings on indexing. It's deprecated in CPython, so it's unlikely to make a comeback...
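For reference, CPython's array module shows both sides of this: indexing a 'u' array returns a one-character str, but the element type is wchar_t, whose size is platform-dependent (2 bytes on Windows, 4 on most Unix-like systems). A 'u' view over a 16-bit System.Char array would therefore be an IronPython-specific interpretation of the code; the sketch below only demonstrates the CPython side.

```python
import warnings
from array import array

# 'u' has been deprecated since Python 3.3 and newer versions may warn
# when it is used, so the warning is silenced for this demonstration.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    chars = array("u", "abc")

print(repr(chars[1]))   # 'b': indexing yields a one-character str
print(chars.itemsize)   # 2 or 4, i.e. sizeof(wchar_t) on this platform
```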

BCSharp commented 4 years ago

> Another option might be to use u

This actually sounds like a good idea. I too think it is unlikely to come back in CPython.