Open slozier opened 4 years ago
I agree it would be nice for interop; I'm just not sure how it could be practically implemented. memoryview (and Python Buffer in general) works under the assumption that the size/structure of the underlying buffer does not change until the memoryview object wrapping it is released.
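For reference, CPython enforces this assumption on its own resizable buffers. The following is a minimal sketch in plain CPython (not IronPython, and not tied to .NET arrays); the variable names are illustrative only. It shows that a bytearray refuses to change size while a memoryview over it is still alive.

```python
# A bytearray is resizable, but not while a memoryview exports its buffer.
buf = bytearray(b"abc")
view = memoryview(buf)

try:
    buf.append(0x64)          # would change the buffer size
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

# In-place element writes do not change the size, so they are allowed.
view[0] = 0x41

# Once the view is released, resizing works again.
view.release()
buf.append(0x64)
```

After this runs, the resize attempt made while the view was alive has been rejected, and the one made after release() has succeeded.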
Also, it immediately prompts for support of other cases, like ArraySegment or String (the latter read-only). So maybe instead of supporting .NET arrays directly, we could support System.Memory? It is immutable in size, supports strings, arrays, and array segments of various types, and it is easy to obtain a Memory from an array. The limitation is, of course, the lack of support for multidimensional arrays. I don't know how to work around it.
This was supported with buffer in IronPython 2.
How did it work in IronPython 2?
IronPython 2.7.10 (2.7.10.1000)
[.NETFramework,Version=v4.5 on .NET Framework 4.8.4180.0 (64-bit)]
Type "help", "copyright", "credits" or "license" for more information.
>>> import System
>>> memoryview(System.Array[int]([1,2,3]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected IBufferProtocol, got Array[int]
Isn't the size/structure of a .NET array immutable? I would be fine with using System.Memory if it solves the issue.
While String would be interesting to support, we would need to be careful not to break things in the future. For example, if we ever come up with a flexible string representation (https://github.com/IronLanguages/ironpython3/issues/252), would memoryview("abc") still be valid?
In IronPython 2 it doesn't work with memoryview, but it does with buffer (which is essentially the predecessor of memoryview):
import System
b = buffer(System.Array[int]([1,2,3]))
assert b[1] == 2
Isn't the size/structure of a .NET array immutable?
I was thinking about a scenario where a memoryview is constructed around an instance of a .NET array, and then that array is shortened with System.Array.Resize(...). Technically a new array is created and the content copied over, but practically, all existing references get redirected to the new array, so it appears as if the array itself were shortened. This doesn't fit the memoryview usage patterns.
If a System.Memory object is created from the array, it will still represent the original array before the resize, so it fits perfectly with the memoryview. We could provide a convenience constructor accepting an array that would simply call AsMemory() under the hood, but again, it makes an exception in the pattern, because the object wrapped by the memoryview would not be the one given to the constructor. I think it would be better for the user to decide if this is what they intend and, if so, let them call AsMemory() explicitly.
If we stick with Memory support, then support for strings is implied, as anybody can create a ReadOnlyMemory over a .NET string and pass it to a memoryview constructor. With a flexible string representation, we can implement AsMemory() on those objects any way we see fit. But the question with strings is, what typecode to use for System.Char? H, I, C?... This is BTW also a valid question for arrays of chars.
I'm not quite sure Array.Resize is a valid argument. It doesn't replace all existing references, it only replaces the single reference:
var arr = new int[] {1,2,3};
var arr2 = arr;
Array.Resize(ref arr, 0);
Debug.Assert(arr2.Length == 3);
With IronPython it's even clearer, you would have to explicitly reassign to the variable:
arr = System.Array[int]([1,2,3])
x = System.Array.Resize(arr, 0) # here x is the resized array, arr is unchanged
assert x.Length == 0
assert arr.Length == 3
arr = x
If you're curious, to use it by reference:
import clr  # needed for clr.Reference
import System
a = System.Array[int]([1,2,3])
arr = clr.Reference[System.Array[int]](a)
System.Array.Resize(arr, 0)
assert arr.Value.Length == 0
assert a.Length == 3
For System.Char:
H: I would expect indexing (e.g. mv[0]) to return an integer value (which may be ok)
I: this would reflect the size of the char which is probably an issue
C: this one also seems like a valid option, however, would indexing return a single character string or a System.Char? It seems like the single character string would be more Pythonic?
Personally I would have no issue with only supporting the primitive value types and excluding char. Support for it could be added in the future if there is a demand for it.
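To make the H option concrete, here is a small sketch in plain CPython (not IronPython). It assumes that text encoded as UTF-16 in native byte order is a fair stand-in for an array of System.Char; that mapping is an assumption for illustration, not something IronPython does. It shows that a view with format H yields integers on indexing.

```python
import sys

# Encode text as UTF-16 in the machine's native byte order (no BOM),
# as a stand-in for a sequence of 16-bit code units.
encoding = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
raw = "abc".encode(encoding)

# Reinterpret the bytes as unsigned 16-bit items (format 'H').
units = memoryview(raw).cast("H")

first = units[0]          # an int code unit, not a one-character string
as_list = units.tolist()  # all code units as a list of ints
```

With this typecode the caller gets code unit values and has to convert them back to characters explicitly, for example with chr().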
It doesn't replace all existing references, it only replaces the single reference:
My mistake. I expected this behaviour, but when testing to confirm it I tested it wrong. Thanks for the elaborate examples (esp. the last one; I didn't know how to call a method with a ref argument). I have no further objections against wrapping over an array.
C seems to be most Pythonic, and indeed, I'd expect it to return a single character string. But just because it is most Pythonic, I have my reservations: it is not unthinkable that CPython may at some point introduce C to represent either wchar_t (which may be 16 or 32 bit) or a single Python character (rune in .NET speak), which is always 32 bit wide but accepts only the Unicode range, or maybe char32_t... System.Char is always 16 bit, so making such an assumption may not be forward-compatible. So I would stay away from it.
Both H and I would work but carry an assumption that strings are represented as UTF-16 or UTF-32. Since the flexible string representation is still not implemented, I would not make such an assumption at this stage. So this rules out wrapping memoryview over strings. Also, Python disallows wrapping memoryview over str.
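The CPython behaviour mentioned here can be checked with a short sketch (plain CPython; the variable names are illustrative). A str is rejected because it does not expose the buffer protocol, while explicitly encoded bytes are accepted.

```python
# str does not support the buffer protocol, so memoryview rejects it.
try:
    memoryview("abc")
    str_accepted = True
except TypeError:
    str_accepted = False

# The caller has to pick an encoding explicitly to get a buffer.
encoded_view = memoryview("abc".encode("utf-8"))
```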
For simple char[] arrays, I think a numeric H would be OK. But I am also OK with not supporting it yet.
For simple char[] arrays, I think a numeric H would be OK. But I am also OK with not supporting it yet.
Another option might be to use u (which you just removed). Its use in array is basically the same thing as a char array, and it returns single character strings on indexing. It's deprecated in CPython so it's unlikely to make a comeback...
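For comparison, this is how the u typecode behaves in CPython's array module. It is a sketch only: it relies on the deprecated u typecode, so it runs only on CPython versions that still ship it, and it says nothing about how IronPython would implement the same thing.

```python
import warnings
from array import array

with warnings.catch_warnings():
    # Recent CPython releases warn that the 'u' typecode is deprecated.
    warnings.simplefilter("ignore", DeprecationWarning)
    chars = array("u", "abc")

first = chars[0]   # indexing yields a one-character str, not an int
```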
Another option might be to use u
This actually sounds like a good idea. I too think it is unlikely to come back in CPython.
Would be nice to allow use of memoryview on .NET arrays, for example:
This was supported with buffer in IronPython 2.
@BCSharp if you have any thoughts about this.