PyO3 / rust-numpy

PyO3-based Rust bindings of the NumPy C-API
BSD 2-Clause "Simplified" License
1.07k stars 98 forks source link

Add support for arrays of ASCII and Unicode strings #378

Closed adamreichold closed 1 year ago

adamreichold commented 1 year ago

This supports Unicode arrays whose element length is know at compile time.

Closes #141

rachtsingh commented 1 year ago

Worked great! This was a very painless way to parse fixed-size ASCII and Unicode strings into numpy arrays. Thank you for doing all the work that I had aimed to do, it is really appreciated.

One small nit: it took me a while to understand that the PyASCIIArray type was a newtype around [Py_UCS1; N] which is the same as [u8; N]. For some reason I thought it was a specialization of the PyArray type, but it seems fairly different to me (feel free to disagree).

I couldn't think of any better names, but maybe one of:

  1. NPYString and NPYUnicode (with a comment marking those as Py_UCS1 / S or Py_UCS4 / U dtypes)
  2. NPYFixedString and NPYFixedUnicode to make it more explicit that these are fixed-length dtypes?

I thought about PyASCIIArray1, but think that's a bad idea because Array in this crate should refer to numpy arrays, but here we're talking about dtypes. For example, here's a numpy array:

In [5]: x = np.array(['a', 'b', 'c', 'd'], dtype='S1').reshape((2, 2))

In [6]: x
Out[6]:
array([[b'a', b'b'],
       [b'c', b'd']], dtype='|S1')

and I think the precise way to refer to this is a (2, 2) numpy array of dtype S1, so the way to construct that should be explicit in rust-numpy.

Anyway, thanks for all the help!

adamreichold commented 1 year ago

One small nit: it took me a while to understand that the PyASCIIArray type was a newtype around [Py_UCS1; N] which is the same as [u8; N]

That is literally the first sentence of its docs. 😉 But I agree, the names are somewhat misleading as these not related to PyArray at all. I changed them to PyFixedString and PyFixedUnicode to stay within our existing naming scheme but emphasize that these are strings with a fixed capacity.

adamreichold commented 1 year ago

@kngwyu I think this is ready to merge as I have just branched off the release-0.19 maintenance branch. Could you have another look? Thank you!