PyO3 / rust-numpy

PyO3-based Rust bindings of the NumPy C-API

Discussion: dtype system and integrating record types #254

Closed aldanor closed 2 years ago

aldanor commented 2 years ago

I've been looking at how record types can be integrated in rust-numpy and here's an unsorted collection of thoughts for discussion.

Let's look at Element:

pub unsafe trait Element: Clone + Send {
    const DATA_TYPE: DataType;
    fn is_same_type(dtype: &PyArrayDescr) -> bool;
    fn npy_type() -> NPY_TYPES { ... }
    fn get_dtype(py: Python) -> &PyArrayDescr { ... }
}
aldanor commented 2 years ago

Related: #234, #186.

adamreichold commented 2 years ago

W.r.t. the question of why Element looks the way it does, I can only answer the following part, for lack of involvement in the other design decisions:

DATA_TYPE constant is really only used to check if it's an object or not in 2 places, like this:

Here, we discussed that a simple const TRIVIALLY_COPYABLE: bool; could serve the same purpose, but having the enum does not really hurt either. In any case, this being a compile-time constant is certainly nice, as it will completely remove one of the two branches from the generated code where this is checked.
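A rough sketch of what such a simplified trait could look like (illustrative only, not the actual API):

use numpy::PyArrayDescr;
use pyo3::Python;

pub unsafe trait Element: Clone + Send {
    // true for plain-old-data elements, false for object (pointer) elements
    const TRIVIALLY_COPYABLE: bool;
    // the NumPy dtype describing Self
    fn get_dtype(py: Python) -> &PyArrayDescr;
}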

Maintaining a global static thread-safe registry of registered dtypes (kind of like it's done in pybind11)

I think having a GIL-protected lazily initialized static like in

https://github.com/PyO3/rust-numpy/blob/eb1068b683e5efa55794c1fc6e92157d70ab42b5/src/slice_container.rs#L109-L112

might be preferable to a single global data structure?
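For illustration, a rough sketch of that pattern, assuming something like pyo3's GILOnceCell (pyo3::sync::GILOnceCell in newer releases) and a hypothetical build_record_descr constructor:

use numpy::PyArrayDescr;
use pyo3::once_cell::GILOnceCell;
use pyo3::{Py, Python};

static RECORD_DTYPE: GILOnceCell<Py<PyArrayDescr>> = GILOnceCell::new();

fn record_dtype<'py>(py: Python<'py>) -> &'py PyArrayDescr {
    RECORD_DTYPE
        // build_record_descr is a placeholder for whatever actually builds the descriptor
        .get_or_init(py, || build_record_descr(py).into())
        .as_ref(py)
}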

Should all the places where npy_type() is used be split between "simple type, use New" and "user type, use NewFromDescr"?

While the overhead of producing a PyArrayDescr from a suitably synchronized data structure is probably small, I suspect that not having to produce it is still faster. Or does NumPy resort to PyArray_DescrFromType internally in any case?

adamreichold commented 2 years ago

Or does NumPy resort to PyArray_DescrFromType internally in any case?

That seems to be the case indeed:

https://github.com/numpy/numpy/blob/1798a7d463590de6dc0dfdad69735d92288313f3/numpy/core/src/multiarray/ctors.c#L1105

So we can probably replace our calls to PyArray_New. The only remaining difference I see is that we have to go through PY_ARRAY_API which implies an additional acquire load.

In order to implement structured dtypes, we'll inevitably have to resort to proc-macros.

I think it would actually be helpful to ignore this part completely for now and design the necessary types without any macro support first. The proc-macros should essentially only relieve programmers from writing repetitive and error prone code, shouldn't they?

aldanor commented 2 years ago

we discussed that a simple const TRIVIALLY_COPYABLE: bool; could serve the same purpose but having the enum does not really hurt either. In any case, this being a compile-time constant is certainly nice as it will completely remove one of the two branches from the generated code where this is checked

Then my initial guess was correct. I wonder though whether it's actually noticeable, i.e. has anyone tried to measure it? With all the other Python cruft on top, I would strongly suspect there'd be no visible difference at all. I understand where it comes from; it's just that there's another implicit logical requirement in the Element trait - namely that Self::get_dtype(py).get_datatype() == Some(Self::DATA_TYPE) - and this requirement is imposed only (?) to have DATA_TYPE as a compile-time thing instead of a runtime one (where at runtime it's a single load, dtype_ptr->type_num).

I think it would actually be helpful to ignore this part completely for now and design the necessary types without any macro support first. The proc-macros should essentially only relieve programmers from writing repetitive and error prone code, shouldn't they?

With record types, it's pretty much a requirement; anything else would be extremely boilerplate-ish and unsafe (with the user having to provide all offsets and descriptors for all fields manually). It might be nice to leave an option to construct descriptors manually at runtime, but I can't see a single real use case for it off the top of my head. So, the necessary types and the infrastructure have to be designed, from the get-go, in a way that makes it trivial to integrate them into proc-macros.

Or does NumPy resort to PyArray_DescrFromType internally in any case?

Yes, I think it does (and so does pybind11).

I think having a GIL-protected lazily initialized static like in ... might be preferable to a single global data structure?

Yep - thanks for pointing that out, something like that should be sufficient.

aldanor commented 2 years ago

Just checked it out: if one replaces

https://github.com/PyO3/rust-numpy/blob/eb1068b683e5efa55794c1fc6e92157d70ab42b5/src/array.rs#L850

with something runtime like

        if !T::get_dtype(py).is_object() {

the difference in from_slice runtime for an array of 5 elements is 1ns (20ns vs 21ns).

For comparison, creating arange(1, 5, 1) is 31ns.

So yea... you could say it's noticeable (but only if you create hundreds of millions of arrays).

adamreichold commented 2 years ago

and this requirement is imposed only (?) to have DATA_TYPE as a compile time thing instead of runtime

ATM, the DATA_TYPE member is used to generate default implementations for the other methods, i.e. it is a utility to simplify the implementation. If same_type and npy_type are dropped in favour of using get_dtype, we could certainly remove this member. But I also do not really see a significant cost to having at least a const TRIVIAL: bool; member.

(where at runtime it's a single load dtype_ptr->type_num).

I think if anything has measurable overhead, then it is acquiring the reference to a PyArrayDescr itself, either by accessing PY_ARRAY_API.PyArray_DescrFromType or by accessing a lazily initialized static.

It might be nice to leave an option to construct descriptors manually at runtime, but I can't see a single real use case for it off top of my head. So, the necessary types and the infrastructure have to be designed in a way that would make it trivial to integrate them into proc macros, from the get go.

proc-macros generate code that is injected into the dependent crate, i.e. they have to use this crate's public API, and all the code they produce must also be possible to write by hand. The point is not that I expect user code to manually produce those trait implementations (though avoiding the build-time overhead of proc-macros would be one reason to do so), but that we can focus the discussion on the mechanisms and leave the proc-macro UI for later.

adamreichold commented 2 years ago

So yea... you could say it's noticeable (but only if you create hundreds of millions of arrays).

One possible caveat with a targeted benchmark here is that deciding this at compile time is about removing code and thereby reducing instruction cache pressure which could only be noticeable when other code shares the address space.

adamreichold commented 2 years ago

As a first actionable result of your findings, why not start with a PR to simplify the Element trait to avoid having same_type and npy_type and directly producing a descriptor via PyArrayDescr::from_npy_type? I think that would certainly be a nice simplification. Personally, I would like this combined with a const TRIVIAL: bool flag, but I understand that I am most likely not the only one with an opinion on this.

adamreichold commented 2 years ago

For object types, the current suggestion in the docs is to implement a wrapper type and then impl Element for it manually. This seems largely redundant, given that the DATA_TYPE will always be Object.

I fear that it might actually be unsound: during extraction, we only check the data type to be object, but not whether all the elements are of type Py<T> for the given T: PyClass. So e.g. .readonly().as_array() would enable safe code to call methods on T even though completely different - possibly heterogeneously typed - objects are stored in the array.

I fear we need to remove this suggestion and limit ourselves to Py<PyAny> as the only reasonable Element implementation for now. I also wonder how checking for homogeneity during extraction is best handled if we do want to properly support PyArray<Py<T>, D>.
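To make the cost of such a check concrete, here is a rough sketch of the O(n) scan extraction would need before handing out Py<T> elements (method names and signatures depend on the pyo3 version):

use pyo3::{prelude::*, PyTypeInfo};

fn all_instances_of<T: PyTypeInfo>(py: Python, objects: &[PyObject]) -> PyResult<bool> {
    for obj in objects {
        // is_instance_of returns PyResult<bool> in the pyo3 versions of this era
        if !obj.as_ref(py).is_instance_of::<T>()? {
            return Ok(false);
        }
    }
    Ok(true)
}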

aldanor commented 2 years ago

One possible caveat with a targeted benchmark here is that deciding this at compile time is about removing code and thereby reducing instruction cache pressure which could only be noticeable when other code shares the address space.

While I'm often guilty of micro-optimizing Rust code down to every single branch hint myself, I just don't think it matters here - the last thing you're going to want to do in numpy bindings is instantiate millions of array objects in a loop. That was kind of my point.

Here's a few more thoughts on what would need to happen if const DATA_TYPE remains in place:

Oh yea, and also, record types can be nested - that would be fun to implement as const as well (at runtime we can just box it all on the heap).

aldanor commented 2 years ago

The note above only applies if it remains as DATA_TYPE, of course.

If it's changed to IS_POD: bool, then DataType can become a complex non-const type (to host the record dtype descriptor), etc.

aldanor commented 2 years ago

all the elements are of type Py<T> for the given T: PyClass. So e.g. .readonly().as_array() would enable safe code to call methods on T even though completely different - possibly heterogeneously typed - objects are stored in the array.

I fear we need to remove this suggestion and limit ourselves to Py<PyAny> as the only reasonable Element implementation for now. I also wonder how checking for homogeneity during extraction is best handled if we do want to properly support PyArray<Py<T>, D>.

Yea, I was asking myself whether I was reading that code correctly - it only checks for Object dtype but not the type of objects themselves.

There's three options:

aldanor commented 2 years ago

As a first actionable result of your findings, why not start with a PR to simplify the Element trait to avoid having same_type and npy_type and directly producing a descriptor via PyArrayDescr::from_npy_type? I think that would certainly be a nice simplification. Personally, I would like this combined with a const TRIVIAL: bool flag, but I understand that I am most likely not the only one with an opinion on this.

Yes, I've already started on this as a first step and will try to push something shortly. There's actually a few more subtle dtype-related bugs I've uncovered in the process that I'll fix as well (the tests are very sparse, so I guess no one noticed anything so far...).

aldanor commented 2 years ago

(This is a bit of a wall of text, so thanks in advance to whoever reads through it in its entirety. I tried to keep it as organized as possible.)

tagging @adamreichold @kngwyu @davidhewitt

Ok, now that step 1 (#256) is ready to be merged, let's think about next steps. I'll use pybind11 as a reference here since that's what I'm most familiar with, having implemented a big chunk of it myself in the past.

Descriptor (format) strings and PEP 3118

In pybind11, dtypes can be constructed from buffer descriptor strings (see PEP 3118 for details). It actually calls back into NumPy's Python API to do that (numpy.core._internal._dtype_from_pep3118), but it's possible to reimplement it manually if we want to - I would actually lean towards the latter, provided we borrow and verify all the related tests from numpy itself.

The benefit of using descriptor strings: they're easy to hash if a registry is used, there's no nesting, etc. - it's just a simple string that can cover any dtype possible, including scalar types, object types, multi-dimensional subarrays, recarrays, and any combination of the above. We can also generate them with proc-macros at compile time and make them &'static if need be.

Now, if we go this route, we'll have to delve into "format chars", "type kinds", "buffer descriptor strings" etc. There's obviously a big overlap with pyo3::buffer module as there's ElementType::from_format and all the stuff around it, but it won't be sufficient without some reworking. Which raises the question: do we want to copy/paste stuff in pyo3::buffer and reimplement it to suit our needs? Or do we want to make it pybind11-like so that numpy arrays and buffers are tightly bound together, sharing the same infrastructure? See the next section for more details.

Buffers, pyo3::buffer and rust-numpy

In pybind11, buffers and arrays are bound together (like they should be), i.e. you can instantiate one from another (both ways) while controlling who owns the data. There's no such thing in rust-numpy and I think it's a pretty big gap.

There's pyo3::buffer that we can try to integrate with but, off the top of my head, there are some issues with it. But before we continue, consider an example:

>>> dt = np.dtype([('x', '<i8'), ('y', '<U1'), ('z', 'O', (1, 2))])

>>> arr = np.rec.fromarrays(
...     [[1, 2], ['a', 'b'], [[[int, str]], [[float, list]]]], 
...     dtype=dt,
... )

>>> arr
rec.array([(1, 'a', [[<class 'int'>, <class 'str'>]]),
           (2, 'b', [[<class 'float'>, <class 'list'>]])],
          dtype=[('x', '<i8'), ('y', '<U1'), ('z', 'O', (1, 2))])

>>> arr.data.format
'T{=q:x:@1w:y:(1,2)O:z:}'

Here, arr.data is the matching Python buffer (which you can also get via memoryview(arr)). Normally, you can do np.array(arr.data) and get the identical array back, but funnily enough, this exact example triggers the padding bug I have already reported in numpy upstream 6 years ago (numpy/numpy#7797) and which was then patched manually in pybind11 by hand.

Back to pyo3::buffer:

pub enum ElementType {
    SignedInteger { bytes: usize },
    UnsignedInteger { bytes: usize },
    Bool,
    Float { bytes: usize },
    Unknown,
}

Here's what's wrong with this enum:

It's basically a distant relative of DataType that we have just removed from rust-numpy in favor of a single dtype pointer. With these limitations, there's no way we can make proper use of pyo3::buffer and we'll end up duplicating lots of its logic in rust-numpy.

And here's the trait:

pub unsafe trait Element: Copy {
    fn is_compatible_format(format: &CStr) -> bool;
}

Note that this is similar to numpy::dtype::Element and is_compatible_format is equivalent to same_type() which we have just cut out. And again, there's some issues with it:

One thing that could be done to improve it would be this:

pub unsafe trait Element: Clone + Send {
    const IS_COPY: bool;
    fn format() -> &'static CStr;
}

fn can_convert(from: &CStr, to: &CStr) -> bool { ... } // or whichever other way it's done

And then ElementType can be removed. Alternatively, it can be replaced with FormatString(&CStr) which would then be returned by Element::format().

Note how close it is now to the newly reworked numpy::Element - the only difference being that a nul-terminated format string pointer is returned instead of the dtype pointer.

Note also that any valid T: pyo3::buffer::Element is (logically speaking) also a valid numpy::dtype::Element since dtype ptr can always be built from a valid format string (the inverse is not always true, e.g. due to M and m types; but in most cases it can be worked around if need be).
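As a rough illustration, implementing the reworked trait for a primitive could look like this (assuming the Element sketch above; "d" is the PEP 3118 / struct-module code for a C double):

use std::ffi::CStr;

unsafe impl Element for f64 {
    const IS_COPY: bool = true;

    fn format() -> &'static CStr {
        // the trailing NUL makes this a valid C string
        CStr::from_bytes_with_nul(b"d\0").unwrap()
    }
}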

What to do next

Should the buffers be touched up in pyo3 first, so we can then use that in rust-numpy to add support for record types, subarrays, etc? In which case, what exactly should be done?

One interesting point for discussion may be: could we share some of the functionality between buffers and numpy arrays? (In pybind11, array is derived from buffer and array_t is derived from array - which is quite nice and avoids lots of duplication.) Just some random thinking, e.g. maybe buffer-like types like the buffer itself or a numpy array could deref to some common base type which would provide the shared functionality. Or maybe we could somehow try and connect the two Element traits together.

adamreichold commented 2 years ago

Just my first thoughts after reading:

kngwyu commented 2 years ago

Yeah, I really don't like the current disagreement between PyBuffer and rust-numpy. I actually tried to refactor PyBuffer years ago, but I didn't have a good idea about how to merge the two parts. So,

(In pybind11, array is derived from buffer and array_t is derived from array - which is quite nice and avoids lots of duplication.) Just some random thinking, e.g. maybe buffer-like types like the buffer itself or a numpy array could deref to some common base type which would provide the shared functionality. Or maybe we could somehow try and connect the two Element traits together.

I don't think using Deref to emulate OOP is a recommended approach, so I prefer abstraction via traits.

adamreichold commented 2 years ago

but can we say that ElementType is strongly typed? As @aldanor points out, it allows some invalid configs, which I really don't like. Anyway some improvements are required here.

One could maybe say it is strongly wrongly typed. ;-) More seriously though, I just meant to say that it would be nice to represent the descriptors within the type system directly, which we will probably do anyway if we do not consistently delegate their interpretation to Python code.

aldanor commented 2 years ago

Ok then. I think I'll try to sketch a type descriptor system (focused on buffers to start with, not numpy, as the only real difference would be the datetime/timedelta types) in a separate temporary crate and then share it here for further discussion if it works out.

Here's a few random points, assuming there's some magic TypeDescriptor<T> where T is the "leaf type" (i.e. enum with a bunch of type primitives):

kngwyu commented 2 years ago

Sorry but I still don't understand what kind of recursiveness/hierarchy should be allowed to support record types... I'll come back to discussion after reading some PEPs 😓

adamreichold commented 2 years ago

Because of the above, it needs to be possible to instantiate the type descriptor dynamically as well, not only statically.

I see two options off the top of my head: using arena allocation if recursion is always handled by references, or using a Cow-like type for static references or boxes. Admittedly, the ergonomics of neither approach are particularly nice, as Rust is highly explicit about memory management as usual... (If we are out for efficiency, one could play tricks with the lower bits of the pointer to indicate whether it is owned or static, so there would be no space overhead at all.)

aldanor commented 2 years ago

@kngwyu See the sketch below for where type recursion comes in (sub-array types - an array may contain any type as its element type, and that element type may itself be another subarray).

@adamreichold Yes, that's what I've already implemented (BoxCow). I wouldn't say the ergonomics are "not nice", it's pretty much like a normal Cow, and it all derefs to the base type when you use it so you don't really care about it much except for the cases when you need to clone/convert it.

pub struct FieldDescriptor<T: ValueType> {
    ty: TypeDescriptor<T>,
    name: Option<Cow<'static, str>>,
    offset: usize,
}

pub struct RecordDescriptor<T: ValueType> {
    fields: Cow<'static, [FieldDescriptor<T>]>,
    itemsize: usize,
}

pub struct ArrayDescriptor<T: ValueType> {
    ty: BoxCow<'static, TypeDescriptor<T>>,
    shape: Cow<'static, [usize]>,
}

pub enum TypeDescriptor<T: ValueType> {
    Object,
    Value(T),
    Record(RecordDescriptor<T>),
    Array(ArrayDescriptor<T>),
}
aldanor commented 2 years ago

Ok, I have started sketching a few things out, please feel free to browse around (although it's obviously a very early WIP sketch, I figured I'd better share it asap so as not to spend too much time on something that will be thrown away):

What's implemented so far:

I'm currently pondering on how to link the two Element traits as discussed above in this thread, see numpy-type-desc/src/element.rs, but I'm facing some design problems, especially with the lack of min const generics since our msrv is lower than 1.51... Maybe the design can be changed so it works out - any ideas are welcome.

What I've tried to do:

// pyo3
pub enum Scalar {
    // all the PEP 3118 types: bools, ints, floats, complex, char strings, etc
}
pub unsafe trait Element: Clone + Send {
    const TYPE: TypeDescriptor<Scalar>;
    // this Scalar covers most of the PEP 3118 type system
}

// numpy
pub enum Scalar {
    Base(pyo3_type_desc::Scalar),
    Datetime,  // 'M'
    Timedelta, // 'm'
}
pub unsafe trait Element: Clone + Send {
    const TYPE: TypeDescriptor<Scalar>;
    // this Scalar is either the PEP 3118 or M/m types
}

The problem is in implementing a blanket impl like this:

impl<T: pyo3_type_desc::Element> Element for T {
    const TYPE: TypeDescriptor<Scalar> = ???;
    // How do we convert TypeDescriptor<A> to TypeDescriptor<B> here?
    // See TypeDescriptor::map() - but it would not work in const context :(
}

Having a const type descriptor is super nice, but maybe for numpy::Element it's impossible? (In which case you could of course have a type_descriptor() -> TypeDescriptor<Scalar>.) Or maybe I'm just stuck on a wrong design and this could be done completely differently?

Anyways, any thoughts welcome. I have a feeling we can do it 😺

aldanor commented 2 years ago

Just for completeness, the hack/solution I've mentioned in the last paragraph is basically

// numpy
pub trait Element: Clone + Send {
    fn type_descriptor() -> TypeDescriptor<Scalar>;
}
impl<T: pyo3_type_desc::Element> Element for T {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        T::TYPE.map(|&scalar| Scalar::Base(scalar))
    }
}

So, type_descriptor() technically stops being a const and can no longer be used in a const context (which is not nice), but, in practice, it will pretty much always be evaluated as a const and the whole thing is going to be inlined; there's no overhead here. Also, TypeDescriptor<Scalar> will still be constructible in const contexts (e.g. in proc-macros), so for generated code you could always have e.g.

unsafe impl Element for MyType {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        const TYPE: TypeDescriptor<Scalar> = ...;
        TYPE
    }
}
aldanor commented 2 years ago

And here's another problem with the above system :(

// pyo3
unsafe impl<T: Element> Element for (T,) {
    const TYPE: TypeDescriptor<Scalar> = T::TYPE;
}
// numpy
unsafe impl<T: Element> Element for (T,) {
    fn type_descriptor() -> TypeDescriptor<Scalar> {
        T::type_descriptor()
    }
}
error[E0119]: conflicting implementations of trait `element::Element` for type `(_,)`
  --> numpy-type-desc/src/element.rs:36:1
   |
30 | unsafe impl<T: pyo3_type_desc::Element> Element for T {
   | ----------------------------------------------------- first implementation here
...
36 | unsafe impl<T: Element> Element for (T,) {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ conflicting implementation for `(_,)`

So now I'm even more unsure of how to proceed.

adamreichold commented 2 years ago

It's just my first thought, so it might be completely bogus: could we make Element generic over Scalar, e.g.

pub unsafe trait Element<S>: Clone + Send {
    const TYPE: TypeDescriptor<S>;
}

so that there is only one impl for tuples, like

unsafe impl<T: Element, S> Element<S> for (T,) {

and rust-numpy would not define another Element trait at all?

I am not sure what a trait bound for S would look like, though. Or, as written in the beginning, whether this works at all. Actually, I am mainly trying to say that I am thinking about it but didn't find the time to actually work with the linked repository yet. Sorry.

aldanor commented 2 years ago

And testing a few things out further: in order to e.g. map generic tuples to type descriptors, const TYPE may not be sufficient (at least as of today), as we would need to compute offsets. For example:

use memoffset::*;

fn main() {
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 0));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 1));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 2));
    dbg!(offset_of_tuple!((u8, i32, u8, i32), 3));
}

outputs

[src/main.rs:4] offset_of_tuple!((u8, i32, u8, i32), 0) = 4
[src/main.rs:5] offset_of_tuple!((u8, i32, u8, i32), 1) = 0
[src/main.rs:6] offset_of_tuple!((u8, i32, u8, i32), 2) = 5
[src/main.rs:7] offset_of_tuple!((u8, i32, u8, i32), 3) = 8

So while all this will get probably inlined to a compile-time constant in the end, we can't force it into a const context (which implies it should probably be a type_descriptor() trait method).
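For instance, a minimal sketch of the layout data a runtime type_descriptor() for (i32, f64) would have to gather (again via memoffset):

use memoffset::offset_of_tuple;

fn pair_layout() -> (usize, [usize; 2]) {
    // itemsize and per-field offsets of the tuple, computed at runtime
    let itemsize = std::mem::size_of::<(i32, f64)>();
    let offsets = [
        offset_of_tuple!((i32, f64), 0),
        offset_of_tuple!((i32, f64), 1),
    ];
    (itemsize, offsets)
}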

aldanor commented 2 years ago

@adamreichold Thanks for the input - I had this idea in the beginning but for some reason it was discarded and forgotten. Any ideas are really welcome - I think I can handle most of the technical numpy/Rust details and quirks myself, but brainstorming design decisions together is exponentially more efficient.

The Scalar (as in, the type parameter in TypeDescriptor<>) is already bound by a trait (maybe there will be more methods, but it seems sufficient for now):

pub trait ScalarDescriptor: 'static + Clone {
    fn itemsize(&self) -> usize;
    fn alignment(&self) -> usize;
}

I guess we could further restrict it by From<pyo3_type_desc::Scalar> (and maybe even Into<Option<pyo3_type_desc::Scalar>>, although I'm unsure about that), because 'base' scalars should always remain valid. Also, really, 'static + Clone might be just Copy (it's over-constraining, but may simplify a few things).

aldanor commented 2 years ago

@adamreichold So, at first glance, this actually seems to work - see below (I need to dig around it further to confirm).

The magic key here is the From<Scalar>:

// pyo3

pub trait ScalarDescriptor: 'static + Clone + From<Scalar> {
    fn itemsize(&self) -> usize;
    fn alignment(&self) -> usize;
}
pub unsafe trait Element<S: ScalarDescriptor>: Clone + Send {
    fn type_descriptor() -> TypeDescriptor<S>;
}
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scalar { ... }

// implement `Element` for built-in int/float types etc like so:
macro_rules! impl_element {
    ($ty:ty, $expr:expr) => {
        unsafe impl<S: ScalarDescriptor> Element<S> for $ty {
            #[inline]
            fn type_descriptor() -> TypeDescriptor<S> {
                const TYPE: Scalar = $expr;
                debug_assert_eq!(std::mem::size_of::<$ty>(), TYPE.itemsize());
                TypeDescriptor::Scalar(S::from(TYPE))
            }
        }
    };
    ...
}

// <--- here (in pyo3) is where we can implement generic logic for arrays/tuples etc
// (proc-macro for record types can also be generic and numpy-independent)

// numpy

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Scalar {
    Base(BaseScalar),
    Datetime,
    Timedelta,
}
impl From<BaseScalar> for Scalar {
    fn from(scalar: BaseScalar) -> Self {
        Self::Base(scalar)
    }
}
impl ScalarDescriptor for Scalar { ... }
aldanor commented 2 years ago

Something to think about: alternatively, we could not require From<Scalar> in ScalarDescriptor itself, but only add it as a bound when defining those impls for the basic builtin types. Hmm.
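To make that alternative concrete, one of those impls could look roughly like this (Scalar::Int32 is a hypothetical variant name):

unsafe impl<S> Element<S> for i32
where
    S: ScalarDescriptor + From<Scalar>,
{
    fn type_descriptor() -> TypeDescriptor<S> {
        // the From<Scalar> bound lives on the impl, not on ScalarDescriptor itself
        TypeDescriptor::Scalar(S::from(Scalar::Int32))
    }
}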

adamreichold commented 2 years ago

So while all this will get probably inlined to a compile-time constant in the end, we can't force it into a const context (which implies it should probably be a type_descriptor() trait method).

I think this is sufficient as long as we are reasonably confident that the code is transparent to the Rust compiler, e.g. not crossing any FFI boundaries which would require cross-language LTO.

To think about, alternatively we can not require From in ScalarDescriptor but only add it as a bound when defining those impls for basic builtin types. Hmm.

From just reading it, this seems preferable, as a bound on the trait itself could never be avoided. I guess it will end up being equally powerful though, as nobody outside of pyo3 could implement Element for the primitive (and thereby always foreign) types due to the coherence rules?

aldanor commented 2 years ago

One thing I realized I don't fully understand is dtype.alignment and where it matters, in record types.

E.g., consider this dtype:

args = {
    'names':['a', 'b'], 
    'formats': [
        'i1', 
        (
            {
                'names': ['f0', 'f1'], 
                'formats':['<i2', '<f4'], 
                'offsets': [0, 4], 
                'itemsize': 8
             },
            (2,)
        )
    ], 
    'offsets': [0, 4], 
    'itemsize': 20,
}

# this is equivalent to calling:
# np.dtype([('a', '|i1'), ('b', [('f0', '<i2'), ('f1', '<f4')], 2)], align=True)

And now a few examples:

d1 = np.dtype(args)
d2 = np.dtype(args, align=True)

print(d1)
print(d2)

print(d1 == d2, d1.descr == d2.descr)
print(d1.isalignedstruct, d2.isalignedstruct)
print(d1.alignment, d2.alignment)

which outputs

{'names':['a','b'], 'formats':['i1',({'names':['f0','f1'], 'formats':['<i2','<f4'], 'offsets':[0,4], 'itemsize':8}, (2,))], 'offsets':[0,4], 'itemsize':20}
{'names':['a','b'], 'formats':['i1',({'names':['f0','f1'], 'formats':['<i2','<f4'], 'offsets':[0,4], 'itemsize':8}, (2,))], 'offsets':[0,4], 'itemsize':20, 'aligned':True}

True True
False True
1 4

And if we use those dtypes to stick them into rec types:

r11 = np.dtype([('a', 'i1'), ('b', d1)])
r12 = np.dtype([('a', 'i1'), ('b', d1)], align=True)
r21 = np.dtype([('a', 'i1'), ('b', d2)])
r22 = np.dtype([('a', 'i1'), ('b', d2)], align=True)

print(r11)
print(r12)
print(r21)
print(r22)

print(r11 == r12 == r21, r11 == r22)
print(r11.descr == r12.descr == r21.descr, r11.descr == r22.descr)
print([r.alignment for r in [r11, r12, r21, r22]])
print([r.isalignedstruct for r in [r11, r12, r21, r22]])

which outputs (hmm...)

[('a', 'i1'), ('b', {'names':['a','b'], 'formats':['i1',({'names':['f0','f1'], 'formats':['<i2','<f4'], 'offsets':[0,4], 'itemsize':8}, (2,))], 'offsets':[0,4], 'itemsize':20})]
{'names':['a','b'], 'formats':['i1',{'names':['a','b'], 'formats':['i1',({'names':['f0','f1'], 'formats':['<i2','<f4'], 'offsets':[0,4], 'itemsize':8}, (2,))], 'offsets':[0,4], 'itemsize':20}], 'offsets':[0,1], 'itemsize':21, 'aligned':True}
[('a', 'i1'), ('b', {'names':['a','b'], 'formats':['i1',({'names':['f0','f1'], 'formats':['<i2','<f4'], 'offsets':[0,4], 'itemsize':8}, (2,))], 'offsets':[0,4], 'itemsize':20})]
{'names':['a','b'], 'formats':['i1',{'names':['a','b'], 'formats':['i1',({'names':['f0','f1'], 'formats':['<i2','<f4'], 'offsets':[0,4], 'itemsize':8}, (2,))], 'offsets':[0,4], 'itemsize':20}], 'offsets':[0,4], 'itemsize':24, 'aligned':True}

True False
True False
[1, 1, 1, 4]
[False, True, False, True]
aldanor commented 2 years ago

For scalar types, alignment is trivial. For object types, it's pointer-width, I guess? (size_t) For array types, it's the alignment of the underlying element.

But for record types... I'm a bit lost. Given that we will likely never call np.dtype() constructor ourselves manually, we don't really care about numpy's logic of producing aligned c-like struct types for us from a dict/list/tuple-like description. We will generate those ourselves via proc-macros for structs, but what should we do then? A few options/thoughts:

Again, I'm not exactly sure about how this should be handled, so any ideas welcome.

P.S. this is the last bit left for me to finish the entire TypeDescriptor<Scalar> -> &PyArrayDescr conversion logic which I will be able to push shortly. Then I could probably work on the reverse.

adamreichold commented 2 years ago

We will generate those ourselves via proc-macros for structs, but what should we do then?

I am probably missing something, but isn't calling mem::align_of::<T>() to determine the struct's alignment an option?

aldanor commented 2 years ago

We will generate those ourselves via proc-macros for structs, but what should we do then?

I am probably missing something, but isn't calling mem::align_of::<T>() to determine the struct's alignment an option?

Yea. It's an option for when we're going down the "Rust type -> TypeDescriptor -> numpy dtype" path. In order to enable this, there would definitely need to be an alignment field (optional, I guess? can always default it to 1) in the RecordDescriptor. I can add it and then add some validation in the record dtype building logic (like itemsize is divisible by alignment, fields don't overlap etc).

There's more paths, like buffer format string -> TypeDescriptor -> numpy dtype, but I guess the best we can do there is mimic precisely what numpy is doing (which, from what I'm seeing, is to just leave alignment field as 1). So, there's cases where alignment information will be lost, not by us, but by numpy being stupid and buffer protocol being inconsistent. Again, not sure whether it's relevant, just a note.

One more catch is that PyObject* pointers may be unaligned inside structs, but I guess that's fine?


aldanor commented 2 years ago

Another thing that all of this is slowly leading to, and that's one of the most interesting parts - the logic for treating two type descriptors as "equivalent" or "compatible". IIUC, numpy ignores the alignment there. So the alignment information we provide to numpy will be mostly useful if new dtypes are created out of the ones we export (e.g. wrapping them in rec types).

aldanor commented 2 years ago

We will generate those ourselves via proc-macros for structs, but what should we do then?

I am probably missing something, but isn't calling mem::align_of::<T>() to determine the struct's alignment an option?

@adamreichold Ah, yea, I forgot the confusing part; here's a quote from the dtype creation logic in descriptor.c:

    /* Structured arrays get a sticky aligned bit */ 
    if (align) {
        new->flags |= NPY_ALIGNED_STRUCT;
    }

The NPY_ALIGNED_STRUCT flag is what's reflected in dtype.isalignedstruct.

Normally, it will only be set when manually requested via passing dtype(..., align=True). But what should we do?

adamreichold commented 2 years ago

One more catch is that PyObject* pointers may be unaligned inside structs, but I guess that's fine?

I think that is as good/bad as any other repr(packed) struct in Rust, as long as the value of the pointer respects alignment.

Normally, it will only be set when manually requested via passing dtype(..., align=True). But what should we do?

If I understand this somewhat correctly, then I think we should compare the alignment reported by mem::align_of::<T> with the alignment that NumPy considers "natural" for a given struct which I would expect to be the maximum of the alignment of its fields?

aldanor commented 2 years ago

If I understand this somewhat correctly, then I think we should compare the alignment reported by mem::align_of::<T> with the alignment that NumPy considers "natural" for a given struct, which I would expect to be the maximum of the alignments of its fields?

It's not that simple, I think. I'll try to explain how I understand what I know:

So, I guess the question is, in the TypeDescriptor, should we just check whether the struct type is potentially an "aligned struct" and, if so, just set NPY_ALIGNED_STRUCT flag because we can?..

aldanor commented 2 years ago

So far I went with the latter (leaving some todos around) - if a struct layout fits all three criteria, we will just mark it as aligned. One quirk, in 'unaligned' mode, I think numpy always just sets the alignment to 1 (regardless of struct layout). We have a choice there, either set it to 1 or set it to the actual alignment (I went with the latter for now but we can switch it if we want).

Anyways, some good progress - untested, but it already compiles: dtype_from_type_descriptor() in here. Lots of testing ahead. Surprisingly small amount of code, too - a few hundred lines for what takes a few (quite a few) thousand in numpy itself.

aldanor commented 2 years ago

Just reporting on a bit more progress: started writing tests (everything works so far), and, as an experiment, implemented Element<S> for arbitrary Rust tuples (impl here). Rust tuples don't have a guaranteed layout, obviously, but that's not our concern - at least now we can automatically generate a record-dtype descriptor. I've added some tests for tuple dtypes and everything seems to work so far - this confirms that both scalar and record dtypes already seem to work as expected 🎉

A few questions that came up related to arrays/tuples (any thoughts welcome):

davidhewitt commented 2 years ago

Just wanted to say I've finally had a chance to read this thread. There's a huge amount of content here and I think you're both more knowledgeable than I on this topic. Thank you so much for working on it 👍

Regarding the PyO3 parts - yes I would definitely welcome some love to the PyBuffer code! And we already have some implementations for const generics which automatically get enabled on new enough compilers - see https://github.com/PyO3/pyo3/blob/91648b23159acbf6b44d1245060efa86cfbdf73f/src/conversions/array.rs#L3

If there's anything you need specifically from me, please ping and I'll do my best to comment. My laptop is currently OOO so I'm quite behind on more complex tasks, am limited to when my family doesn't need me and I can be at a desktop. Please have my blessing to be free to design this as appropriate; I'll try to drop in to read in future again (hopefully it's an easier read second time around 😂).

aldanor commented 2 years ago

@davidhewitt Cheers & no worries, thanks for taking time to skim through this, there's quite a lot indeed :) There's no rush since there's still quite a lot left to do and to test, but it's gradually converging.

For now, I'm working on a shared pyo3/numpy type descriptor prototype in here. The code in src/ has nothing to do with numpy in particular and should probably at some point get integrated into the buffer part of pyo3 if everyone's happy with it. When the core numpy part is done, I'll take a separate look at what else we can do with PyBuffer (other than linking the two Element traits, which has already been done), and whether there are any parts we can deduplicate that don't involve ndarray or dtypes.

Re: const generics, yes, I've started implementing them in exactly the same way (and even named the feature identically). Should be all good then.

adamreichold commented 2 years ago

(Just wanted to say that I know that I have unread stuff here since 2021-01-15. I just have not yet found the time to go through it.)

aldanor commented 2 years ago

A little bit of a progress report as I found a bit of time to work on it today (all of it can be seen in the recent commits in the repo, link above):

Next thing, I'll probably work a bit on converting dtype -> type descriptor (the other direction). Also maybe sketch the #[numpy] proc macro for generating record dtype descriptors for custom user structs. And there's still a ton of todos in the code that need to be resolved.

aldanor commented 2 years ago

Some more progress - due to datetime64 and timedelta64 support being previously requested, and them being the only two types that diverge from the stdlib buffer format, I think it makes sense to add proper support for them. There's multiple ways to do it, I've sketched one of them here.

In a nutshell, you can just use Datetime64<units::Nanosecond> and it will translate to datetime64[ns] automatically. We can add a little bit of functionality later on, like basic arithmetics for datetime/timedelta within linear time units, maybe conversions to stdlib and/or time crate types, etc.
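A hypothetical usage sketch (the module path and unit name are illustrative; per the sketch, the wrappers only convert to and from i64):

use numpy_type_desc::datetime::{units, Datetime64}; // illustrative path only

fn example() {
    // wrap a raw i64 (nanoseconds since the epoch) and unwrap it again
    let t = Datetime64::<units::Nanosecond>::from(1_600_000_000_000_000_000_i64);
    let raw: i64 = t.into();
    assert_eq!(raw, 1_600_000_000_000_000_000_i64);
}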

aldanor commented 2 years ago

Some more progress updates (this is kinda where all of this has been heading from the start) - here's one of the most recent tests.

This compiles and runs:

#[derive(Clone, numpy::Record)]
#[repr(C)]
struct Foo<T, const N: usize> {
    x: [[T; N]; 3],
    y: (PyObject, Timedelta64<units::Nanosecond>),
}

assert_eq!(
    Foo::<i64, 2>::type_descriptor(),
    td!({"x":0 => [(3, 2); <i64], "y":48 => {0 => O, 8 => m8[ns] [16, 8]} [64, 8]})
);

There's still lots of edge cases and bugs to be squashed and tests to be written etc, but on the surface, the core of it seems to be working quite nicely.

adamreichold commented 2 years ago

So, I guess the question is, in the TypeDescriptor, should we just check whether the struct type is potentially an "aligned struct" and, if so, just set NPY_ALIGNED_STRUCT flag because we can?..

So far I went with the latter (leaving some todos around) - if a struct layout fits all three criteria, we will just mark it as aligned.

If the semantics of the same type definition with and without the flag are different on the Python side, maybe we should give the user access to it, e.g. make it a parameter to dtype_from_type_descriptor()? Or do we need it as part of TypeDescriptor when converting a dtype into it?

Do we want to implement Element<S> for [T; N] where T: Element<S>? That would be very logical and nice. Here's a few immediate issues though:

And we already have some implementations for const generics which automatically get enabled on new enough compilers

I think we should definitely add this and do it in the same way as std or PyO3: Add impls up to length 32 using macros on older compilers and use const generics on newer ones dropping the small array macros when our MSRV includes support for const generics.

like basic arithmetics for datetime/timedelta within linear time units, maybe conversions to stdlib and/or time crate types, etc.

Personally, I would very much like this crate to provide only a minimal and efficient conversion API without providing any operations on the wrapper types itself. (I think this also applies to arrays itself and we should aim to focus the API to getting from/to ndarray and nalgebra types.)

aldanor commented 2 years ago

Personally, I would very much like this crate to provide only a minimal and efficient conversion API without providing any operations on the wrapper types itself. (I think this also applies to arrays itself and we should aim to focus the API to getting from/to ndarray and nalgebra types.)

Yea, I had the same feeling. For now I've just added transparent wrapper types themselves that can only do two things (aside from being registered in the dtype system) - be constructed from i64 and be converted into i64; the rest is up to whoever's using them.

I think we should definitely add this and do it in the same way as std or PyO3: Add impls up to length 32 using macros on older compilers and use const generics on newer ones dropping the small array macros when our MSRV includes support for const generics.

I've added the min-const-generics part of it, but I can also add the macro-based impl for up-to-length-32 arrays for older compiler versions. Personally, I'm not a huge fan of min_const_generics-like features all over the place, as I think it's cleaner to add a temporary dependency (until MSRV is high enough) and do it like so:

#[rustversion::since(1.51)]
// impl via min const generics

#[rustversion::before(1.51)]
// impl via macros

The main benefit is that we won't break some dependent crates the day we kill the min_const_generics feature, as all of this logic stays completely internal to our crates and can be viewed as an implementational detail.

If the semantics of the same type definition with and without the flag are different on the Python side, maybe we should give the user access to it, e.g. make it a parameter to dtype_from_type_descriptor()? Or do we need it as part of TypeDescriptor when converting a dtype into it?

I think, first of all, we need to categorise how a type descriptor can be attached to a type. There will be a few ways:

  1. #[derive(numpy::Record)] - this is probably what would be used 99% of time.
  2. Same as above but for remote types (see serde docs for remote/foreign types as an example, we might copy that)
  3. unsafe impl manually. Not sure why you'd want to do that, but perhaps there's some reason (e.g. you're codegen-ing something).

There's an alignment: Option<usize> field in RecordDescriptor which we allow to be set to None - this should cover case 3 as you're in control of that yourself. In cases 1 and 2 the derive macro would automatically set it to Some(std::mem::align_of::<T>()). So perhaps, if you really want a numpy dtype with align = False, we can allow something like a #[record(align = false)] attribute in the derive macro - this would cover cases 1 and 2. All that's left then is to ensure NPY_ALIGNED_STRUCT is not set when alignment.is_none()?
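For cases 1 and 2, the usage could then look roughly like this (hypothetical attribute syntax, not final):

#[derive(Clone, numpy::Record)]
#[repr(C)]
#[record(align = false)] // hypothetical: produce a dtype without the aligned-struct flag
struct Foo {
    a: u8,
    b: i64,
}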

adamreichold commented 2 years ago

Personally, I'm not a huge fan of min_const_generics-like features all over the place, as I think it's cleaner to add a temporary dependency (until MSRV is high enough) and do it like so:

Yeah, I think using a crate like rustversion instead of a public feature is fine.

There's an alignment: Option<usize> field in RecordDescriptor which we allow to be set to None

I guess this is where I am not sure ATM. Reading the NumPy C API, I get the impression that the idea is that PyArray_Descr::alignment (the one stored in fields or parent) should always be set to the necessary alignment even if the field offset does not respect that. Which is why I am not sure whether tying alignment == None to NPY_ALIGNED_STRUCT makes sense (or actually whether alignment should be an Option at all).

I think we should always have an alignment (even if trivial, i.e. one) and we should set the NPY_ALIGNED_STRUCT flag whenever the contents of a RecordDescriptor are indeed aligned (by checking the alignment of the field type descriptors against the offsets of the fields).

My understanding of the align argument to the dtype constructor is at this moment basically like "here is a record type but I can't be bothered to work out the correct offsets so please do the padding for me" which would be a courtesy that TypeDescriptor just does not need to provide.
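A minimal sketch of that check, with field offsets and alignments as plain numbers (nothing rust-numpy specific):

// a record counts as an "aligned struct" if every field offset is a multiple of that
// field's alignment and the itemsize is a multiple of the overall alignment
fn is_aligned_struct(fields: &[(usize, usize)], itemsize: usize, alignment: usize) -> bool {
    fields.iter().all(|&(offset, align)| offset % align == 0) && itemsize % alignment == 0
}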

aldanor commented 2 years ago

I guess this is where I am not sure ATM. Reading the NumPy C API, I get the impression that the idea is that PyArray_Descr::alignment (the one stored in fields or parent) should always be set to the necessary alignment even if the field offset does not respect that. Which is why I am not sure whether tying alignment == None to NPY_ALIGNED_STRUCT makes sense (or actually whether alignment should be an Option at all).

The way it works in numpy itself (_convert_from_dict),

PyArray_Descr *new = PyArray_DescrNewFromType(NPY_VOID); // <-- new->alignment := 1
...
if (align) { // <-- align comes from either align=True or from being called recursively
    new->alignment = maxalign; // <-- maxalign is computed C-alignment if offsets are not provided
}
...    
/* Structured arrays get a sticky aligned bit */
if (align) {
    new->flags |= NPY_ALIGNED_STRUCT;
}

IIUC, descr->alignment will be set to something different from 1 only if align=True is passed in explicitly, and in that case NPY_ALIGNED_STRUCT flag will always be set.

There's 4 cases in numpy constructor itself:

So, the following must hold true: "if descr->alignment != 1, then NPY_ALIGNED_STRUCT is set". The reverse, obviously, does not need to hold true.

Back to our business - I guess we can do something like this?

struct RecordDescriptor<T> {
    ...
    alignment: usize,  // always set
    is_aligned: Option<bool>,
}

Where is_aligned can be: