eiriktsarpalis / typeshape-csharp

Practical generic programming for C#
https://eiriktsarpalis.github.io/typeshape-csharp/
MIT License
138 stars 6 forks source link

Rank is non-zero for non-arrays #32

Open AArnott opened 1 week ago

AArnott commented 1 week ago

The xml doc comment on the Rank property suggests that it is non-zero only for arrays:

https://github.com/eiriktsarpalis/typeshape-csharp/blob/fcbaf23466f745a37ef2e892b3e660afe5f69d43/src/TypeShape/Abstractions/IEnumerableTypeShape.cs#L26-L29

But in fact it is 1 for IEnumerable<T>. There is nothing in the IEnumerableTypeShape to definitely identify an array, I guess. I'm going to try to 'guess' by having a combination of CollectionConstructionStrategy.None and a rank > 0.

AArnott commented 1 week ago

Furthermore, I see that 1-rank arrays take the CollectionConstructionStrategy.Span path, which results in first deserializing the elements into an array, and then the array as a span is the source of a ToArray() call which allocates a new array and copies its contents in. I guess I'll use IEnumerableTypeShape.Type.IsArray to determine whether it's an array to know when to consider Rank and add these optimizations.

eiriktsarpalis commented 1 week ago

The xml doc comment on the Rank property suggests that it is non-zero only for arrays:

That comment is misleading, it is meant to reflect the dimensionality of collection types in general. rank-1 arrays and IEnumerable<T> are single dimensional and therefore of rank 1. Similarly rank-N arrays and the newly introduced Tensor<T> and TensorSpan<T> can be thought of as higher-rank collections.

There is nothing in the IEnumerableTypeShape to definitely identify an array, I guess.

Every ITypeShape instance has a Type property containing its underlying System.Type value, so you can do shape.Type.IsArray to check for that in the usual way.

Furthermore, I see that 1-rank arrays take the CollectionConstructionStrategy.Span path, which results in first deserializing the elements into an array, and then the array as a span is the source of a ToArray() call which allocates a new array and copies its contents in.

Yeah, the problem with array is that you need to be able to know its length ahead of time, which for many wire formats isn't available before having read their entire payload. What I've been doing for the case of CBOR which is optionally length-prefixed is use a PooledList<T> struct that uses resizable pooled intermediate buffers:

https://github.com/eiriktsarpalis/typeshape-csharp/blob/fcbaf23466f745a37ef2e892b3e660afe5f69d43/src/TypeShape.Examples/CborSerializer/Converters/CborEnumerableConverter.cs#L87-L98

Assuming MessagePack is always length-prefixed you could conceivably use a specialized converter for arrays that deserialized to the final buffer directly. Specializing converters is fairly easy within visitors, here's an example of this happening for rank-N arrays in the JSON serializer:

https://github.com/eiriktsarpalis/typeshape-csharp/blob/fcbaf23466f745a37ef2e892b3e660afe5f69d43/src/TypeShape.Examples/JsonSerializer/JsonSerializer.Builder.cs#L85-L89

AArnott commented 1 week ago

you could conceivably use a specialized converter for arrays that deserialized to the final buffer directly. Specializing converters is fairly easy within visitors, here's an example of this happening for rank-N arrays in the JSON serializer:

Yes, that's exactly what I settled on last night, and I used that json sample as inspiration.