dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Tensors in .NET #98323

Open ericstj opened 9 months ago

ericstj commented 9 months ago

Goal: Provide a type, for use both as an exchange type and for interop, that represents multi-dimensional data of a single primitive type. Implement arithmetic and linear algebra operations so that the type can serve as a sufficient basis for data preparation and as an input to and output from neural networks.

hez2010 commented 9 months ago

I hope we can take #89730 into consideration together, as it's critical for achieving high-performance linear algebra operations on arbitrary ND tensors. Such a feature is fundamental to libraries like the native implementation of PyTorch.

ghost commented 9 months ago

Tagging subscribers to this area: @dotnet/area-meta See info in area-owners.md if you want to be subscribed.

Issue Details
Goal: Provide a type, for use both as an exchange type and for interop, that represents multi-dimensional data of a single primitive type. Implement arithmetic and linear algebra operations so that the type can serve as a sufficient basis for data preparation and as an input to and output from neural networks.

- [ ] Explore memory layouts and how they play into interop
- [ ] Explore high-level Tensor design
  - Abstract? Type hierarchy? Construction patterns.
  - Dense vs sparse
  - Slices
  - Dimension order
  - Expected outcome: design document draft in dotnet/designs
- [ ] Explore Tensor arithmetic design
- [ ] Explore Tensor construction design
- [ ] Explore Tensor loading and saving
  - TODO: Further break down Tensor implementation into smaller chunks
- [ ] Tensor API proposal
  - Expected outcome: API review
- [ ] Tensor core types implementation
  - Expected outcome: PR
  - TODO: Further break down Tensor implementation into smaller chunks
- [ ] Performance tests
- [ ] Interop with OnnxRuntime
- [ ] Interop with TorchSharp
- [ ] Expose in ML.NET
Author: ericstj
Assignees: -
Labels: `Epic`, `area-Meta`, `untriaged`, `needs-area-label`
Milestone: -
ghost commented 9 months ago

Tagging subscribers to this area: @dotnet/area-system-numerics See info in area-owners.md if you want to be subscribed.

Author: ericstj
Assignees: -
Labels: `Epic`, `area-Meta`, `area-System.Numerics`, `untriaged`, `needs-area-label`
Milestone: -
ghost commented 9 months ago

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors See info in area-owners.md if you want to be subscribed.

Author: ericstj
Assignees: -
Labels: `Epic`, `area-Meta`, `area-System.Numerics.Tensors`, `untriaged`, `needs-area-label`
Milestone: -
Shuenhoy commented 9 months ago

I realize my original comments drifted somewhat from my actual intention. The key point is not ET, lazy evaluation, or any specific technique. It is the ability to eliminate intermediate arrays (known as deforestation, or fusion: https://en.wikipedia.org/wiki/Deforestation_(computer_science)). Users can always use low-level primitives to achieve this manually, but if the high-level API cannot handle it, its usefulness will be largely limited.


Original comments:

For the arithmetic and linear algebra operations, I suggest considering the "expression template" (ET) technique commonly used in the C++ world (e.g. Eigen3 and Blaze), which is a kind of lazy evaluation that can reduce the allocation of temporary objects. This is especially helpful for a GC'd language.

For a quick impression, `v1 + v2` in ET produces a `VecAdd<Vec, Vec>` instead of a `Vec`. Similarly, `m * (v1 + v2)` produces a `MatMul<Mat, VecAdd<Vec, Vec>>`. The computation is encoded in the type and only gets evaluated when assigned to a concrete `Vec`.
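To make that concrete, here is a minimal sketch of the idea in C# generics (the `IVecExpr`, `Vec`, and `VecAdd<,>` types below are purely illustrative and not taken from any existing library):

```csharp
using System;

// Element-wise expressions: anything that has a length and can compute element i on demand.
public interface IVecExpr
{
    int Length { get; }
    float this[int i] { get; }
}

// A concrete, storage-backed vector.
public readonly struct Vec : IVecExpr
{
    private readonly float[] _data;
    public Vec(float[] data) => _data = data;
    public int Length => _data.Length;
    public float this[int i] => _data[i];

    // The single place where storage is allocated: materialize any expression
    // into a concrete vector with one pass and one allocation.
    public static Vec Evaluate<TExpr>(TExpr expr) where TExpr : IVecExpr
    {
        var result = new float[expr.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = expr[i];
        return new Vec(result);
    }
}

// "v1 + v2" encoded as a type; combining expressions allocates no intermediate arrays.
public readonly struct VecAdd<TLeft, TRight> : IVecExpr
    where TLeft : IVecExpr
    where TRight : IVecExpr
{
    private readonly TLeft _left;
    private readonly TRight _right;
    public VecAdd(TLeft left, TRight right) { _left = left; _right = right; }
    public int Length => _left.Length;
    public float this[int i] => _left[i] + _right[i];
}
```

Because the type arguments are structs, the JIT specializes each instantiation, so the element accessors can be inlined rather than dispatched through an interface. With operator overloads layered on top, `v1 + v2 + v3` could be made to build a `VecAdd<VecAdd<Vec, Vec>, Vec>` and evaluate in one fused pass.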

Last month, as part of an attempt to rebase my research from C++ to .NET, I made an experiment with ET in .NET via generics: https://github.com/Shuenhoy/DotnetETExp. The (preliminary but illustrative) results look promising and suggest ET may also be helpful in the .NET world.

However, there are several barriers that prevent me from investigating further:

Some other features, like existential types (https://github.com/dotnet/csharplang/issues/5556), may also be useful, but I cannot provide more information for now. In case someone is interested, I have uploaded my attempts with F# here: https://github.com/Shuenhoy/Furzn/.

These feature requests have existed for a while, and some may require changes in the runtime and even the metadata. I do not expect they can all be implemented any time soon, but I hope the .NET foundation can take a closer look at them for a potentially better representation of tensors and linear algebra in .NET.

tannergooding commented 9 months ago

Graph optimizations like that are a separate/independent concern and are not something that should be part of the default type experience.

None of the major tensor libraries force such handling. They all allow trivial direct usage and provide a separate way to do lazy evaluation over an expression tree in order to allow dynamic code generation for additional performance.

Shuenhoy commented 9 months ago

Thanks for your reply!

None of the major tensor libraries force such handling

At least, almost all commonly used libraries in the C++ world use this: Eigen3, Blaze, xtensor, etc.

part of the default type experience.

In most cases, users should not experience any difference between an ET library and a "direct usage" library. The following code can be both valid ET usage (but with higher performance, since no intermediate array is allocated) and direct usage, with the help of implicit conversion.

```csharp
VectorXf x = m * (a + b + c);
var y = m * (a + b + c); // only with type inference do the expression types become visible to the user
```

In fact, this is a major advantage of ET. It allows users to write high-performance code as naturally as "direct usage" code. We already have examples in the .NET world of the intermediate-array problem that ET solves; TorchSharp has a page on it: https://github.com/dotnet/TorchSharp/wiki/Memory-Management. The methods proposed there are less natural (you have to write `using` for each temporary object or use a disposing scope) and less performant (they only offer deterministic disposal; they do not eliminate the allocation).

I propose ET because it's the most commonly used technique, to the best of my knowledge, though it is currently not possible in .NET, as noted in my previous comment. I understand some of this may be out of the initial scope of the design goal, and I do not expect it to be solved at the current stage. However, eliminating intermediate arrays is definitely a fundamental problem and should be considered. Of course, techniques other than ET can be considered if they are just as natural to write and as performant (probably with the help of the JIT?).

tannergooding commented 9 months ago

At least almost all common used libraries in C++ world use this, Eigen3, blaze, xtensor, etc.

I think we have different classifications of "major" here. C++ has several, but they tend to see far less usage than things like PyTorch, NumPy, TensorFlow, Jax, etc.

The C++ libraries you called out notably depend on templating very heavily and don't really fit the broader "framework design guidelines" that .NET has for its APIs.

In most cases, the users should not experience any difference between an ET library and a "direct usage" library.

This itself makes several assumptions, including which features the consuming language supports and what coding style developers use in their codebase. Neither is something we can rely on for something we're shipping from dotnet/runtime.


A good design here is going to end up following the tried and true API design guidelines we have for .NET. It is going to consider how it integrates into the broader .NET ecosystem and how languages like C# and F# will consume it, and it will be appropriately layered to correctly balance ease of use, extensibility, versioning, and performance.

I expect that this will ultimately come in the general shape of an `ITensor<TSelf, T>` interface, a `Tensor<T>` sealed class, and some kind of `ref struct TensorSpan<T>`. These will build on top of the already exposed `TensorPrimitives` APIs as a way of providing highly efficient CPU computation on a per-operation basis.
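As a rough, self-contained illustration of that general shape (the member names and signatures below are placeholders for discussion, not the API being proposed):

```csharp
using System;

// Placeholder names only; the real surface will come out of API review.
public interface ITensor<TSelf, T> where TSelf : ITensor<TSelf, T>
{
    ReadOnlySpan<int> Lengths { get; }               // the shape of the tensor
    T this[ReadOnlySpan<int> indices] { get; set; }
    static abstract TSelf Create(ReadOnlySpan<int> lengths);
}

// A dense, row-major tensor over a single primitive element type.
public sealed class Tensor<T> : ITensor<Tensor<T>, T>
{
    private readonly T[] _data;      // flat backing storage
    private readonly int[] _lengths;

    public Tensor(params int[] lengths)
    {
        _lengths = lengths;
        int count = 1;
        foreach (int length in lengths)
            count *= length;
        _data = new T[count];
    }

    public static Tensor<T> Create(ReadOnlySpan<int> lengths) => new Tensor<T>(lengths.ToArray());

    public ReadOnlySpan<int> Lengths => _lengths;

    public T this[ReadOnlySpan<int> indices]
    {
        get => _data[GetLinearIndex(indices)];
        set => _data[GetLinearIndex(indices)] = value;
    }

    // Exposing the backing memory is what lets the type cheaply wrap, or be wrapped by,
    // interop buffers (e.g. OnnxRuntime, TorchSharp) without copying.
    public Span<T> AsSpan() => _data;

    private int GetLinearIndex(ReadOnlySpan<int> indices)
    {
        int linear = 0;
        for (int dim = 0; dim < _lengths.Length; dim++)
            linear = linear * _lengths[dim] + indices[dim];
        return linear;
    }
}

// A stack-only view for slicing or wrapping native memory would sit alongside this,
// e.g. "public readonly ref struct TensorSpan<T> { ... }".
```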

This then gives a solid foundation on which it can be extended to support additional features. For example, it should be possible to design a `TensorBuilder`-like type which could implement `ITensor<TSelf, T>` and build up an expression tree internally, as sketched below. It should also be possible to use some level of source generators, or potentially interceptors, to achieve similar functionality.
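A hypothetical sketch of that builder idea, with made-up node types and kept separate from the interface above so it stays self-contained: each operation records a node, and evaluation fuses the recorded chain into a single pass with a single output allocation.

```csharp
using System;

// Hypothetical expression nodes recorded by the builder.
public abstract record TensorExpr
{
    public sealed record Leaf(float[] Data) : TensorExpr;
    public sealed record Add(TensorExpr Left, TensorExpr Right) : TensorExpr;
    public sealed record Scale(TensorExpr Source, float Factor) : TensorExpr;
}

public sealed class TensorBuilder
{
    private TensorExpr _root;

    public TensorBuilder(float[] data) => _root = new TensorExpr.Leaf(data);

    // Each operation only records a node; nothing is computed yet.
    public TensorBuilder Add(float[] other) { _root = new TensorExpr.Add(_root, new TensorExpr.Leaf(other)); return this; }
    public TensorBuilder Scale(float factor) { _root = new TensorExpr.Scale(_root, factor); return this; }

    // Evaluation walks the recorded tree element by element: one pass, one output
    // allocation, no intermediate arrays for the chained operations.
    public float[] Evaluate()
    {
        int length = LengthOf(_root);
        var result = new float[length];
        for (int i = 0; i < length; i++)
            result[i] = EvaluateAt(_root, i);
        return result;
    }

    private static int LengthOf(TensorExpr expr) => expr switch
    {
        TensorExpr.Leaf leaf => leaf.Data.Length,
        TensorExpr.Add add => LengthOf(add.Left),
        TensorExpr.Scale scale => LengthOf(scale.Source),
        _ => throw new NotSupportedException()
    };

    private static float EvaluateAt(TensorExpr expr, int i) => expr switch
    {
        TensorExpr.Leaf leaf => leaf.Data[i],
        TensorExpr.Add add => EvaluateAt(add.Left, i) + EvaluateAt(add.Right, i),
        TensorExpr.Scale scale => EvaluateAt(scale.Source, i) * scale.Factor,
        _ => throw new NotSupportedException()
    };
}
```

Usage would look like `new TensorBuilder(a).Add(b).Scale(2f).Evaluate()`, which computes `(a + b) * 2` without ever materializing `a + b`.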

By properly considering the core needs and the layering, and by ensuring our tensor types can appropriately expose the underlying memory or cheaply wrap other memory with the correct layout, we get a very robust and extensible system that follows the framework design guidelines and doesn't leave anything on the table.

I'm working on the general design doc currently and hope to have more to share in the coming weeks.

RenderMichael commented 9 months ago

A lot of the lazy evaluation could be handled by returning an `ITensor` interface instance and optimizing which lazily-evaluated implementation of `ITensor` is returned, LINQ-style. Would that be too much of a performance hit?

Shuenhoy commented 9 months ago

C++ has several, but they tend to see far less usage than things like PyTorch, NumPy, TensorFlow, Jax, etc

I think it's a matter of domain. I am not sure whether this new tensor library is ML-specific or intended to be more general. For ML, of course you will see a lot of PyTorch etc. But there are also physics-based simulation, geometry processing, numerical optimization, graphics, etc., which also need a tensor/linear algebra library and may have different usage patterns. These areas are definitely smaller than ML, so in total you see more usage of PyTorch etc.

Anyway, I proposed ET here only because it's the only technique I am aware of. It would be great if there were other ways better suited to .NET to handle it, like the source generators and interceptors you mentioned. Looking forward to the design doc!

A lot of the lazy evaluation could be handled by returning an `ITensor` interface instance and optimizing which lazily-evaluated implementation of `ITensor` is returned, LINQ-style. Would that be too much of a performance hit?

From my experiments (https://github.com/Shuenhoy/DotnetETExp), using interfaces seems to be slower than even eager evaluation. For LINQ, the operations themselves are usually heavy enough that the overhead of boxing and dynamic dispatch through interfaces can be ignored. But for tensor operations, 1) the dimensions may not be large enough for the interface overhead to be negligible, and 2) operations are usually much more frequent; something like `(m.transpose() * (a + b * s)).dot(c.row(3)) + d.sum()` is very common, i.e. you can easily involve many operations in even a single line.

GeorgeS2019 commented 7 months ago

Interop with TorchSharp

There are MANY hundreds of books written for Python that TorchSharp can leverage.

When dealing with tensors in .NET, only very advanced users understand what to do with them.

I urge more discussion using TorchSharp as context, so we can see and then share what Tensors in .NET is attempting to achieve.

What kinds of gaps and use cases demand Tensors in .NET?

The whole concept of why we need to do this, and why the goals are not possible with e.g. TorchSharp, is not very clear.

The .NET community is SO FAR behind compared to Python.

Introducing something just so the .NET community has something to play with, without helping the community see THE WHOLE picture, is making us nervous.

tannergooding commented 3 months ago

Moving the remaining work here to .NET 10. We made significant progress in .NET 9.