dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.

Merging LibTorchSharp in TorchSharp #79

Closed · artidoro closed 5 years ago

artidoro commented 5 years ago

Current situation:

@interesaaat has done quite a bit of work on his fork of TorchSharp. He has also created a separate repo with C/C++ code, LibTorchSharp, on which his fork of TorchSharp depends.

I have started integrating TorchSharp with ML.NET with a torch scoring transformer (which takes pretrained models) and a torch trainer (which trains a user-defined model). This work depends on @interesaaat's fork.

Plan:

Given the scattered work, it seems like we should decide on a high-level plan for how to streamline our development efforts. Here is what I think our goal should be:

  1. Have a single repository, TorchSharp, which builds both the native components and the C# code; this essentially means merging @interesaaat's TorchSharp and LibTorchSharp repositories (see the sketch after this list for the boundary between the two).
  2. Have an official build that produces a signed NuGet package and publishes it to some NuGet feed.
  3. Either as part of this NuGet package, or as part of a separate one, redistribute the libtorch dlls that we use and publish them to a NuGet feed.
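To make the first point concrete: the seam between the two halves is a set of flat C exports in the LibTorchSharp native layer, bound from C# via P/Invoke, so a merged repo has to build both sides in one pass. A minimal sketch of that boundary follows; the export name and signature here are hypothetical, not the actual LibTorchSharp exports.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical binding illustrating the managed/native seam: the C/C++
// layer wraps libtorch behind flat C exports, and the C# layer declares
// matching P/Invoke signatures against the "LibTorchSharp" native library.
internal static class LibTorchSharpBindings
{
    // Name and signature are illustrative only.
    [DllImport("LibTorchSharp")]
    internal static extern IntPtr THSTensor_ones(long[] sizes, int length);
}
```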

Expected outcome:

For the developers:

For the users: Anyone who wants to use TorchSharp directly can use the nugets from the feed instead of having to build everything from source.

Execution (WIP):

I am working on these three points with @interesaaat in my fork. It is still a work in progress, but I am using ML.NET's build infrastructure to build both the native and C# code, to run the official build that pushes to a feed, and as a model for the redistribution of libtorch (see Microsoft.ML.TensorFlow.Redist). I am currently able to build everything and run the tests. I have discussed the redistribution of the libtorch dlls with CELA. I am still figuring out the best way to do that, but it is possible.

cc/ @interesaaat @migueldeicaza

nietras commented 5 years ago

@artidoro we have built a full ML pipeline in C# based on CNTK (whose development has now ended with version 2.7) and are very interested in alternatives that work on Windows too. From our experience with CNTK, I would suggest splitting the nuget packages into a few underlying parts (naming and technicalities aside):

TorchSharp.Dlls.CPU - CPU-only native dlls (if 32-bit x86 were also possible, that would be great for us)
TorchSharp.Dlls.GPU - GPU native dlls (might be split based on the underlying API used, e.g. cudnn, etc.)
TorchSharp.PInvoke - managed-only, with no dependencies on the dll packages (can be referenced by class libraries that then do not require the dll packages and hence can reduce disk footprint, etc.)
TorchSharp - meta package that combines all the necessary packages and provides ease of use for most users

Why split it like this? Our experience from CNTK, where the GPU package is quite heavy, is that you otherwise get a big disk footprint and "drag" in class libraries where the dlls are not needed. This does have pros and cons.
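One way such a managed-only binding package could stay agnostic about which native package is installed is a DllImport resolver that probes for the GPU binaries first and falls back to CPU. This is purely a sketch, not a committed design: the package and library names below are hypothetical, and the NativeLibrary API requires .NET Core 3.0+.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Sketch: route the logical "LibTorchSharp" DllImport name to whichever
// native package (GPU or CPU) actually put its binaries on disk.
internal static class NativeLoader
{
    internal static void Register()
    {
        NativeLibrary.SetDllImportResolver(
            Assembly.GetExecutingAssembly(),
            (name, assembly, searchPath) =>
            {
                if (name != "LibTorchSharp")
                    return IntPtr.Zero; // defer to default resolution

                // Prefer the GPU dlls if the GPU package is present.
                if (NativeLibrary.TryLoad("LibTorchSharp.Gpu", assembly, searchPath, out var gpu))
                    return gpu;

                return NativeLibrary.Load("LibTorchSharp.Cpu", assembly, searchPath);
            });
    }
}
```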

If there are any up-for-grabs items for this I would be interested in helping out and in giving feedback. :)

mdabros commented 5 years ago

@artidoro I'm also interested in helping with up-for-grabs items should you guys need it :-).

migueldeicaza commented 5 years ago

I like the plan.

On the build piece, I do not know what ML.NET's build infrastructure is; is it Azure DevOps? I would love to have this on Azure DevOps, so anyone forking the repo could trivially reuse and rebuild things on their own.

Azure DevOps-wise, it can both run builds and publish NuGet packages.

artidoro commented 5 years ago

Yes, absolutely, we will keep using Azure DevOps.

@nietras thanks for the suggestion, I think it's a good idea to keep the packages separate.

As soon as we have the repositories merged, I am sure there will be a lot of up-for-grabs items!

interesaaat commented 5 years ago

Thanks @artidoro for putting this together!

Regarding the native dlls, I believe we should redistribute all the (stable) versions of libtorch available on pytorch.org. My guess is that once we are done with one, for the others it should be straightforward.

markusweimer commented 5 years ago

> Regarding the native dlls, I believe we should redistribute all the (stable) versions of libtorch available on pytorch.org. My guess is that once we are done with one, for the others it should be straightforward.

By "redistribute" you mean actually shipping the DLLs / .sos? Is there a NuGet or such which we can depend on instead?

artidoro commented 5 years ago

@markusweimer the plan is to make that nuget!

The plan is to package the libtorch DLLs that we need in a NuGet (similar to the Microsoft.ML.TensorFlow.Redist NuGet, which ships some TensorFlow DLLs). This would enable a user to take a dependency on the Libtorch.Redist NuGet (the name is still to be finalized).

Regarding the version of the DLLs, I am starting with 1.0.1 since that's what LibTorchSharp depends on, but we are planning to upgrade to 1.1 once we have actually merged the repos.
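On the consumer side, one could smoke-test that the redistributed libtorch is loadable before using it. This is only a sketch: it assumes the redist package drops the native binaries somewhere the loader can find them, and it uses the .NET Core 3.0+ NativeLibrary API.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical smoke test for a Libtorch.Redist-style package: check that
// the native libtorch shipped by the package can actually be loaded.
// "torch" is the platform-neutral name; the runtime probes torch.dll,
// libtorch.so, or libtorch.dylib as appropriate.
internal static class RedistProbe
{
    internal static bool LibtorchIsAvailable()
    {
        if (NativeLibrary.TryLoad("torch", out var handle))
        {
            NativeLibrary.Free(handle);
            return true;
        }
        return false;
    }
}
```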

markusweimer commented 5 years ago

@ebarsoumMS might be able to help? He mentioned some time ago that he was looking into producing libtorch NuGets.

ebarsoum commented 5 years ago

The libtorch dll is C++, so the ABI might be compiler dependent. Should we use something like https://github.com/microsoft/vcpkg instead?

markusweimer commented 5 years ago

VCPKG is a source format, right? That means to build TorchSharp, one would have to have a working C++ toolchain around. I am not sure whether that is a reasonable requirement to impose.

I believe the ABI issue is only a Linux issue. To overcome it, we can follow the model of .NET itself or of (well-behaved) Python packages: compile on ancient Linux OSs and trust that the ABI compatibility of newer Linuxes covers us. For .NET, they use CentOS 7 (RedHat 7) as the build environment. And Python packages using the manylinux1 moniker use CentOS 5, for even greater binary compatibility.

Also, PyTorch itself must face (and solve) the same issue, as it is available as a pip package, which doesn't impose a recompile upon install.

Lastly, it absolutely makes sense to have a vcpkg port as well, to make it easy to consume this work in C++ projects.

ebarsoum commented 5 years ago

We can do it this way, but there is an ABI break in GCC after 5.5; this is why we have the _GLIBCXX_USE_CXX11_ABI flag.