dotnet / TorchSharp

A .NET library that provides access to the library that powers PyTorch.
MIT License

Feasibility to (semi) automate the porting of Python Jupyter notebooks to .NET interactive? #436

Closed GeorgeS2019 closed 2 years ago

GeorgeS2019 commented 2 years ago

TorchSharp is fast approaching version 1.0!

Is there a way to (semi-)automate the conversion of Python Jupyter notebooks to .NET Interactive, replacing PyTorch statements with TorchSharp ones?

The main motivation is a (semi-)automated way to EVALUATE and demonstrate the high API coverage of TorchSharp.

The result of the conversion would include, e.g., a log stating which PyTorch functions and methods are not yet "covered" in TorchSharp.

This will be a multi-expertise effort involving specialists from the community, the dotnet team, and Microsoft.

Some of the tools and expertise needed:

Using .NET Interactive, the user opens a Python Jupyter document and presses Convert to get a conversion report and a work-in-progress conversion to TorchSharp.
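To make the idea concrete, here is a minimal sketch of the coverage-report half of such a tool. It is purely illustrative: the `SUPPORTED` set is a hypothetical stand-in (a real tool would derive the supported API surface from the TorchSharp assembly, e.g. via reflection), and the regex only catches simple qualified calls, not aliased imports or dynamic dispatch.

```python
import json
import re

# Hypothetical stand-in for the TorchSharp-supported API surface.
# A real tool would enumerate this from the TorchSharp assembly.
SUPPORTED = {"nn.Linear", "nn.EmbeddingBag", "nn.Module"}

# Match simple qualified PyTorch calls such as nn.Linear(...) or torch.zeros(...).
CALL_RE = re.compile(r"\b((?:torch|nn|F)(?:\.\w+)+)\s*\(")

def coverage_report(notebook_json: str) -> dict:
    """Scan the code cells of a Jupyter notebook (nbformat JSON) and
    report which PyTorch calls are, or are not, in the supported set."""
    nb = json.loads(notebook_json)
    calls = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            src = "".join(cell.get("source", []))
            calls.update(m.group(1) for m in CALL_RE.finditer(src))
    return {
        "covered": sorted(calls & SUPPORTED),
        "missing": sorted(calls - SUPPORTED),
    }
```

For example, a notebook cell containing `nn.Linear(4, 2)` and `nn.LSTM(4, 8)` would yield `nn.Linear` under "covered" and `nn.LSTM` under "missing" against the illustrative set above. The actual statement-rewriting half (PyTorch syntax to TorchSharp syntax) is the much harder problem, as the PyToCs output below shows.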

Feedback and discussion please :-)

lostmsu commented 2 years ago

Clearly out of scope for TorchSharp

GeorgeS2019 commented 2 years ago

@lostmsu whether it is done within TorchSharp or outside, this will eventually happen. I'm posting here to gather discussion.

===> The question is how best to demonstrate the near-100% real-world coverage of TorchSharp and its reliability as a .NET deep learning option in practical scenarios.

GeorgeS2019 commented 2 years ago

Follow the discussion here

Python source: Text Classification with the torchtext Library

from torch import nn

class TextClassificationModel(nn.Module):

    def __init__(self, vocab_size, embed_dim, num_class):
        super(TextClassificationModel, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)

PyToCs conversion using the release binary

using nn = torch.nn;

public static class PyTorch {

    public class TextClassificationModel
        : nn.Module {

        public object embedding;

        public object fc;

        public TextClassificationModel(object vocab_size, object embed_dim, object num_class) {
            this.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse: true);
            this.fc = nn.Linear(embed_dim, num_class);
            this.init_weights();
        }

        public virtual object init_weights() {
            var initrange = 0.5;
            this.embedding.weight.data.uniform_(-initrange, initrange);
            this.fc.weight.data.uniform_(-initrange, initrange);
            this.fc.bias.data.zero_();
        }

        public virtual object forward(object text, object offsets) {
            var embedded = this.embedding(text, offsets);
            return this.fc(embedded);
        }
    }
}

Manual conversion


using static TorchSharp.torch;
using static TorchSharp.torch.nn;
using static TorchSharp.torch.nn.functional;

 class TextClassificationModel : Module
 {
     private Modules.EmbeddingBag embedding;
     private Modules.Linear fc;

     public TextClassificationModel(long vocab_size, long embed_dim, long num_class) : base("TextClassification")
     {
         embedding = EmbeddingBag(vocab_size, embed_dim, sparse: false);
         fc = Linear(embed_dim, num_class);
         InitWeights();

         RegisterComponents();
     }

     private void InitWeights()
     {
         var initrange = 0.5;

         init.uniform_(embedding.Weight, -initrange, initrange);
         init.uniform_(fc.Weight, -initrange, initrange);
         init.zeros_(fc.Bias);
     }

     public override Tensor forward(Tensor t)
     {
         throw new NotImplementedException();
     }

     public override Tensor forward(Tensor input, Tensor offsets)
     {
         using var t = embedding.forward(input, offsets);
         return fc.forward(t);
     }

     public new TextClassificationModel to(Device device)
     {
         base.to(device);
         return this;
     }
 }

GeorgeS2019 commented 2 years ago

@NiklasGustafsson

You have done so much porting official PyTorch code to TorchSharp. Given that the same PyTorch code can be "converted" by PyToCs, though the result is still far from what you would produce manually, do you see a need for future TorchSharp tutorials in which you share some insights into why you do this or that, and why you add functions not present in the PyTorch code, e.g.

public new TextClassificationModel to(Device device)
{
    base.to(device);
    return this;
}

NiklasGustafsson commented 2 years ago

> @NiklasGustafsson
>
> You have done so much porting official PyTorch code to TorchSharp. Given that the same PyTorch code can be "converted" by PyToCs, though the result is still far from what you would produce manually, do you see a need for future TorchSharp tutorials in which you share some insights into why you do this or that, and why you add functions not present in the PyTorch code, e.g.

I don't recall that particular one.

GeorgeS2019 commented 2 years ago

@NiklasGustafsson here is the link

NiklasGustafsson commented 2 years ago

Regarding the overall issue, I agree with @lostmsu that this is outside the scope of TorchSharp. Someone could potentially build such an automation tool, but it would have to be in an independent repository.

NiklasGustafsson commented 2 years ago

Oh, I found the code. I just don't recall why the 'new' to() method was necessary.

GeorgeS2019 commented 2 years ago

@NiklasGustafsson Not to be disrespectful. I'm not stating that this tool is part of TorchSharp.

At this stage, we see TorchSharp mostly through your eyes. All these discussions will help us discover any "gaps" you may have missed and not written down in, e.g., tutorials.

NiklasGustafsson commented 2 years ago

@GeorgeS2019 -- I'm going to close this one. It seems more appropriate as a discussion topic than an issue. If you feel strongly about it, please open it under 'Discussions' as a question about related tooling for TorchSharp.