Provide Nested graph layout example on Transformer based ONNX

GeorgeS2019 commented 3 years ago

Is there a small example of how to use Dagre.NET to do nested graph layout?

BERT ONNX has a higher-level nested layout, e.g. Nx, involving encoding and decoding.

We may need a discussion on how to best present the Transformer graph layout

fel88 commented 3 years ago

Are you sure that Netron supports nested layout? I've tried to open BERT in Netron, but the layout is messy (the nodes weren't grouped into any layers).

BTW: Do you have a small ONNX file with nested graph layout for debugging purposes (not as large as BERT)?

PS: Loading BERT is extremely slow right now. I'll try to fix that first.

GeorgeS2019 commented 3 years ago

@fel88 As far as I know, no one has a solution with proper nested layout of ONNX models with Transformer architecture.

The .NET deep learning framework Seq2SeqSharp is perhaps the only one so far that displays the Transformer architecture.

The top big rectangle is Encoder and the bottom big rectangle is Decoder.

ONNX does not support this layout (yet).

Perhaps we need to push the ONNX committee to think about that

> layout is messy

Yes, because it is so complex and messy, there is a STRONG need to organize the messy layout into something that provides a top-level view of the Transformer nested layout.

This is not something that can be solved immediately, but it will contribute to the entire .NET community when done RIGHT!!!

fel88 commented 3 years ago

It sounds challenging. We can try to add the prefix "enc1." to the names of all encoder nodes, and "dec1." to the names of all decoder nodes. That will be enough to group them, lay them out separately, and bound them in rectangles with a '+' button at the top, which allows you to collapse / expand these groups. I'll try to experiment with a simple ONNX model.
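The prefix idea above could be sketched roughly as follows. This is only an illustration under assumptions: the node names are made up, `PrefixGrouping` is a hypothetical helper rather than part of Dagre.NET, and everything before the first '.' is treated as the group key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Dagre.NET.
public static class PrefixGrouping
{
    // Returns the text before the first '.', or "" when the name has no prefix.
    public static string GetPrefix(string nodeName)
    {
        int dot = nodeName.IndexOf('.');
        return dot > 0 ? nodeName.Substring(0, dot) : "";
    }

    // Buckets node names by prefix, keeping the input order inside each bucket.
    public static Dictionary<string, List<string>> GroupByPrefix(IEnumerable<string> nodeNames)
    {
        return nodeNames
            .GroupBy(name => GetPrefix(name))
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}

public static class PrefixGroupingDemo
{
    public static void Main()
    {
        // Made-up node names, not taken from a real ONNX model.
        var names = new[]
        {
            "input", "enc1.attention", "enc1.ffn",
            "dec1.attention", "dec1.ffn", "output"
        };

        foreach (var group in PrefixGrouping.GroupByPrefix(names))
        {
            string label = group.Key == "" ? "(ungrouped)" : group.Key;
            Console.WriteLine($"{label}: {string.Join(", ", group.Value)}");
        }
    }
}
```

Each resulting bucket would then become one collapsible rectangle, and nodes without a prefix stay at the top level.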

GeorgeS2019 commented 3 years ago

@fel88 It is great that you are taking on the challenge. Start with a small step and learn more as you go along. The architecture is VERY important and is the NEXT LEVEL of AI. It is worth learning, and it is EVEN more important if you can contribute to supporting the solution.

GeorgeS2019 commented 3 years ago

FYI, TorchSharp is one of the TWO I know of that support the MultiHead attention used in the Transformer architecture.

GeorgeS2019 commented 3 years ago

@fel88

Can you try to give feedback on this question from Seq2SeqSharp?

fel88 commented 3 years ago

BERT can be loaded now, but it takes a long time (~90 sec). I'll try to optimize Dagre to reduce the loading time ASAP.

GeorgeS2019 commented 3 years ago

@fel88

I tried recently, the Mint.onnx now fails to load.

Ideally, there should be unit tests that load a number of ONNX models from the ONNX Model Zoo, just to check that there is no regression whenever a new commit is made.

fel88 commented 3 years ago

> @fel88
>
> I tried recently, the Mint.onnx now fails to load.

Hmm, bad. Does the error still occur with the latest commit? If so, could you provide this ONNX model, please?

> Ideally, there is a need for unit tests that load a number of ONNX in Onnx Model zoo just to check there is no regression whenever a new commit is made.

Yep, it should be done. I just don't want to add model files to the repository. An external repository with ONNX models should be used for this purpose.
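A regression check along these lines might look like the sketch below. It assumes the models sit in a local checkout of some external repository (the default folder name `models` is a placeholder), and it takes the loader as a delegate because the real model-loading call is not shown in this thread. `ModelRegressionCheck` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Dagre.NET or Dendrite.
public static class ModelRegressionCheck
{
    // Tries to load every *.onnx file under modelDir with the supplied loader
    // and returns the paths that threw, paired with the exception message.
    public static List<(string Path, string Error)> FindFailures(
        string modelDir, Action<string> loadModel)
    {
        var failures = new List<(string Path, string Error)>();
        foreach (string path in Directory.EnumerateFiles(
                     modelDir, "*.onnx", SearchOption.AllDirectories))
        {
            try
            {
                loadModel(path);
            }
            catch (Exception ex)
            {
                failures.Add((path, ex.Message));
            }
        }
        return failures;
    }
}

public static class RegressionDemo
{
    public static int Main(string[] args)
    {
        // Assumption: models live outside this repository, e.g. in a local clone.
        string modelDir = args.Length > 0 ? args[0] : "models";
        if (!Directory.Exists(modelDir))
        {
            Console.WriteLine($"Model folder '{modelDir}' not found, nothing to check.");
            return 0;
        }

        // Placeholder loader: it only rejects empty files.
        // Swap in the real ONNX loading + layout call here.
        Action<string> loader = path =>
        {
            if (new FileInfo(path).Length == 0)
                throw new InvalidDataException("empty file");
        };

        var failures = ModelRegressionCheck.FindFailures(modelDir, loader);
        foreach (var (path, error) in failures)
            Console.WriteLine($"FAILED {path}: {error}");

        // A non-zero exit code lets a CI job flag the regression.
        return failures.Count == 0 ? 0 : 1;
    }
}
```

Keeping the loader as a parameter means the same loop can be reused from a unit test framework or from a plain console run.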

fel88 commented 3 years ago

@GeorgeS2019 BTW, I just found out that Dagre has clusters (https://dagrejs.github.io/project/dagre-d3/latest/demo/clusters.html). I've partially implemented clusters (https://github.com/fel88/Dendrite/tree/dagre-debug), but there are still some bugs (BERT doesn't work yet).

I'll try to fix it soon.

Cluster code sample (will be available in Dagre.NET soon):

DagreInputGraph dg = new DagreInputGraph();

// Add nodes
var nd1 = dg.AddNode(new { Name = "input" }, 100, 20);
var nd2 = dg.AddNode(new { Name = "node1" }, 150, 30);
var nd3 = dg.AddNode(new { Name = "node2" }, 150, 30);
var nd4 = dg.AddNode(new { Name = "output" }, 100, 20);

// Add edges
dg.AddEdge(nd1, nd2, 2);
dg.AddEdge(nd2, nd3);
dg.AddEdge(nd3, nd4, 2);

// Put node1 and node2 into one cluster
var group1 = dg.AddGroup(new { Name = "group" });
dg.SetGroup(nd2, group1);
dg.SetGroup(nd3, group1);

// Run the layout
dg.Layout();

// Print node positions
Console.WriteLine($"{((dynamic)nd1.Tag).Name} : {nd1.X} {nd1.Y}");
Console.WriteLine($"{((dynamic)nd2.Tag).Name} : {nd2.X} {nd2.Y}");
Console.WriteLine($"{((dynamic)nd3.Tag).Name} : {nd3.X} {nd3.Y}");
Console.WriteLine($"{((dynamic)nd4.Tag).Name} : {nd4.X} {nd4.Y}");

// Print the cluster position and size
Console.WriteLine($"{((dynamic)group1.Tag).Name} : {group1.X} {group1.Y} {group1.Width} {group1.Height}");

GeorgeS2019 commented 3 years ago

@fel88 Great discovery. Great Job!!!

GeorgeS2019 commented 3 years ago

@fel88 For your inspiration => Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention

GeorgeS2019 commented 3 years ago

Feedback from Netron developers

The issue has a few relevant suggestions on how to handle large hierarchical graphs.

fel88 / Dagre.NET

Provide Nested graph layout example on Transformer based ONNX #2