zsz00 opened this issue 2 years ago
Seems easy enough to implement
For the most part, it seems pretty simple. I believe the steps would be to divide the image into patches, use the knn_graph function to build the graph representation, and after that it is mostly a matter of applying graph convolution layers (a rough sketch follows below). However, while going through the paper I noticed that the pseudocode mentions Dynamic Convolution, but on a quick review of the code I couldn't find the model using it.
Although I am still working on understanding the code, I wanted to make sure I am on the right path.
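To make the steps above concrete, here is a minimal sketch of that pipeline, not the authors' implementation: patch embedding via a strided Conv2d, a kNN graph rebuilt from the current node features at each block with `torch_geometric.nn.knn_graph`, and `GCNConv` used as a stand-in for the paper's max-relative GraphConv/Grapher module. Names like `TinyViG` and the chosen dimensions are assumptions for illustration only.

```python
# Minimal ViG-style sketch (assumes torch + torch_geometric are installed).
# GCNConv stands in for the paper's max-relative graph convolution.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, knn_graph


class TinyViGBlock(nn.Module):
    """One graph-conv block; the kNN graph is rebuilt from current features."""

    def __init__(self, dim: int, k: int = 9):
        super().__init__()
        self.k = k
        self.conv = GCNConv(dim, dim)  # stand-in for max-relative GraphConv
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, batch: torch.Tensor) -> torch.Tensor:
        # x: [total_nodes, dim], batch: [total_nodes] node -> image index
        edge_index = knn_graph(x, k=self.k, batch=batch)  # dynamic graph per block
        x = x + torch.relu(self.conv(x, edge_index))      # graph conv + residual
        x = x + self.ffn(x)                               # node-wise FFN + residual
        return x


class TinyViG(nn.Module):
    def __init__(self, in_ch=3, dim=192, patch=16, depth=4, k=9, num_classes=1000):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.blocks = nn.ModuleList(TinyViGBlock(dim, k) for _ in range(depth))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        b = img.size(0)
        feat = self.patch_embed(img)                       # [B, dim, H/p, W/p]
        n = feat.shape[2] * feat.shape[3]                  # patches per image
        x = feat.flatten(2).transpose(1, 2).reshape(-1, feat.size(1))  # [B*N, dim]
        batch = torch.arange(b, device=img.device).repeat_interleave(n)
        for blk in self.blocks:
            x = blk(x, batch)
        x = x.view(b, n, -1).mean(dim=1)                   # pool nodes per image
        return self.head(x)


if __name__ == "__main__":
    model = TinyViG(num_classes=10)
    out = model(torch.randn(2, 3, 224, 224))
    print(out.shape)  # torch.Size([2, 10])
```

One thing this sketch does capture is that the graph is recomputed from the updated node features inside every block rather than fixed once, which is presumably what the paper's pseudocode refers to as the dynamic part; whether that is the same thing as "Dynamic Convolution" is exactly the question above.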
Reference for ViG: represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.
Vision GNN: An Image is Worth Graph of Nodes — http://arxiv.org/abs/2206.00272
PyTorch implementation: https://github.com/huawei-noah/CV-Backbones/tree/master/vig_pytorch
MindSpore implementation: https://gitee.com/mindspore/models/tree/master/research/cv/ViG