I appreciate your work on graph hop experts!
I noticed in the code that the MoE is implemented as a multi-channel GNN, with sparse attention then used to fuse the channels. Regarding the mixture of 1-hop and 2-hop experts described in the paper, how should this be understood in terms of the code implementation? To make the question concrete, I've sketched my current reading below.
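The sketch below is only my guess at the structure, not your actual code: I'm assuming each expert is a k-step propagation channel over the adjacency matrix, and that the "sparse attention" is a per-node top-k gate over the expert outputs. The class names (`HopExpert`, `SparseMoEGNN`), the dense adjacency, and the top-k gating are all my own placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HopExpert(nn.Module):
    """One expert channel: k propagation steps ~ a k-hop receptive field."""
    def __init__(self, in_dim, out_dim, hops):
        super().__init__()
        self.hops = hops
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        h = self.lin(x)
        for _ in range(self.hops):          # propagate features `hops` times
            h = adj @ h
        return h

class SparseMoEGNN(nn.Module):
    """Two hop experts (1-hop, 2-hop) fused by a sparse (top-k) attention gate."""
    def __init__(self, in_dim, out_dim, k=1):
        super().__init__()
        self.experts = nn.ModuleList([
            HopExpert(in_dim, out_dim, hops=1),   # 1-hop expert
            HopExpert(in_dim, out_dim, hops=2),   # 2-hop expert
        ])
        self.gate = nn.Linear(in_dim, len(self.experts))
        self.k = k                                # experts kept per node

    def forward(self, x, adj):
        # Each expert produces a per-node representation: (N, num_experts, D)
        outs = torch.stack([e(x, adj) for e in self.experts], dim=1)
        scores = self.gate(x)                     # (N, num_experts) gating logits
        # Sparsify: keep the top-k experts per node, mask out the rest
        topv, topi = scores.topk(self.k, dim=-1)
        mask = torch.full_like(scores, float('-inf')).scatter(-1, topi, topv)
        weights = F.softmax(mask, dim=-1)         # sparse mixture weights
        return (weights.unsqueeze(-1) * outs).sum(dim=1)
```

Is this roughly how the 1-hop/2-hop blending happens in your implementation, or does the fusion work differently (e.g., attention applied over hop-wise messages rather than over the expert outputs)?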
Thanks a lot!