EasonXiao-888 / GrootVL

[NeurIPS2024 Spotlight] The official implementation of GrootVL: Tree Topology is All You Need in State Space Model

About the training time #10

Open weilli opened 1 month ago

weilli commented 1 month ago

Thank you for your work and the elegant theoretical derivation! I have some questions. Have you compared the training time with other models, such as Vim, and what is the main reason for the longer time? And what about the ablation experiments on the number of nodes? Thanks again.

EasonXiao-888 commented 1 week ago

@weilli Thank you for your interest, and I apologize for the delayed response.

  1. In our prior tests on a V100 GPU, our method's inference throughput was 392 img/s, compared to 374 img/s for vanilla VMamba. Building a minimum spanning tree introduces time overhead: when we initially constructed a separate tree for each block, throughput dropped to 281 img/s. Notably, allowing all blocks in the same stage to share one tree preserves accuracy while restoring efficiency to 392 img/s (see the first sketch after this list). We provide a detailed comparison in the camera-ready version.
  2. Given a sequence of length L with an established minimum spanning tree, in the single-vertex setting we treat that vertex as the root of the tree and aggregate features from all other vertices toward it, which runs in O(L) complexity. In the all-vertices setting, a naive approach treats each vertex as a root separately, resulting in O(L^2) complexity. In contrast, we propose a dynamic programming algorithm: an arbitrary vertex is chosen as the root, features are first aggregated from the leaves up to the root, and then propagated from the root back down to the leaves, achieving the same effect in O(L) (see the second sketch below). For the node ablation, please refer to Table 6 in the manuscript. If helpful, feel free to star ⭐️ the repo ❤️❤️❤️.
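For reference, here is a minimal sketch of the per-stage tree construction from point 1. It assumes a 4-connected pixel grid with L1 feature dissimilarity as edge weights and uses SciPy's `minimum_spanning_tree` for brevity; the function name and these choices are illustrative, not the repo's actual implementation.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def build_stage_tree(feat):
    """Build one MST over a 4-connected pixel grid, weighted by L1
    feature dissimilarity. The returned tree is shared by every block
    in the stage, so the MST cost is paid once per stage.
    feat: (H, W, C) feature map."""
    H, W, _ = feat.shape
    idx = np.arange(H * W).reshape(H, W)
    rows, cols, weights = [], [], []
    # Horizontal (0,1) and vertical (1,0) neighbours of the grid.
    for du, dv in [(0, 1), (1, 0)]:
        a = idx[: H - du, : W - dv].ravel()
        b = idx[du:, dv:].ravel()
        w = np.abs(feat[: H - du, : W - dv] - feat[du:, dv:]).sum(-1).ravel()
        rows.append(a); cols.append(b); weights.append(w)
    rows = np.concatenate(rows); cols = np.concatenate(cols)
    # Small epsilon so zero-dissimilarity edges are not dropped
    # by the sparse format, which treats 0 as "no edge".
    weights = np.concatenate(weights) + 1e-6
    graph = coo_matrix((weights, (rows, cols)), shape=(H * W, H * W))
    # Sparse tree with H*W - 1 edges, reusable across blocks.
    return minimum_spanning_tree(graph)
```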
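And a minimal NumPy sketch of the two-pass dynamic program from point 2, assuming a single scalar decay `alpha` per edge (the learned state-transition weights in the paper are more general) and a parent array in topological order; `tree_aggregate_all` is a hypothetical name.

```python
import numpy as np

def tree_aggregate_all(x, parent, alpha=0.9):
    """All-vertices tree aggregation in O(L) via two passes, instead
    of one O(L) scan per root (O(L^2) overall).
    x:      (L, C) vertex features.
    parent: parent[v] is v's parent index, -1 at the root; vertices
            are assumed topologically ordered (parent before child).
    alpha:  scalar decay applied once per tree edge."""
    L = x.shape[0]
    down = x.astype(np.float64).copy()
    # Pass 1 (leaves -> root): down[v] aggregates v's whole subtree.
    for v in range(L - 1, 0, -1):
        down[parent[v]] += alpha * down[v]
    out = down.copy()
    # Pass 2 (root -> leaves): fold in everything outside v's subtree,
    # i.e. the parent's total minus v's own (already-decayed) subtree.
    for v in range(1, L):
        p = parent[v]
        out[v] = down[v] + alpha * (out[p] - alpha * down[v])
    return out
```

The two passes reproduce, for every vertex, exactly what the single-vertex O(L) aggregation would compute with that vertex as the root, which is where the O(L^2)-to-O(L) saving comes from.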