andytu28 / VQT

BSD 3-Clause "New" or "Revised" License
20 stars 0 forks source link

Although there are few intermediate parameters required to construct a computation graph, the number of learnable parameters is large #3

Open XXD-N opened 11 months ago

XXD-N commented 11 months ago

In this case, the number of parameters in the classification header will become very large, occupying the majority of learnable parameters. For example, using the promp_token = 30 mentioned in the paper, when I implement it using PyTorch, the model I use is vit_base, and the number of learnable parameters reaches 30M.