The model is too large to fit on my RTX 4090. Is there any way to shrink it while affecting performance as little as possible?
I don't know which parameters affect the model size the most, or how to adjust them.
I am working on HKO-7, which is 500×500×1.
In addition, could anyone explain the intuition behind cuboid attention? I wonder if there is a way to avoid the 3D structure, which is computationally expensive. I believe there must be an elegant and computationally cheap way to handle spatiotemporal data.