AmeenAli / HiddenMambaAttn

Official PyTorch Implementation of "The Hidden Attention of Mamba Models"

About Train Time #14

Closed: xiequan277 closed this issue 4 months ago

xiequan277 commented 4 months ago

Thanks for sharing your work; it was easy to build. However, training is very slow when I try to train a model on my own dataset with a single GPU. Could the SSM package be the cause, and would changing it help? I'm not sure.

AmeenAli commented 4 months ago

Thanks @xiequan277 for your interest in our work. Our method does not require training; we apply it post hoc to a pre-trained model. We obtained the pre-trained weights from the Vim repository (https://github.com/hustvl/Vim). Please follow their instructions for details on training.