Closed: c-schicho closed this issue 7 months ago.
Use the implementations from #2 and #3 to build the local vision transformer proposed in https://arxiv.org/pdf/2311.06651v1.pdf.
Reuse the code from https://github.com/EnablingIntelligence/Next-ViT.