NRCan / geo-deep-learning

Deep learning applied to georeferenced datasets
https://geo-deep-learning.readthedocs.io/en/latest/
MIT License

Add Vision Transformer Model #501

Closed valhassan closed 12 months ago

valhassan commented 1 year ago

Description

Vision Transformer (ViT) models are based on the transformer architecture introduced in the 2017 paper "Attention Is All You Need". Whereas CNNs rely on convolutional operations to extract spatial features from the input image, a ViT splits the image into patches and uses a self-attention mechanism to capture the relationships between them.
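
To make the patch-and-attention idea concrete, here is a minimal NumPy sketch, not the proposed GDL implementation. It splits an image into non-overlapping patches, projects them to tokens, and applies a single self-attention head. The tile size, band count, patch size, embedding width, and all function names (`image_to_patches`, `self_attention`) are assumptions chosen for illustration. The weights are random rather than learned, and a real ViT would also add position embeddings, multiple heads, and MLP blocks.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    rows, cols = h // patch_size, w // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # patch-to-patch similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((32, 32, 4))                  # hypothetical 4-band tile
patches = image_to_patches(image, patch_size=8)  # 16 patches of 8 * 8 * 4 values

embed_dim = 64                                   # illustrative embedding width
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
tokens = patches @ w_embed                       # linear patch embedding

# Random (untrained) projection matrices, for shape illustration only.
w_q, w_k, w_v = (
    rng.normal(scale=0.02, size=(embed_dim, embed_dim)) for _ in range(3)
)
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(patches.shape, out.shape, weights.shape)
```

Each row of `weights` describes how strongly one patch attends to every other patch, so a single layer can relate distant regions of the tile, whereas a convolution only sees its local kernel window.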

Rationale

Adding this type of architecture to GDL would let us experiment with it, compare it against the existing models, and look for performance gains.

Possible Implementation