Link to another project: DPT (Dense Prediction Transformers) - state-of-the-art semantic segmentation and monocular depth estimation network
Top-1 accuracy on the Pascal-Context semantic segmentation dataset and the NYU Depth v2 monocular depth dataset, using vision transformers.
Top-2 on the ADE20K semantic segmentation dataset. UperNet (Swin-T/S/B/L) is more accurate than DPT on ADE20K, but Swin is not real-time, while DPT is faster and runs in real time.
Additionally, a comparison of Scaled-YOLOv4 vs Swin (Table 2) for object detection in terms of speed and accuracy:
Table 2 from https://arxiv.org/pdf/2103.14030v1.pdf:
![image](https://user-images.githubusercontent.com/4096485/112776487-7f235880-9048-11eb-8a76-a8e230b7a30e.png)
Video example of DPT (Dense Prediction Transformers): a state-of-the-art real-time neural network for semantic segmentation and monocular depth estimation from a single RGB image.
Paper: https://arxiv.org/abs/2103.13413
GitHub (PyTorch): https://github.com/intel-isl/DPT
Papers with Code: https://paperswithcode.com/paper/vision-transformers-for-dense-prediction
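For a quick way to try DPT depth estimation, the pretrained weights are also published through the torch.hub entry point of the related intel-isl/MiDaS repository. Below is a minimal single-image inference sketch, assuming the `DPT_Large` hub model name and its bundled `dpt_transform` preprocessing; `input.jpg` is a placeholder path.

```python
import cv2
import torch

# Load the DPT-Large depth model via torch.hub (weights hosted by intel-isl/MiDaS).
model = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
model.eval()

# The same hub repo ships the matching input transform for DPT models.
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.dpt_transform

# "input.jpg" is a placeholder; OpenCV loads BGR, the model expects RGB.
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
batch = transform(img)  # -> 1x3xHxW tensor, resized and normalized for DPT

with torch.no_grad():
    prediction = model(batch)  # inverse relative depth, shape 1xH'xW'
    # Upsample the prediction back to the original image resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()

print(depth.shape, depth.min(), depth.max())
```

Note that the output is relative inverse depth, not metric distance; the DPT paper obtains metric depth by fine-tuning on datasets such as NYU Depth v2.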