Link to another project: DPT (Dense Prediction Transformers) - state-of-the-art semantic segmentation and monocular depth estimation network
Top-1 accuracy on the Pascal-Context semantic segmentation dataset and the NYU Depth v2 monocular depth dataset, using vision transformers.
Top-2 on the ADE20K semantic segmentation dataset. UperNet (Swin-T/S/B/L) is more accurate than DPT on ADE20K, but Swin is not real-time, while DPT is faster and runs in real time.
Additionally, a comparison of Scaled-YOLOv4 vs Swin (Table 2) for object detection in terms of speed and accuracy:
Table 2 from https://arxiv.org/pdf/2103.14030v1.pdf:
![image](https://user-images.githubusercontent.com/4096485/112776487-7f235880-9048-11eb-8a76-a8e230b7a30e.png)
Video example of DPT (Dense Prediction Transformers): a state-of-the-art real-time neural network for semantic segmentation and monocular depth estimation from a single RGB image.
Paper: https://arxiv.org/abs/2103.13413
GitHub (PyTorch): https://github.com/intel-isl/DPT
Papers with Code: https://paperswithcode.com/paper/vision-transformers-for-dense-prediction
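For a quick way to try DPT depth estimation, the pretrained weights are also published through the torch.hub entry point of the related intel-isl/MiDaS repository. Below is a minimal single-image inference sketch, assuming the `DPT_Large` hub model name and its bundled `dpt_transform` preprocessing; `input.jpg` is a placeholder path.

```python
import cv2
import torch

# Load the DPT-Large depth model via torch.hub (weights hosted by intel-isl/MiDaS).
model = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
model.eval()

# The same hub repo ships the matching input transform for DPT models.
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.dpt_transform

# "input.jpg" is a placeholder; OpenCV loads BGR, the model expects RGB.
img = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
batch = transform(img)  # -> 1x3xHxW tensor, resized and normalized for DPT

with torch.no_grad():
    prediction = model(batch)  # inverse relative depth, shape 1xH'xW'
    # Upsample the prediction back to the original image resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()

print(depth.shape, depth.min(), depth.max())
```

Note that the output is relative inverse depth, not metric distance; the DPT paper obtains metric depth by fine-tuning on datasets such as NYU Depth v2.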