chenller / mmseg-extension

mmsegmentation extension library containing the latest paper code.
Apache License 2.0




mmseg-extension is an extension of the MMSegmentation library (version 1.x) that aims to provide a more versatile and up-to-date framework for semantic segmentation. It integrates recent semantic segmentation models and their code into the MMSegmentation ecosystem, so users get one consistent training and testing workflow instead of a separate one for each paper.

The main branch works with PyTorch 2.0 or higher (we recommend PyTorch 2.3). PyTorch 1.x may still work, but it has not been tested.
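A quick way to confirm that an environment meets the PyTorch 2.0 minimum is to compare the installed version string against it. The helper below is a minimal sketch: the name `meets_minimum` is hypothetical, and it only parses a version string of the kind reported by `torch.__version__`, so it can be checked without PyTorch installed.

```python
import re


def meets_minimum(version: str, minimum: tuple = (2, 0)) -> bool:
    """Return True if a version string such as '2.3.0+cu121' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum


if __name__ == "__main__":
    try:
        import torch  # optional: only needed to inspect the local install
        print(torch.__version__, meets_minimum(torch.__version__))
    except ImportError:
        print("PyTorch is not installed")
```

Only the major and minor components are compared, so local build suffixes such as `+cu121` are ignored.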

Features and Objectives

Addressing Key Issues
- **Staying Current with Latest Models** mmseg-extension addresses the delay in MMSegmentation's inclusion of the latest models by continuously integrating the newest research.
- **Standardizing Disparate Codebases** By providing a unified framework, mmseg-extension solves the problem of inconsistent data loading, training, and validation scripts across different research papers.
- **Utilizing Pre-trained Weights** Ensures compatibility with pre-trained weights from various repositories, enabling seamless model integration without the need for retraining.

Installation and Usage

Overview of Model Zoo

Name Year Publication Paper Code
ViT-Adapter 2023 ICLR Arxiv Code
ViT-CoMer 2024 CVPR Arxiv Code
TransNeXt 2024 CVPR Arxiv Code
UniRepLKNet 2024 CVPR Arxiv Code
BiFormer 2023 CVPR Arxiv Code
ConvNeXt V2 2023 CVPR Arxiv Code
InternImage 2023 CVPR Arxiv Code
FlashInternImage 2024 CVPR Arxiv Code

Loss Function

Name Year Publication Paper Code
Balanced Softmax Loss 2020 NeurIPS Arxiv Code

Completed Work Results

| Identifier | Description |
|------------|-------------|
| ✔ | Supported |
| ✖ | Not supported, but may be supported in future versions |
| **-** | Not tested |


You can find detailed information about ViT Adapters in

ViT-Adapter Pretraining Sources
| Name | Year | Type | Data | Repo | Paper | Support? |
|------|------|------|------|------|-------|----------|
| DeiT | 2021 | Supervised | ImageNet-1K | [repo]( | [paper]( | ✔ |
| AugReg | 2021 | Supervised | ImageNet-22K | [repo]( | [paper]( | - |
| BEiT | 2021 | MIM | ImageNet-22K | [repo]( | [paper]( | - |
| Uni-Perceiver | 2022 | Supervised | Multi-Modal | [repo]( | [paper]( | ✖ |
| BEiTv2 | 2022 | MIM | ImageNet-22K | [repo]( | [paper]( | - |

ViT-Adapter ADE20K val
| Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU (SS/MS) | #Param | Config | Download | Support? | our mIoU (SS/MS) | our config |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|---|---|---|
| UperNet | ViT-Adapter-T | [DeiT-T]( | 160k | 512 | 42.6 / 43.6 | 36M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | [config](./configs/vit_adapter/ |
| UperNet | ViT-Adapter-S | [DeiT-S]( | 160k | 512 | 46.2 / 47.1 | 58M | [config](./configs/ade20k/ | [ckpt]( | ✔ | 46.09/46.48 | [config](./configs/vit_adapter/ |
| UperNet | ViT-Adapter-B | [DeiT-B]( | 160k | 512 | 48.8 / 49.7 | 134M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | 48.00/49.21 | [config](./configs/vit_adapter/ |
| UperNet | ViT-Adapter-T | [AugReg-T]( | 160k | 512 | 43.9 / 44.8 | 36M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | [config](./configs/vit_adapter/ |
| UperNet | ViT-Adapter-B | [AugReg-B]( | 160k | 512 | 51.9 / 52.5 | 134M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | [config](./configs/vit_adapter/ |
| UperNet | ViT-Adapter-L | [AugReg-L]( | 160k | 512 | 53.4 / 54.4 | 364M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | [config](./configs/vit_adapter/ |
| UperNet | ViT-Adapter-L | [Uni-Perceiver-L]( | 160k | 512 | 55.0 / 55.4 | 364M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✖ | ✖ | ✖ |
| UperNet | ViT-Adapter-L | [BEiT-L]( | 160k | 640 | [58.0]( / [58.4]( | 451M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | 58.08/58.16 | [config](./configs/vit_adapter/ |
| Mask2Former | ViT-Adapter-L | [BEiT-L]( | 160k | 640 | [58.3]( / [59.0]( | 568M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | 58.36/- | [config](./configs/vit_adapter/ |
| Mask2Former | ViT-Adapter-L | [BEiT-L+COCO]( | 80k | 896 | [59.4]( / [60.5]( | 571M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | [config](./configs/vit_adapter/ |
| Mask2Former | ViT-Adapter-L | [BEiTv2-L+COCO]( | 80k | 896 | 61.2 / 61.5 | 571M | [config](./configs/ade20k/ | [ckpt]( \| [log]( | ✔ | 61.43/- | [config](./configs/vit_adapter/ |
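
The mIoU cells in these tables pair a single-scale (SS) and a multi-scale (MS) score, with `-` marking a value that was not tested. As a rough illustration of how the reported and reproduced columns can be compared, the sketch below parses such cells and computes the gap. The helper names `parse_miou` and `miou_gap` are hypothetical and not part of this repository; the sample values are copied from the ViT-Adapter-S and ViT-Adapter-B (DeiT) rows above.

```python
from typing import Optional, Tuple

Score = Tuple[Optional[float], Optional[float]]


def parse_miou(cell: str) -> Score:
    """Split an 'SS/MS' cell into floats; '-' (not tested) becomes None."""
    parts = [p.strip() for p in cell.split("/")]
    if len(parts) != 2:
        raise ValueError(f"expected 'SS/MS', got {cell!r}")
    return tuple(None if p in ("-", "") else float(p) for p in parts)


def miou_gap(reported: str, ours: str) -> Score:
    """Return (ours - reported) for SS and MS, rounded to two decimals."""
    rep, our = parse_miou(reported), parse_miou(ours)
    return tuple(
        None if r is None or o is None else round(o - r, 2)
        for r, o in zip(rep, our)
    )


print(miou_gap("46.2 / 47.1", "46.09/46.48"))  # (-0.11, -0.62)
print(miou_gap("48.8 / 49.7", "48.00/49.21"))  # (-0.8, -0.49)
```

The parser expects plain numeric cells, so entries that carry link markup or a ✖ marker need to be cleaned up or skipped first.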


ViT-CoMer ADE20K val
| Method | Backbone | Pretrain | Lr schd | Crop Size | mIoU (SS/MS) | #Param | Config | Ckpt | Log | Support? | our mIoU (SS/MS) | our config |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|---|---|---|
| UperNet | ViT-CoMer-T | [DeiT-T]( | 160k | 512 | 43.5/- | 38.7M | [config]( | [ckpt]( | [log]( | ✔ | 43.66/- | [config](./configs/vit_comer/ |
| UperNet | ViT-CoMer-S | [DeiT-S]( | 160k | 512 | 46.5/- | 61.4M | [config]( | [ckpt]( | [log]( | ✔ | 46.09/46.23 | [config](./configs/vit_comer/ |
| UperNet | ViT-CoMer-B | [DeiT-S]( | 160k | 512 | 48.8/- | 144.7M | - | - | - | ✔ | -/- | [config](./configs/vit_comer/ |


InternImage ADE20K Semantic Segmentation
| backbone | method | resolution | mIoU (ss/ms) | #param | FLOPs | download | Support? | our mIoU (SS/MS) | our config |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|---|---|---|
| InternImage-T | UperNet | 512x512 | 47.9 / 48.1 | 59M | 944G | [ckpt]( \| [cfg](segmentation/configs/ade20k/ | ✔ | 47.60/- | [config](./configs/internimage/ |
| InternImage-S | UperNet | 512x512 | 50.1 / 50.9 | 80M | 1017G | [ckpt]( \| [cfg](segmentation/configs/ade20k/ | ✔ | 49.77/- | [config](./configs/internimage/ |
| InternImage-B | UperNet | 512x512 | 50.8 / 51.3 | 128M | 1185G | [ckpt]( \| [cfg](segmentation/configs/ade20k/ | ✔ | 50.46/51.05 | [config](./configs/internimage/ |
| InternImage-L | UperNet | 640x640 | 53.9 / 54.1 | 256M | 2526G | [ckpt]( \| [cfg](segmentation/configs/ade20k/ | ✔ | 53.39/- | [config](./configs/internimage/ |
| InternImage-XL | UperNet | 640x640 | 55.0 / 55.3 | 368M | 3142G | [ckpt]( \| [cfg](segmentation/configs/ade20k/ | ✔ | 54.4/- | [config](./configs/internimage/ |
| InternImage-H | UperNet | 896x896 | 59.9 / 60.3 | 1.12B | 3566G | [ckpt]( \| [cfg](segmentation/configs/ade20k/ | ✔ | 59.49/- | [config](./configs/internimage/ |
| InternImage-H | Mask2Former | 896x896 | 62.5 / 62.9 | 1.31B | 4635G | [ckpt]( \| [cfg](segmentation/configs/ade20k/ | ✖ | -/- | |


FlashInternImage ADE20K Semantic Segmentation
| backbone | method | resolution | mIoU (ss/ms) | Config | Download | Support? | our mIoU (SS/MS) | our config |
|:---:|:---:|:---:|:---:|:---:|:---:|---|---|---|
| FlashInternImage-T | UperNet | 512x512 | 49.3 / 50.3 | [config](./segmentation/configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | - |
| FlashInternImage-S | UperNet | 512x512 | 50.6 / 51.6 | [config](./segmentation/configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | - |
| FlashInternImage-B | UperNet | 512x512 | 52.0 / 52.6 | [config](./segmentation/configs/ade20k/ | [ckpt]( \| [log]( | ✔ | 51.22/- | [config](./configs/flash_internimage/ |
| FlashInternImage-L | UperNet | 640x640 | 55.6 / 56.0 | [config](./segmentation/configs/ade20k/ | [ckpt]( \| [log]( | ✔ | -/- | - |


TransNeXt ADE20K Semantic Segmentation using the UPerNet method
| Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | mIoU (ms+flip) | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|---|---|---|
| TransNeXt-Tiny | [ImageNet-1K]( | 512x512 | 160K | 51.1 | 51.5/51.7 | 59M | [model]( | [config](/segmentation/upernet/configs/ | [log]( | ✔ | 53.02/- | [config](./configs/transnext/ |
| TransNeXt-Small | [ImageNet-1K]( | 512x512 | 160K | 52.2 | 52.5/52.8 | 80M | [model]( | [config](/segmentation/upernet/configs/ | [log]( | ✔ | 52.15/- | [config](./configs/transnext/ |
| TransNeXt-Base | [ImageNet-1K]( | 512x512 | 160K | 53.0 | 53.5/53.7 | 121M | [model]( | [config](/segmentation/upernet/configs/ | [log]( | ✔ | 51.11/- | [config](./configs/transnext/ |

* In the context of multi-scale evaluation, TransNeXt reports test results under two distinct scenarios: **interpolation** and **extrapolation** of relative position bias.

TransNeXt ADE20K Semantic Segmentation using the Mask2Former method
| Backbone | Pretrained Model | Crop Size | Lr Schd | mIoU | #Params | Download | Config | Log | Support? | our mIoU (SS/MS) | our config |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|---|---|---|
| TransNeXt-Tiny | [ImageNet-1K]( | 512x512 | 160K | 53.4 | 47.5M | [model]( | [config](/segmentation/mask2former/configs/ | [log]( | ✔ | 53.43/- | [config](./configs/transnext/ |
| TransNeXt-Small | [ImageNet-1K]( | 512x512 | 160K | 54.1 | 69.0M | [model]( | [config](/segmentation/mask2former/configs/ | [log]( | ✔ | 54.06/- | [config](./configs/transnext/ |
| TransNeXt-Base | [ImageNet-1K]( | 512x512 | 160K | 54.7 | 109M | [model]( | [config](/segmentation/mask2former/configs/ | [log]( | ✔ | 54.68/- | [config](./configs/transnext/ |


UniRepLKNet ADE20K Semantic Segmentation
| name | resolution | mIoU (ss/ms) | #params | FLOPs | Weights | Support? | our mIoU (SS/MS) | our config |
|:---:|:---:|:---:|:---:|:---:|:---:|---|---|---|
| UniRepLKNet-T | 512x512 | 48.6/49.1 | 61M | 946G | [ckpt]( | ✔ | 47.94/- | [config](./configs/unireplknet/ |
| UniRepLKNet-S | 512x512 | 50.5/51.0 | 86M | 1036G | [ckpt]( | ✔ | -/- | [config](./configs/unireplknet/ |
| UniRepLKNet-S_22K | 512x512 | 51.9/52.7 | 86M | 1036G | [ckpt]( | ✔ | -/- | [config](./configs/unireplknet/ |
| UniRepLKNet-S_22K | 640x640 | 52.3/52.7 | 86M | 1618G | [ckpt]( | ✔ | -/- | [config](./configs/unireplknet/ |
| UniRepLKNet-B_22K | 640x640 | 53.5/53.9 | 130M | 1850G | [ckpt]( | ✔ | 52.89/- | [config](./configs/unireplknet/ |
| UniRepLKNet-L_22K | 640x640 | 54.5/55.0 | 254M | 2507G | [ckpt]( | ✔ | -/- | [config](./configs/unireplknet/ |
| UniRepLKNet-XL_22K | 640x640 | 55.2/55.6 | 425M | 3420G | [ckpt]( | ✖ | -/- | - |

**NOTE:** Checkpoints have already been released on Hugging Face. You can download them right now from


BiFormer Semantic Segmentation
**NOTE:** The official repository does not release semantic segmentation weights. This repository can load the backbone weights pre-trained on the ImageNet-1K dataset. You can find the weights at the [URL](

ConvNeXt V2

ConvNeXt-V2 Semantic Segmentation
**NOTE:** The official repository does not release semantic segmentation weights. This repository can load the backbone weights pre-trained on the ImageNet-1K or ImageNet-22K dataset. You can find the weights at the [URL](
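
For backbones such as BiFormer and ConvNeXt V2, where only ImageNet pre-trained weights are available, MMSegmentation 1.x configs usually point the backbone at a checkpoint through the `init_cfg` field. The fragment below is a sketch of that general pattern, not a config shipped with this repository: the backbone `type` and the checkpoint path are placeholders, and it is an assumption that the backbones here accept `init_cfg` in exactly this form.

```python
# Hypothetical config fragment: the names and paths below are placeholders.
checkpoint_file = "path/to/imagenet_pretrained_backbone.pth"  # downloaded manually

model = dict(
    backbone=dict(
        type="PlaceholderBackbone",  # replace with the registered backbone name
        # 'Pretrained' is the MMEngine initializer that loads a checkpoint file.
        init_cfg=dict(type="Pretrained", checkpoint=checkpoint_file),
    ),
)
```

With this setup only the backbone is initialized from the checkpoint; the decode head would still be trained from scratch on the segmentation dataset.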