IDEA-Research / detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
https://detrex.readthedocs.io/en/latest/
Apache License 2.0

Custom MaskDINO training crashes when setting number of classes #165

alrightkami closed this issue 1 year ago

alrightkami commented 1 year ago

My goal is to fine-tune MaskDINO on a custom dataset with only one class. For that I changed the config following the instructions in this issue:

import datetime
from detrex.config import get_config
from .models.maskdino_r50 import model
from .data.coco_instance_seg import dataloader

from fvcore.common.param_scheduler import MultiStepParamScheduler
from detectron2.config import LazyCall as L
from detectron2.solver import WarmupParamScheduler

# get default config
train = get_config("common/train.py").train

# max training iterations
train.max_iter = 36875

# warmup lr scheduler
lr_multiplier = L(WarmupParamScheduler)(
    scheduler=L(MultiStepParamScheduler)(
        values=[1.0, 0.1],
        milestones=[32777, 35509],
    ),
    warmup_length=10 / train.max_iter,
    warmup_factor=1.0,
)

optimizer = get_config("common/optim.py").AdamW

# initialize checkpoint to be loaded
train.init_checkpoint = "./projects/maskdino/maskdino_r50_50ep_300q_hid2048_3sd1_instance_maskenhanced_mask46.3ap_box51.7ap.pth" 
train.output_dir = "./output/maskdino/" + datetime.datetime.now().strftime("%d%m_%H%M")

# run evaluation every n iters
train.eval_period = 500

# log training information every n iters
train.log_period = 20

# save checkpoint every n iters
train.checkpointer.period = 9999

# gradient clipping for training
train.clip_grad.enabled = True
train.clip_grad.params.max_norm = 0.01
train.clip_grad.params.norm_type = 2

# set training devices
train.device = "cuda" # or "cuda:1" or "cpu"

# modify optimizer config
optimizer.lr = 1e-4
optimizer.betas = (0.9, 0.999)
optimizer.weight_decay = 1e-4
optimizer.params.lr_factor_func = lambda module_name: 0.1 if "backbone" in module_name else 1

# modify dataloader config
dataloader.train.num_workers = 1
dataloader.train.total_batch_size = 1

# dump the testing results into output_dir for visualization
dataloader.evaluator.output_dir = train.output_dir

# data
dataloader.train.dataset.names = 'graffiti_train'
dataloader.test.dataset.names = 'graffiti_test'

# number of classes
model.num_classes = 1

However, training crashes with the following log:

ERROR [12/06 15:54:21 d2.config.instantiate]: Error when instantiating projects.maskdino.maskdino.MaskDINO!
Traceback (most recent call last):
  File "tools/train_net_graffiti.py", line 232, in <module>
    launch(
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "tools/train_net_graffiti.py", line 227, in main
    do_train(args, cfg)
  File "tools/train_net_graffiti.py", line 161, in do_train
    model = instantiate(cfg.model)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/config/instantiate.py", line 83, in instantiate
    return cls(**cfg)
TypeError: __init__() got an unexpected keyword argument 'num_classes'

If I don't include the number of classes in the config, training runs normally but crashes during evaluation:

[12/06 11:08:47] d2.evaluation.evaluator INFO: Inference done 483/483. Dataloading: 0.0023 s/iter. Inference: 0.1729 s/iter. Eval: 1.3440 s/iter. Total: 1.5193 s/iter. ETA=0:00:00
[12/06 11:08:47] d2.evaluation.evaluator INFO: Total inference time: 0:12:06.363428 (1.519589 s / iter per device, on 1 devices)
[12/06 11:08:47] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:01:22 (0.172937 s / iter per device, on 1 devices)
[12/06 11:08:47] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[12/06 11:08:47] d2.engine.train_loop ERROR: Exception during training:
Traceback (most recent call last):
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/train_loop.py", line 150, in train
    self.after_step()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/train_loop.py", line 180, in after_step
    h.after_step()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/hooks.py", line 555, in after_step
    self._do_eval()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/hooks.py", line 528, in _do_eval
    results = self._func()
  File "tools/train_net_graffiti.py", line 194, in <lambda>
    hooks.EvalHook(cfg.train.eval_period, lambda: do_test(cfg, model)),
  File "tools/train_net_graffiti.py", line 135, in do_test
    ret = inference_on_dataset(
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/evaluation/evaluator.py", line 204, in inference_on_dataset
    results = evaluator.evaluate()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/evaluation/coco_evaluation.py", line 206, in evaluate
    self._eval_predictions(predictions, img_ids=img_ids)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/evaluation/coco_evaluation.py", line 240, in _eval_predictions
    assert category_id < num_classes, (
AssertionError: A prediction has class=9, but the dataset only has 1 classes and predicted class id should be in [0, 0].

If you could fix this or help me figure out what I am doing wrong I would be very thankful!

HaoZhang534 commented 1 year ago

Please try model.sem_seg_head.num_classes = 1.
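
For anyone wondering why the nested key is likely to work where the top-level one fails: the first traceback shows instantiate ending in cls(**cfg), so every top-level key of the model config is passed to MaskDINO's constructor as a keyword argument. Below is a minimal sketch of that mechanism using toy stand-ins. ToyHead, ToyModel, toy_instantiate and make_cfg are hypothetical names invented for illustration, not the detrex or detectron2 classes, and the default of 80 classes is only an assumption made for this example.

```python
# Toy stand-ins for illustration only; NOT the real detrex/detectron2 classes.
class ToyHead:
    def __init__(self, num_classes=80):
        self.num_classes = num_classes


class ToyModel:
    # Like the failing case in the traceback, the top-level model
    # takes no num_classes argument itself; only its head does.
    def __init__(self, sem_seg_head):
        self.sem_seg_head = sem_seg_head


def toy_instantiate(cfg):
    """Recursively build objects from {"_target_": cls, **kwargs} dicts."""
    if not isinstance(cfg, dict) or "_target_" not in cfg:
        return cfg
    kwargs = {k: toy_instantiate(v) for k, v in cfg.items() if k != "_target_"}
    # Every remaining key becomes a keyword argument of the target class.
    return cfg["_target_"](**kwargs)


def make_cfg():
    return {
        "_target_": ToyModel,
        "sem_seg_head": {"_target_": ToyHead, "num_classes": 80},
    }


# Top-level key: passed to ToyModel.__init__, which does not accept it.
bad_cfg = make_cfg()
bad_cfg["num_classes"] = 1
try:
    toy_instantiate(bad_cfg)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Nested key: reaches ToyHead.__init__, which does accept it.
good_cfg = make_cfg()
good_cfg["sem_seg_head"]["num_classes"] = 1
good_model = toy_instantiate(good_cfg)

print(error_message)
print(good_model.sem_seg_head.num_classes)
```

In the config from the original post, this corresponds to replacing the line model.num_classes = 1 with model.sem_seg_head.num_classes = 1, as suggested above. Whether other parts of the MaskDINO config also need the class count is not covered in this thread.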