aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

Zero AP on training custom dataset on BoxInst (No weights in checkpoint matched with model) #572

Closed: engrjav closed this issue 2 years ago

engrjav commented 2 years ago

I trained BoxInst using the config MS_R_50_1x.yaml without ImageNet-pretrained weights. To write the training script, I followed the Detectron2 demo for training on a custom dataset: " https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD-m5"

The training script is as follows:

```python
import logging
import os
from collections import OrderedDict

import torch
from torch.nn.parallel import DistributedDataParallel

import detectron2.utils.comm as comm
from detectron2.data import MetadataCatalog, build_detection_train_loader
from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, hooks, launch
from detectron2.utils.events import EventStorage
from detectron2.evaluation import (
    COCOEvaluator,
    COCOPanopticEvaluator,
    DatasetEvaluators,
    LVISEvaluator,
    PascalVOCDetectionEvaluator,
    SemSegEvaluator,
    verify_results,
)
from detectron2.modeling import GeneralizedRCNNWithTTA
from detectron2.utils.logger import setup_logger

from adet.data.dataset_mapper import DatasetMapperWithBasis
from adet.data.fcpose_dataset_mapper import FCPoseDatasetMapper
from adet.config import get_cfg
from adet.checkpoint import AdetCheckpointer
from adet.evaluation import TextEvaluator
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

# To register the dataset
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "MDset_train", {},
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MD/annotations/instances_train2017.json",
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MDset/train2017",
)
register_coco_instances(
    "MDset_val", {},
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MD/annotations/instances_val2017.json",
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MDset/val2017",
)

cfg = get_cfg()
cfg.merge_from_file("configs/BoxInst/MS_R_50_1x.yaml")
cfg.DATASETS.TRAIN = ("MDset_train",)  # name should match the one used when registering the dataset
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 0  # I did this

# I added all of these because I don't know how to specify the number of classes
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 31
cfg.MODEL.RETINANET.NUM_CLASSES = 31
cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES = 31
# MODEL.RETINANET.NUM_CLASSE
cfg.MODEL.CondInst.NUM_CLASSES = 31  # the meta architecture of BoxInst is CondInst
cfg.MODEL.FCOS.NUM_CLASSES = 31
cfg.MODEL.CONDINST.MAX_PROPOSALS = -1  # I added
cfg.MODEL.CONDINST.TOPK_PROPOSALS_PER_IM = 16  # I added, reduced from 64 to 16 to save CUDA memory
cfg.MODEL.BOXINST.TOPK_PROPOSALS_PER_IM = 16  # I added
cfg.SOLVER.IMS_PER_BATCH = 1  # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.MAX_ITER = 60000  # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.BASE_LR = 0.000125  # pick a good LR
cfg.SOLVER.STEPS = []  # do not decay learning rate
cfg.MODEL.BoxInst.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.SOLOV2.NUM_CLASSES = 31
MetadataCatalog.get("MD_train").thing_classes = ["names of classes",]

# I added
# added from the detectron2 demo
from detectron2.engine import DefaultTrainer

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=True)
trainer.train()
```
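Because the script sets a class count of 31 on several heads, it may help to confirm that 31 really matches the annotation file. The sketch below assumes a standard COCO-format JSON with top-level `categories` and `annotations` lists; the helpers `coco_category_summary` and `check_num_classes` are hypothetical, not Detectron2 or AdelaiDet APIs. It mirrors the idea of mapping arbitrary COCO category ids to contiguous ids 0..N-1, so N should equal the class count in the config.

```python
import json
from typing import Dict, List, Tuple


def coco_category_summary(coco: dict) -> Tuple[int, Dict[int, int], List[str]]:
    """Return (num_classes, json_id -> contiguous_id, class names sorted by id).

    Hypothetical helper for illustration only.
    """
    cats = sorted(coco.get("categories", []), key=lambda c: c["id"])
    id_map = {c["id"]: i for i, c in enumerate(cats)}
    names = [c["name"] for c in cats]
    # Annotations that use a category_id not declared in "categories" are a red flag.
    unknown = {a["category_id"] for a in coco.get("annotations", [])} - set(id_map)
    if unknown:
        raise ValueError(f"annotations use undeclared category ids: {sorted(unknown)}")
    return len(cats), id_map, names


def check_num_classes(json_path: str, expected: int) -> None:
    """Load a COCO JSON file and compare its category count with the config value."""
    with open(json_path, encoding="utf-8") as f:
        n, _, names = coco_category_summary(json.load(f))
    if n != expected:
        raise ValueError(f"{json_path}: {n} categories {names}, config expects {expected}")
```

For example, `check_num_classes("D:/JCodeExp/facebook/AdelaiDet/datasets/MD/annotations/instances_train2017.json", 31)` would raise if the file declares a different number of categories.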

I trained it for 60k iterations.

When I tried to evaluate it using the attached script, it gave a list of errors (attached). The main errors are:

No weights in checkpoint matched with model. Some model parameters or buffers are not found in the checkpoint:

and it gives:

```
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
```

The class names are correct, but AP is zero for all of them.

Can anyone tell me where I went wrong? TensorBoard shows my training losses converging, yet I still get zero AP on testing.

Also, I ran simple commands such as python train.py and python test.py. Attachments: Errors on testing.txt, test2.txt
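The warning "No weights in checkpoint matched with model" indicates that none of the parameter names in the loaded file lined up with the model's parameter names. In that case the model keeps its freshly initialized weights, which would be consistent with an AP of 0. A minimal sketch of that comparison on plain key lists follows; the helpers `compare_state_dict_keys` and `strip_prefix` are hypothetical and compare names exactly, so they do not reproduce Detectron2's own matching heuristics. Stripping the `module.` prefix that DistributedDataParallel adds to parameter names is shown as one common source of mismatched names.

```python
from typing import Iterable, List, Tuple


def strip_prefix(keys: Iterable[str], prefix: str = "module.") -> List[str]:
    """Drop a leading prefix (e.g. the one DistributedDataParallel adds) from each key."""
    return [k[len(prefix):] if k.startswith(prefix) else k for k in keys]


def compare_state_dict_keys(
    model_keys: Iterable[str], ckpt_keys: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (matched, missing_in_ckpt, unexpected_in_ckpt), each sorted.

    Hypothetical helper: exact name comparison after stripping a "module." prefix.
    An empty `matched` list corresponds to the "no weights matched" situation.
    """
    model_set = set(model_keys)
    ckpt_set = set(strip_prefix(ckpt_keys))
    matched = sorted(model_set & ckpt_set)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return matched, missing, unexpected
```

With PyTorch available, one could pass `model.state_dict().keys()` and the keys of `torch.load(path, map_location="cpu")["model"]`, assuming a Detectron2-style checkpoint that stores the weights under a `"model"` key.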

engrjav commented 2 years ago

@ameyparanjape Can you tell me if I am doing anything wrong here, please?

engrjav commented 2 years ago

I even trained the model for 90k iterations with train_net.py, but I got the same errors again.

engrjav commented 2 years ago

@tianzhi0549 Can you guide me, please?

engrjav commented 2 years ago

There was an error in my test.py file. When I evaluated the model with train_net.py instead, I got results.
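One way a standalone test script can end up with unmatched or missing weights is a `MODEL.WEIGHTS` value that does not point at the trained checkpoint. The sketch below assumes the Detectron2/fvcore convention of writing a text file named `last_checkpoint` into `OUTPUT_DIR` that holds the file name of the latest saved checkpoint; the helper `resolve_trained_weights` is hypothetical. It fails loudly instead of letting evaluation silently run on untrained weights.

```python
import os


def resolve_trained_weights(output_dir: str) -> str:
    """Return the path of the newest checkpoint recorded in output_dir.

    Assumes a "last_checkpoint" text file in output_dir that contains the
    file name of the latest checkpoint (e.g. "model_final.pth").
    Raises FileNotFoundError rather than falling back to untrained weights.
    """
    tag = os.path.join(output_dir, "last_checkpoint")
    if not os.path.isfile(tag):
        raise FileNotFoundError(f"no 'last_checkpoint' file in {output_dir!r}")
    with open(tag, encoding="utf-8") as f:
        name = f.read().strip()
    path = os.path.join(output_dir, name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"checkpoint {path!r} listed in last_checkpoint does not exist")
    return path
```

In a test script, this would be used as `cfg.MODEL.WEIGHTS = resolve_trained_weights(cfg.OUTPUT_DIR)` before building and loading the model, so that a wrong output directory surfaces as an error rather than as zero AP.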