Closed engrjav closed 2 years ago
@ameyparanjape Can you tell me if I am doing anything wrong here, please?
I even trained the model for 90k iterations using train_net.py, but I get the same errors.
@tianzhi0549, can you advise, please?
There was an error in my test.py file. When I evaluated the model through train_net.py, I got results.
I trained BoxInst using the config MS_R_50_1x.yaml without pretrained ImageNet weights. To write the training script, I followed the Detectron2 demo for training a custom model: https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD-m5
The training script is as follows:
import logging
import os
from collections import OrderedDict
import torch
from torch.nn.parallel import DistributedDataParallel

import detectron2.utils.comm as comm
from detectron2.data import MetadataCatalog, build_detection_train_loader
from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, hooks, launch
from detectron2.utils.events import EventStorage
from detectron2.evaluation import (
    COCOEvaluator,
    COCOPanopticEvaluator,
    DatasetEvaluators,
    LVISEvaluator,
    PascalVOCDetectionEvaluator,
    SemSegEvaluator,
    verify_results,
)
from detectron2.modeling import GeneralizedRCNNWithTTA
from detectron2.utils.logger import setup_logger

from adet.data.dataset_mapper import DatasetMapperWithBasis
from adet.data.fcpose_dataset_mapper import FCPoseDatasetMapper
from adet.config import get_cfg
from adet.checkpoint import AdetCheckpointer
from adet.evaluation import TextEvaluator
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
# To register the dataset
from detectron2.data.datasets import register_coco_instances
register_coco_instances(
    "MDset_train",
    {},
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MD/annotations/instances_train2017.json",
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MDset/train2017",
)
register_coco_instances(
    "MDset_val",
    {},
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MD/annotations/instances_val2017.json",
    "D:/JCodeExp/facebook/AdelaiDet/datasets/MDset/val2017",
)

cfg = get_cfg()
cfg.merge_from_file("configs/BoxInst/MS_R_50_1x.yaml ")
cfg.DATASETS.TRAIN = ("MDset_train",)  # name should match the one used when registering the dataset
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 0  # I did this
# I added all of these because I don't know how to specify the number of classes
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 31
cfg.MODEL.RETINANET.NUM_CLASSES = 31
cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES = 31
cfg.MODEL.CondInst.NUM_CLASSES = 31  # the meta-architecture is CondInst in BoxInst
cfg.MODEL.FCOS.NUM_CLASSES = 31
cfg.MODEL.CONDINST.MAX_PROPOSALS = -1  # I added
cfg.MODEL.CONDINST.TOPK_PROPOSALS_PER_IM = 16  # I added, reducing from 64 to 16 to save CUDA memory
cfg.MODEL.BOXINST.TOPK_PROPOSALS_PER_IM = 16  # I added
cfg.SOLVER.IMS_PER_BATCH = 1 # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.MAX_ITER = 60000 # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.BASE_LR = 0.000125  # pick a good LR
cfg.SOLVER.STEPS = []  # do not decay learning rate
cfg.MODEL.BoxInst.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.SOLOV2.NUM_CLASSES = 31
MetadataCatalog.get("MD_train").thing_classes =["names of classes",]
# I added
# added from the detectron2 demo
from detectron2.engine import DefaultTrainer
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=True)
trainer.train()
I trained it for 60k iterations.
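On the number-of-classes settings in the script above, here is a small sketch that is not part of the original script. As far as I understand, BoxInst builds on CondInst with an FCOS detector, so `cfg.MODEL.FCOS.NUM_CLASSES` is probably the setting that matters, but please treat that as an assumption. The value itself can be read from the COCO annotation file instead of being hard-coded. The helper names `coco_category_summary` and `load_coco` are hypothetical, and only the standard `json` module is used.

```python
import json
from pathlib import Path


def coco_category_summary(coco: dict) -> dict:
    """Summarise the categories declared and used in a COCO-format annotation dict."""
    declared = {c["id"]: c["name"] for c in coco.get("categories", [])}
    used = {a["category_id"] for a in coco.get("annotations", [])}
    declared_ids = set(declared)
    return {
        # candidate value for the NUM_CLASSES setting
        "num_classes": len(declared),
        "names": [declared[i] for i in sorted(declared)],
        # category ids referenced by annotations but never declared
        "undeclared_ids": sorted(used - declared_ids),
        # declared categories without a single annotation
        "unused_ids": sorted(declared_ids - used),
    }


def load_coco(path: str) -> dict:
    """Read a COCO annotation JSON file from disk."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


# Tiny in-memory example with made-up categories (not the MD dataset).
example = {
    "categories": [
        {"id": 1, "name": "cat"},
        {"id": 2, "name": "dog"},
        {"id": 3, "name": "bird"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 7},
    ],
}
summary = coco_category_summary(example)
print(summary)
```

For the real dataset one would call `coco_category_summary(load_coco(...))` on the training annotation file and compare `num_classes` with the 31 used in the config. A non-empty `undeclared_ids` list would point at an annotation problem rather than a config problem.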
When I tried to evaluate it using the attached script, it produced the list of errors in the attached file. The main errors are:
No weights in checkpoint matched with model. Some model parameters or buffers are not found in the checkpoint:
and the evaluation reports:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
The class names are correct, but the AP is zero for all of them.
Can anyone tell me where I went wrong? TensorBoard shows my training losses converging, yet I still get zero AP at test time.
Also, I ran simple commands such as python train.py and python test.py.
Attachments: Errors on testing.txt, test2.txt
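The "No weights in checkpoint matched with model" warning quoted above suggests that none of the parameter names in the checkpoint lined up with the names the evaluation model expects. A model that loads no weights is evaluated with its initial parameters, which would be consistent with the all-zero AP. A minimal sketch for checking this is to compare the two sets of names directly. The function name `compare_state_dict_keys` and the key names in the example are hypothetical; in a real session the keys would come from `model.state_dict().keys()` and from the loaded checkpoint, where I am assuming the weights sit under a `"model"` entry.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Report which parameter names match between a model and a checkpoint."""
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    return {
        "matched": sorted(model_keys & checkpoint_keys),
        # expected by the model but absent from the checkpoint
        "missing_in_checkpoint": sorted(model_keys - checkpoint_keys),
        # present in the checkpoint but unknown to the model
        "unexpected_in_checkpoint": sorted(checkpoint_keys - model_keys),
    }


# Made-up key names for illustration only.
# Real keys (assumption about the checkpoint layout):
#   model_keys = model.state_dict().keys()
#   checkpoint_keys = torch.load(path, map_location="cpu")["model"].keys()
model_keys = ["backbone.conv1.weight", "head.cls.weight", "head.cls.bias"]
checkpoint_keys = ["backbone.conv1.weight", "roi_heads.cls.weight"]

report = compare_state_dict_keys(model_keys, checkpoint_keys)
for name, keys in report.items():
    print(f"{name}: {keys}")
```

If `matched` comes back empty, the evaluation script is most likely building a different architecture or config than the one used for training, or pointing at the wrong weights file.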