Train Error for running

davidqing2000 commented 4 years ago

I always get the error below after the training step has run for a long time. What causes this error, and how can I diagnose it?

Traceback (most recent call last):
  File "", line 75, in
    trainer.train()
  File "d:\development\detectron2-master\detectron2\engine\defaults.py", line 350, in train
    super().train(self.start_iter, self.max_iter)
  File "d:\development\detectron2-master\detectron2\engine\train_loop.py", line 133, in train
    self.after_step()
  File "d:\development\detectron2-master\detectron2\engine\train_loop.py", line 151, in after_step
    h.after_step()
  File "d:\development\detectron2-master\detectron2\engine\hooks.py", line 310, in after_step
    results = self._func()
  File "d:\development\detectron2-master\detectron2\engine\defaults.py", line 301, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "d:\development\detectron2-master\detectron2\engine\defaults.py", line 449, in test
    else cls.build_evaluator(cfg, dataset_name)
  File "d:\development\detectron2-master\detectron2\engine\defaults.py", line 418, in build_evaluator
    raise NotImplementedError
NotImplementedError

ppwwyyxx commented 4 years ago

I think you are training with an EVAL_PERIOD set, but you have not defined which evaluator to use. Please follow train_net.py to see how to provide an evaluator to the trainer. Alternatively, you can disable evaluation by setting DATASETS.TEST = ().

Also, this is a detectron2 issue, not a Detectron issue.

davidqing2000 commented 4 years ago

Yes, it's a detectron2 issue.

Here is my code. I have set cfg.DATASETS.TEST = ("test",). Is there any other reason for this error?

----My Code----
import traceback
from detectron2.utils.logger import setup_logger
import multiprocessing
multiprocessing.set_start_method('spawn', True)

from detectron2.data.datasets import register_coco_instances
register_coco_instances("train", {}, "/annotations/annotations.json", "/images")
register_coco_instances("test", {}, "/annotations/test_annotations.json", "/images")

from detectron2.data.catalog import MetadataCatalog, DatasetCatalog
from detectron2.engine.defaults import DefaultPredictor
from detectron2.utils.visualizer import ColorMode
train_metadata = MetadataCatalog.get("train")
test_metadata = MetadataCatalog.get("test")

train_metadata.thing_classes = ["person","aeroplane","tvmonitor","train","boat","dog","chair","bird","bicycle","bottle","sheep","diningtable","horse","motorbike","sofa","cow","car","cat","bus","pottedplant","customobj","other"]

from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

cfg = get_cfg()
cfg.merge_from_file(
    "/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
)

cfg.DATASETS.TRAIN = ("train",)
cfg.DATASETS.TEST = ("test",)  # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl"  # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 10
cfg.SOLVER.BASE_LR = 0.00015
cfg.SOLVER.MAX_ITER = (
    10000
)  # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = (
    128
)  # faster, and good enough for this toy dataset

cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set the testing threshold for this model
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 22  # 22 classes (data, fig, hazelnut)

print('loading from: {}'.format(cfg.MODEL.WEIGHTS))

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)

setup_logger(output=cfg.OUTPUT_DIR, name="cityscapes")

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)

try:
    trainer.train()
except:
    traceback.print_exc()

print("Training..Complete...")

ppwwyyxx commented 4 years ago

The reason is explained above.
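To spell out why setting DATASETS.TEST on its own leads to the error rather than preventing it, here is a rough sketch using stand-in classes. This is not detectron2's code: BaseTrainer, TrainerWithEvaluator, and their attributes are hypothetical, and the sketch assumes that evaluation also runs at the final iteration, which would match the error only showing up after a long training run.

```python
class BaseTrainer:
    """Stand-in for a trainer whose evaluator factory is left unimplemented."""

    def __init__(self, test_datasets, max_iter, eval_period=0):
        self.test_datasets = test_datasets
        self.max_iter = max_iter
        self.eval_period = eval_period

    @classmethod
    def build_evaluator(cls, dataset_name):
        # The base class does not know which metrics fit the dataset.
        raise NotImplementedError

    def test(self):
        # One evaluator per test dataset; an empty tuple means no calls at all.
        return {name: self.build_evaluator(name) for name in self.test_datasets}

    def train(self):
        for it in range(self.max_iter):
            next_iter = it + 1
            is_final = next_iter == self.max_iter
            periodic = self.eval_period > 0 and next_iter % self.eval_period == 0
            # Assumption: evaluate periodically and once more at the very end.
            if is_final or periodic:
                self.test()
        return "done"


class TrainerWithEvaluator(BaseTrainer):
    """Same trainer, with the evaluator factory overridden."""

    @classmethod
    def build_evaluator(cls, dataset_name):
        return f"evaluator for {dataset_name}"
```

In this model, BaseTrainer(("test",), max_iter=3).train() raises NotImplementedError on the last iteration, while the same call with an empty test tuple, or with TrainerWithEvaluator, finishes normally. These correspond to the two fixes suggested earlier in the thread.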

davidqing2000 commented 4 years ago

I see, thanks.

facebookresearch / Detectron

Train Error for running #975