facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

AssertionError: assert len(class_names) == precisions.shape[2] #877

Closed Divadi closed 4 years ago

Divadi commented 4 years ago

Instructions To Reproduce the Issue:

1. what changes you made (`git diff`) or what code you wrote

I'm running `COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml` on a custom dataset (KITTI). I train on three object types and map the other six object types to "other", for four total classes.

When I trained on all nine classes, without mapping six of them to "other", I did not get this error.

2. what exact command you run:

Training code:

```python
import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("KITTI_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.005
cfg.SOLVER.MAX_ITER = 300    # 300 iterations seems good enough for this toy dataset; you may need to train longer for a practical dataset
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```


Evaluation code:

```python
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

evaluator = COCOEvaluator("KITTI_train", cfg, False, output_dir="./output/")
val_loader = build_detection_test_loader(cfg, "KITTI_train")
inference_on_dataset(trainer.model, val_loader, evaluator)
# another equivalent way is to use trainer.test
```

3. what you observed (including the full logs):

```
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>()
      4 evaluator = COCOEvaluator("KITTI_train", cfg, False, output_dir="./output/")
      5 val_loader = build_detection_test_loader(cfg, "KITTI_train")
----> 6 inference_on_dataset(trainer.model, val_loader, evaluator)
      7 # another equivalent way is to use trainer.test

3 frames
/usr/local/lib/python3.6/dist-packages/detectron2/evaluation/evaluator.py in inference_on_dataset(model, data_loader, evaluator)
    157     )
    158
--> 159     results = evaluator.evaluate()
    160     # An evaluator may return None when not in main process.
    161     # Replace it by an empty dict instead to make it easier for downstream code to handle

/usr/local/lib/python3.6/dist-packages/detectron2/evaluation/coco_evaluation.py in evaluate(self)
    136             self._eval_box_proposals()
    137         if "instances" in self._predictions[0]:
--> 138             self._eval_predictions(set(self._tasks))
    139         # Copy so the caller can do whatever with results
    140         return copy.deepcopy(self._results)

/usr/local/lib/python3.6/dist-packages/detectron2/evaluation/coco_evaluation.py in _eval_predictions(self, tasks)
    184
    185         res = self._derive_coco_results(
--> 186             coco_eval, task, class_names=self._metadata.get("thing_classes")
    187         )
    188         self._results[task] = res

/usr/local/lib/python3.6/dist-packages/detectron2/evaluation/coco_evaluation.py in _derive_coco_results(self, coco_eval, iou_type, class_names)
    270         precisions = coco_eval.eval["precision"]
    271         # precision has dims (iou, recall, cls, area range, max dets)
--> 272         assert len(class_names) == precisions.shape[2]
    273
    274         results_per_category = []

AssertionError:
```

## Environment:

```
------------------------  ---------------------------------------------------------------
sys.platform              linux
Python                    3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
numpy                     1.17.5
detectron2                0.1 @/usr/local/lib/python3.6/dist-packages/detectron2
detectron2 compiler       GCC 7.3
detectron2 CUDA compiler  10.0
detectron2 arch flags     sm_35, sm_37, sm_50, sm_52, sm_60, sm_61, sm_70, sm_75
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.4.0+cu100 @/usr/local/lib/python3.6/dist-packages/torch
PyTorch debug build       False
CUDA available            True
GPU 0                     Tesla P4
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.0, V10.0.130
Pillow                    6.2.2
torchvision               0.5.0+cu100 @/usr/local/lib/python3.6/dist-packages/torchvision
torchvision arch flags    sm_35, sm_50, sm_60, sm_70, sm_75
cv2                       4.1.2
------------------------  ---------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF
```
ppwwyyxx commented 4 years ago

There is not enough code. How did you register your dataset? If you're not using a COCO-format dataset, you'll need to follow https://detectron2.readthedocs.io/tutorials/datasets.html#metadata-for-datasets to declare the class names.
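For reference, a minimal registration sketch along the lines of that tutorial; the `get_kitti_dicts` loader and the four class names below are placeholders for the setup described in this issue, not the reporter's actual code:

```python
from detectron2.data import DatasetCatalog, MetadataCatalog

# Hypothetical class list: three KITTI types kept, everything else mapped to "other".
CATEGORIES = ["Car", "Pedestrian", "Cyclist", "other"]

def get_kitti_dicts():
    # Placeholder loader: must return a list of dicts in detectron2's
    # standard dataset format (file_name, image_id, height, width, annotations).
    raise NotImplementedError

DatasetCatalog.register("KITTI_train", get_kitti_dicts)
# Declaring thing_classes is what lets COCOEvaluator recover the class names
# that the failing assert compares against precisions.shape[2].
MetadataCatalog.get("KITTI_train").set(thing_classes=CATEGORIES)
```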

Divadi commented 4 years ago

Thank you for your response. I finally managed to get around this issue. Detectron2 had cached everything into `KITTI_train_coco_format.json` in the output folder, and that file wasn't being updated when I changed `CATEGORIES` in

MetadataCatalog.get("KITTI_" + d).set(thing_classes=CATEGORIES)

Is there a way to clear the metadata (and the cached COCO jsons)? I've been restarting my runtime instance to clear the metadata.

ppwwyyxx commented 4 years ago

That's indeed an issue we know about but kind of live with (discussed in https://github.com/facebookresearch/detectron2/pull/175#discussion_r340381085). cc @botcs, maybe this can be improved, e.g. by a quick hash over all the dicts?

You can clear the cache by removing the output directory. In fact, if you use the same directory it may also resume some other artifacts (e.g. append metrics to a previous run), so it's good to clean it.
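A minimal sketch of that cleanup, assuming `cfg.OUTPUT_DIR` is the output directory used in the training code above:

```python
import os
import shutil

# Removing the output directory discards the cached
# KITTI_train_coco_format.json as well as metrics from previous runs,
# so the next evaluation regenerates them from the current metadata.
if os.path.isdir(cfg.OUTPUT_DIR):
    shutil.rmtree(cfg.OUTPUT_DIR)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
```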

Divadi commented 4 years ago

Ah, I see. I'll do that for now then - thank you!

botcs commented 4 years ago

> That's indeed an issue we know about but kind of live with (discussed in #175 (comment)). cc @botcs, maybe this can be improved, e.g. by a quick hash over all the dicts?
>
> You can clear the cache by removing the output directory. In fact, if you use the same directory it may also resume some other artifacts (e.g. append metrics to a previous run), so it's good to clean it.

Hmmm... what do you mean by a quick hash over all the dicts? A Metadata hash should be sufficient; otherwise the whole dataset_dicts would have to be loaded just for verification, and that can vary heavily depending on the dataset.

The main motive behind the hashing was to speed up development by reducing the time spent on data fetching and conversion. A warning was previously thrown when fetching cached data because, yeah... it's not validated, but in return for that risk you don't have to wait e.g. 10 minutes just to see what's going on in the extra layer you added to the network.

One safe solution is to just pass False for the allow_cached argument here while developing. Maybe that should be the default?
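For reference, a sketch of forcing the conversion to bypass the cache via that argument; the exact signature of `convert_to_coco_json` may differ across detectron2 versions, and the output path is assumed from this thread:

```python
from detectron2.data.datasets.coco import convert_to_coco_json

# allow_cached=False rebuilds the COCO-format json from the currently
# registered dataset/metadata instead of reusing a stale cached file.
convert_to_coco_json("KITTI_train",
                     "./output/KITTI_train_coco_format.json",
                     allow_cached=False)
```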

ppwwyyxx commented 4 years ago

I agree that hashing the data is probably too expensive. I think at least https://github.com/facebookresearch/detectron2/blob/5d452a0ec0b8bc21e61c6cb4cfef34ef7fefdb90/detectron2/data/datasets/coco.py#L413 can be changed to a warning that instructs users to clean the cache if the dataset is modified.

I'm slightly in favor of the safe option given that users are running into issues.

grmsljj commented 3 years ago

> Thank you for your response. I finally managed to get around this issue. Detectron2 had cached everything into `KITTI_train_coco_format.json` in the output folder, and that file wasn't being updated when I changed `CATEGORIES` in
>
> `MetadataCatalog.get("KITTI_" + d).set(thing_classes=CATEGORIES)`
>
> Is there a way to clear the metadata (and the cached COCO jsons)? I've been restarting my runtime instance to clear the metadata.

Did you solve this? Does it have anything to do with the cache?

pranay-ar commented 3 years ago

> Thank you for your response. I finally managed to get around this issue. Detectron2 had cached everything into `KITTI_train_coco_format.json` in the output folder, and that file wasn't being updated when I changed `CATEGORIES` in
>
> `MetadataCatalog.get("KITTI_" + d).set(thing_classes=CATEGORIES)`
>
> Is there a way to clear the metadata (and the cached COCO jsons)? I've been restarting my runtime instance to clear the metadata.
>
> Did you solve this? Does it have anything to do with the cache?

Hi @grmsljj, were you able to fix this? I am facing the same error too.

crsegerie commented 2 years ago

Facing the same error

pranay-ar commented 2 years ago

@crsegerie Have you looked at #120? I was facing the same error too and was able to solve it that way.

jannehlamin commented 3 months ago

@ppwwyyxx, I need your urgent intervention here; I am completely confused. `assert len(class_names) == precisions.shape[2]` fails. I have followed all the recommendations here but nothing works. I tried to view the output of `coco_eval.eval`, and it is:

```
{'params': <pycocotools.cocoeval.Params object at 0x7f1067d6f040>,
 'counts': [10, 101, 0, 4, 5],
 'date': '2024-06-11 06:13:34',
 'precision': array([], shape=(10, 101, 0, 4, 5), dtype=float64),
 'recall': array([], shape=(10, 0, 4, 5), dtype=float64),
 'scores': array([], shape=(10, 101, 0, 4, 5), dtype=float64),
 'ok_det_as_known': array([], shape=(10, 0, 4, 5), dtype=float64),
 'unk_det_as_known': array([], shape=(10, 0, 4, 5), dtype=float64),
 'tp_plus_fp_cs': array([], shape=(10, 101, 0, 4, 5), dtype=float64),
 'fp_os': array([], shape=(10, 101, 0, 4, 5), dtype=float64)}
```

which indicates an empty array for the precision. Doesn't that mean the model isn't predicting any of the classes?
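One way to see why the category axis has size 0 is to inspect what pycocotools actually resolved from the ground truth; a sketch, where the annotation path and example category names are placeholders:

```python
from pycocotools.coco import COCO

coco_gt = COCO("path/to/annotations.json")  # hypothetical path

# Shows the id/name pairs pycocotools parsed from the json.
print(coco_gt.loadCats(coco_gt.getCatIds()))

# If the names don't match the json, this returns [] and the category
# axis of coco_eval.eval["precision"] ends up with size 0.
print(coco_gt.getCatIds(catNms=["car", "pedestrian"]))  # example names
```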

jannehlamin commented 3 months ago

After strenuous effort, I was able to resolve the problem, so for the benefit of others: I was using a custom dataset which has a subtle annotation difference from the standard COCO annotations. These variations affect the COCO evaluation functions, in particular:

```python
cat_ids = sorted(coco_gt.getCatIds(catNms=known_names))
```

In my case, accessing the cat_ids by category name failed because every category "name" in my json file is `"name": "None"`, as indicated in the screenshot below:

[screenshot: the "categories" section of the annotation json, with "name": "None"]

Solution: I provided the category ids directly instead of querying them by name through the COCO API:

```python
# instead of:
# cat_ids = sorted(coco_gt.getCatIds(catNms=known_names))
cat_ids = sorted(known_ids)
coco_eval.params.catIds = cat_ids
```
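As a closing sanity check, the failing assert simply compares the registered class names against the evaluated category axis; a sketch of verifying this up front, assuming `coco_eval` is the configured `COCOeval` object and `"KITTI_train"` the dataset name from this thread:

```python
from detectron2.data import MetadataCatalog

metadata = MetadataCatalog.get("KITTI_train")  # dataset name from this thread
num_classes = len(metadata.thing_classes)
num_eval_cats = len(coco_eval.params.catIds)

# If these disagree, _derive_coco_results fails with the
# "assert len(class_names) == precisions.shape[2]" seen above.
assert num_classes == num_eval_cats, (num_classes, num_eval_cats)
```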