facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Detectron2 ignores (?) training dataset #4374

Closed EtagiBI closed 1 year ago

EtagiBI commented 2 years ago

In deep learning we have three standard types of datasets: training, validation, and test.

I'd like to train a Detectron2 model, so I need a training dataset and a validation dataset. My training dataset has 1928 images, and my validation dataset has 304 images.

According to the docs, Detectron2 calls the training dataset "train" and the validation dataset "test". All right then.

Instructions To Reproduce the Issue:

Here's my code:

CLASSES = ["one", "two", "three", "four", "five"]
TRAIN_IMAGES_PATH = "*Path_to_datasets*/1_train/"
VALID_IMAGES_PATH = "*Path_to_datasets*/2_valid/"
TRAIN_COCO_PATH = "*Path_to_COCOs*/1_train/"
VALID_COCO_PATH = "*Path_to_COCOs*/2_valid/"

mapper = {"train": {"images_dir": TRAIN_IMAGES_PATH, "COCO_dir": TRAIN_COCO_PATH, "COCO_file": "1_train_COCO.json"},
         "valid": {"images_dir": VALID_IMAGES_PATH, "COCO_dir": VALID_COCO_PATH, "COCO_file": "2_valid_COCO.json"},
}

for dataset in ["train", "valid"]:
    images_path = mapper[dataset]["images_dir"]
    COCO_path = mapper[dataset]["COCO_dir"]
    COCO_file = mapper[dataset]["COCO_file"]
    DatasetCatalog.register("objects_detect_" + dataset, lambda dataset=dataset: get_board_dicts(images_path, COCO_path, COCO_file))
    MetadataCatalog.get("objects_detect_" + dataset).set(thing_classes=CLASSES)

if __name__ == "__main__":
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/retinanet_R_50_FPN_3x.yaml"))
    cfg.OUTPUT_DIR = "*Path_to_output*"
    cfg.DATASETS.TRAIN = ("objects_detect_train",) # I put my training dataset here
    cfg.DATASETS.TEST = ("objects_detect_valid",) # I put my validation dataset here
    cfg.DATALOADER.NUM_WORKERS = 1
    cfg.SOLVER.IMS_PER_BATCH = 8
    cfg.SOLVER.BASE_LR = 0.00005 
    cfg.SOLVER.MAX_ITER = 3500 
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 512
    cfg.MODEL.RETINANET.NUM_CLASSES = len(CLASSES)
    cfg.TEST.EVAL_PERIOD = 500
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/retinanet_R_50_FPN_3x.yaml")
    # Training
    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)
    trainer.train()

When I start training, the following output appears:

[07/01 16:01:45 d2.data.build]: Removed 0 images with no usable annotations. 304 images left.
[07/01 16:01:45 d2.data.build]: Distribution of instances among all 5 categories:
| category | #instances | category | #instances | category | #instances |
|:--------:|:-----------|:--------:|:-----------|:--------:|:-----------|
|   one    | 304        |  three   | 392        |   five   | 288        |
|   two    | 408        |   four   | 104        |          |            |
|  total   | 1496       |          |            |          |            |

[07/01 16:01:45 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[07/01 16:01:45 d2.data.build]: Using training sampler TrainingSampler
[07/01 16:01:45 d2.data.common]: Serializing 304 elements to byte tensors and concatenating them all ...
[07/01 16:01:45 d2.data.common]: Serialized dataset takes 0.16 MiB

According to this output, the training process uses 304 images, but that is my validation dataset, not my training dataset.

Every 500 iterations I get the following output:

[07/01 16:09:47 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[07/01 16:09:47 d2.data.common]: Serializing 304 elements to byte tensors and concatenating them all ...
[07/01 16:09:47 d2.data.common]: Serialized dataset takes 0.16 MiB
WARNING [07/01 16:09:47 d2.evaluation.coco_evaluation]: COCO Evaluator instantiated using config, this is deprecated behavior. Please pass in explicit arguments instead.
[07/01 16:09:47 d2.evaluation.coco_evaluation]: Trying to convert 'objects_detect_valid' to COCO format ...
[07/01 16:09:47 d2.data.datasets.coco]: Converting annotations of dataset 'objects_detect_valid' to COCO format ...)
[07/01 16:09:47 d2.data.datasets.coco]: Converting dataset dicts into COCO format
[07/01 16:09:47 d2.data.datasets.coco]: Conversion finished, #images: 304, #annotations: 1496
[07/01 16:09:47 d2.evaluation.evaluator]: Start inference on 304 batches
<etc.>

Expected behavior:

The evaluation output refers to 304 images, so I assume the validation set is used for evaluation, as intended. For training, I expected to see lines mentioning 1928 images and the corresponding annotation distribution, but searching the logs for '1928' returns no matches. It seems that the cfg.DATASETS.TRAIN setting is completely ignored by Detectron2, so my training dataset isn't taken into account at all.

Why does Detectron2 behave that way?

Environment:


sys.platform             win32
Python                   3.7.11 (default, Jul 27 2021, 09:42:29) [MSC v.1916 64 bit (AMD64)]
numpy                    1.21.2
detectron2               0.5 @c:\users\e-soft\detectron2\detectron2
Compiler                 MSVC 192930133
CUDA compiler            CUDA 11.2
detectron2 arch flags    c:\users\e-soft\detectron2\detectron2_C.cp37-win_amd64.pyd; cannot find cuobjdump
DETECTRON2_ENV_MODULE
PyTorch                  1.8.2 @C:\Users\E-soft\Anaconda3\envs\Detectium\lib\site-packages\torch
PyTorch debug build      False
GPU available            Yes
GPU 0                    NVIDIA GeForce GTX 1080 Ti (arch=6.1)
Driver version
CUDA_HOME                C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1
Pillow                   8.4.0
torchvision              0.9.2 @C:\Users\E-soft\Anaconda3\envs\Detectium\lib\site-packages\torchvision
torchvision arch flags   C:\Users\E-soft\Anaconda3\envs\Detectium\lib\site-packages\torchvision_C.pyd; cannot find cuobjdump
fvcore                   0.1.5.post20210924
iopath                   0.1.9
cv2                      4.5.3


PyTorch built with:

github-actions[bot] commented 2 years ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs"; "Your Environment";

ppwwyyxx commented 1 year ago

Your code has this bug: https://stackoverflow.com/questions/25314547/cell-var-from-loop-warning-from-pylint Closing.
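
For readers who land here: the linked question is about Python's late-binding closures. In the registration loop from the original post, only `dataset` is bound as a default argument of the lambda. `images_path`, `COCO_path` and `COCO_file` are looked up when the lambda is eventually called, and by then the loop has finished and they hold the "valid" values. Both registered names therefore load the 304 validation images. Below is a minimal sketch of the effect and one possible fix. It uses a plain dict as a stand-in for `DatasetCatalog` and a hypothetical `load_dicts` in place of `get_board_dicts`; neither is detectron2 API, and the paths are shortened placeholders.

```python
# Placeholder paths; the real ones are elided in the original post.
MAPPER = {
    "train": {"images_dir": "1_train/", "COCO_file": "1_train_COCO.json"},
    "valid": {"images_dir": "2_valid/", "COCO_file": "2_valid_COCO.json"},
}


def load_dicts(images_path, coco_file):
    # Hypothetical loader: just reports which inputs it was asked to read.
    return (images_path, coco_file)


def build_registry(bind_early):
    # Plain dict standing in for DatasetCatalog: name -> zero-argument loader.
    registry = {}
    for dataset in ["train", "valid"]:
        images_path = MAPPER[dataset]["images_dir"]
        coco_file = MAPPER[dataset]["COCO_file"]
        if bind_early:
            # Fixed: every loop variable the loader needs is captured as a
            # default argument, so its current value is frozen right here.
            registry[dataset] = (
                lambda images_path=images_path, coco_file=coco_file:
                load_dicts(images_path, coco_file)
            )
        else:
            # Buggy (as in the post): only `dataset` is frozen; images_path
            # and coco_file are resolved later, when the lambda is called.
            registry[dataset] = (
                lambda dataset=dataset: load_dicts(images_path, coco_file)
            )
    return registry


buggy_train = build_registry(bind_early=False)["train"]()
fixed_train = build_registry(bind_early=True)["train"]()
print(buggy_train)  # ('2_valid/', '2_valid_COCO.json')
print(fixed_train)  # ('1_train/', '1_train_COCO.json')
```

Applied to the original loop, that would mean passing `images_path`, `COCO_path` and `COCO_file` as lambda defaults as well. `functools.partial(get_board_dicts, images_path, COCO_path, COCO_file)` should bind the arguments at registration time in the same way.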