facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Please read & provide the following #5265

Open skylark-joe opened 2 months ago

skylark-joe commented 2 months ago

Hi, I trained the model for 3000 iterations, only to find the evaluation results are all zero. During training there was no error, except a warning saying "Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you." Does it have something to do with the argument config.eval_only? I did not set that argument, so it should take the default. Any help would be appreciated.

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made: I followed the tutorial, loading my datasets using register_coco_json(), then started training. Here is my .yaml file; Base-RCNN-FPN.yaml is the original one in the configs directory:

```yaml
_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: ""
  MASK_ON: True
  RESNETS:
    DEPTH: 50
  ROI_HEADS:
    NUM_CLASSES: 6
DATASETS:
  TRAIN: ("steel_train",)  # ("coco_2017_train",)
  TEST: ("steel_val",)     # ("coco_2017_val",)
DATALOADER:
  NUM_WORKERS: 8
SOLVER:
  STEPS: ()                # (210000, 250000)
  MAX_ITER: 270000
  IMS_PER_BATCH: 16
  BASE_LR: 0.001           # 0.02
  MAX_ITER: 90000
```

  1. In the command line, I run `python plain_train_net.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml` and it seems to run successfully.

  2. Full logs or other relevant observations:

```
[04/19 04:37:58] d2.data.datasets.coco INFO: Loaded 360 images in COCO format from ../datasets/steel/annotations/steel_val.json
[04/19 04:37:58] d2.data.build INFO: Distribution of instances among all 6 categories:
|   category    | #instances |   category    | #instances |  category  | #instances |
|:-------------:|:----------:|:-------------:|:----------:|:----------:|:----------:|
|    crazing    | 154        |   inclusion   | 195        |  patches   | 170        |
| pitted_surf.. | 82         | rolled-in_s.. | 132        | scratches  | 102        |
|     total     | 835        |               |            |            |            |
[04/19 04:37:58] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[04/19 04:37:58] d2.data.common INFO: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[04/19 04:37:58] d2.data.common INFO: Serializing 360 elements to byte tensors and concatenating them all ...
[04/19 04:37:58] d2.data.common INFO: Serialized dataset takes 0.19 MiB
[04/19 04:37:58] d2.evaluation.evaluator INFO: Start inference on 360 batches
[04/19 04:38:00] d2.evaluation.evaluator INFO: Inference done 11/360. Dataloading: 0.0006 s/iter. Inference: 0.0690 s/iter. Eval: 0.0027 s/iter. Total: 0.0723 s/iter. ETA=0:00:25
[04/19 04:38:05] d2.evaluation.evaluator INFO: Inference done 80/360. Dataloading: 0.0011 s/iter. Inference: 0.0684 s/iter. Eval: 0.0030 s/iter. Total: 0.0725 s/iter. ETA=0:00:20
[04/19 04:38:10] d2.evaluation.evaluator INFO: Inference done 151/360. Dataloading: 0.0012 s/iter. Inference: 0.0676 s/iter. Eval: 0.0028 s/iter. Total: 0.0717 s/iter. ETA=0:00:14
[04/19 04:38:15] d2.evaluation.evaluator INFO: Inference done 221/360. Dataloading: 0.0012 s/iter. Inference: 0.0679 s/iter. Eval: 0.0028 s/iter. Total: 0.0719 s/iter. ETA=0:00:09
[04/19 04:38:20] d2.evaluation.evaluator INFO: Inference done 290/360. Dataloading: 0.0012 s/iter. Inference: 0.0681 s/iter. Eval: 0.0028 s/iter. Total: 0.0721 s/iter. ETA=0:00:05
[04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference time: 0:00:25.371791 (0.071470 s / iter per device, on 1 devices)
[04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:23 (0.067332 s / iter per device, on 1 devices)
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Saving results to ./output/inference/steel_val/coco_instances_results.json
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluating predictions with unofficial COCO API...
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Evaluate annotation type bbox
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.04 seconds.
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
|  AP   | AP50  | AP75  |  APs  |  APm  |  APl  |
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
|    category    |  AP   |    category     |  AP   | category  |  AP   |
|:--------------:|:-----:|:---------------:|:-----:|:---------:|:-----:|
|    crazing     | 0.000 |    inclusion    | 0.000 |  patches  | 0.000 |
| pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Evaluate annotation type segm
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.13 seconds.
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
[04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Evaluation results for segm:
|  AP   | AP50  | AP75  |  APs  |  APm  |  APl  |
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
[04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Per-category segm AP:
|    category    |  AP   |    category     |  AP   | category  |  AP   |
|:--------------:|:-----:|:---------------:|:-----:|:---------:|:-----:|
|    crazing     | 0.000 |    inclusion    | 0.000 |  patches  | 0.000 |
| pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
[04/19 04:38:26] detectron2 INFO: Evaluation results for steel_val in csv format:
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: bbox
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: segm
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
```

Expected behavior:

At the very least, there should be some non-zero result. I do not know what causes the problem.
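Given the warning about category ids mentioned above, one quick check is to inspect the ids in the annotation json directly. This is a hypothetical helper, not part of detectron2; the path in the usage comment is a placeholder for the dataset files in this issue:

```python
# Diagnostic sketch: report whether the category ids in a COCO-format
# annotation file are exactly the contiguous range [1, #categories],
# which is what the COCO API expects.
import json


def check_category_ids(coco):
    """Return (ids, ok): the sorted category ids and whether they are 1..#categories."""
    ids = sorted(c["id"] for c in coco["categories"])
    ok = ids == list(range(1, len(ids) + 1))
    return ids, ok


# usage (path is a placeholder):
# with open("../datasets/steel/annotations/steel_val.json") as f:
#     ids, ok = check_category_ids(json.load(f))
# print(ids, "OK" if ok else "NOT in [1, #categories]")
```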

Environment:

The environment was set up following the tutorial.

```
-------------------------------  -----------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy                            1.23.5
detectron2                       0.6 @/home/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 11.6
detectron2 arch flags            7.5
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          1.12.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0                            NVIDIA GeForce RTX 2080 Ti (arch=7.5)
Driver version                   510.54
CUDA_HOME                        /usr/local/cuda
Pillow                           9.3.0
torchvision                      0.13.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torchvision
torchvision arch flags           3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.6.0
-------------------------------  -----------------------------------------------------------------------------------
```
Huxwell commented 2 months ago

It's hard to tell why your model is not learning based on the limited info you provided. A good sanity check would be using the same train set and validation set during your first training run. You can even use 2-10 images instead of 360.

From http://karpathy.github.io/2019/04/25/recipe/:

> Overfit one batch. Overfit a single batch of only a few examples (e.g. as little as two). To do so we increase the capacity of our model (e.g. add layers or filters) and verify that we can reach the lowest achievable loss (e.g. zero). I also like to visualize in the same plot both the label and the prediction and ensure that they end up aligning perfectly once we reach the minimum loss. If they do not, there is a bug somewhere and we cannot continue to the next stage.
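For detectron2, that sanity check can be expressed in the config alone: point the test split at the training split, so a model that has learned anything at all should score well above zero AP. This is a sketch using the dataset names from this issue:

```yaml
# Sanity-check fragment: evaluate on the same split used for training.
DATASETS:
  TRAIN: ("steel_train",)
  TEST: ("steel_train",)   # same as TRAIN, only for the overfit check
```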

skylark-joe commented 2 months ago

Thanks for your advice; it is kind of you to provide an article here, which I will read. In fact, I think I have accidentally solved the problem, by changing the category_id values in my json file to be in [1, #categories].

The warning we got comes from detectron2/detectron2/data/datasets/coco.py line 104, saying that it will apply a mapping. However, when I read the code that follows, I found there is no operation on category_id as it claimed.

In addition, at line 437 in the same file, I see that the "id" field must start from 1 if we want to use the COCO API, and after changing it, it works.
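The fix described above can be scripted as a one-off remapping of the annotation file. This is a hypothetical sketch, not part of detectron2; it only assumes standard COCO json structure, and any file paths are placeholders:

```python
# One-off sketch: shift COCO category ids into the contiguous range
# [1, #categories] and rewrite annotations to match, since the COCO API
# expects category ids starting from 1.
import json


def remap_category_ids(coco):
    """Remap category ids in a COCO-format dict to 1..#categories (in sorted order)."""
    old_ids = sorted(c["id"] for c in coco["categories"])
    id_map = {old: new for new, old in enumerate(old_ids, start=1)}
    for cat in coco["categories"]:
        cat["id"] = id_map[cat["id"]]
    for ann in coco["annotations"]:
        ann["category_id"] = id_map[ann["category_id"]]
    return coco


# usage (paths are placeholders):
# with open("steel_train.json") as f:
#     coco = remap_category_ids(json.load(f))
# with open("steel_train_remapped.json", "w") as f:
#     json.dump(coco, f)
```

Note that ids are remapped in sorted order, so categories keep their relative order; run the same remapping on every split so train and val stay consistent.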