facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Please read & provide the following #5265

Open skylark-joe opened 2 months ago

skylark-joe commented 2 months ago

Hi, I trained the model for 3000 iterations, only to find the evaluation results are all zero. During training there was no error, except a warning saying "Category ids in annotations are not in [1, #categories]! We'll apply a mapping for you." Does it have something to do with the argument config.eval_only? I did not set that argument, so it should take the default. Any help would be appreciated.

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made: I followed the tutorial, loading my datasets using register_coco_json(), then started training. Here is my .yaml file; Base-RCNN-FPN.yaml is the original one in the configs directory:

```yaml
_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: ""
  MASK_ON: True
  RESNETS:
    DEPTH: 50
  ROI_HEADS:
    NUM_CLASSES: 6
DATASETS:
  TRAIN: ("steel_train",)  # ("coco_2017_train",)
  TEST: ("steel_val",)     # ("coco_2017_val",)
DATALOADER:
  NUM_WORKERS: 8
SOLVER:
  STEPS: ()                # (210000, 250000)
  MAX_ITER: 270000
  IMS_PER_BATCH: 16
  BASE_LR: 0.001           # 0.02
  MAX_ITER: 90000
```

  1. In the command line, I run `python plain_train_net.py --config-file ../configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml` and it seems to run successfully.

  2. Full logs or other relevant observations:

```
[04/19 04:37:58] d2.data.datasets.coco INFO: Loaded 360 images in COCO format from ../datasets/steel/annotations/steel_val.json
[04/19 04:37:58] d2.data.build INFO: Distribution of instances among all 6 categories:
|   category    | #instances |   category    | #instances |  category  | #instances |
|:-------------:|:----------:|:-------------:|:----------:|:----------:|:----------:|
|    crazing    | 154        |   inclusion   | 195        |  patches   | 170        |
| pitted_surf.. | 82         | rolled-in_s.. | 132        | scratches  | 102        |
|     total     | 835        |               |            |            |            |
[04/19 04:37:58] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[04/19 04:37:58] d2.data.common INFO: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[04/19 04:37:58] d2.data.common INFO: Serializing 360 elements to byte tensors and concatenating them all ...
[04/19 04:37:58] d2.data.common INFO: Serialized dataset takes 0.19 MiB
[04/19 04:37:58] d2.evaluation.evaluator INFO: Start inference on 360 batches
[04/19 04:38:00] d2.evaluation.evaluator INFO: Inference done 11/360. Dataloading: 0.0006 s/iter. Inference: 0.0690 s/iter. Eval: 0.0027 s/iter. Total: 0.0723 s/iter. ETA=0:00:25
[04/19 04:38:05] d2.evaluation.evaluator INFO: Inference done 80/360. Dataloading: 0.0011 s/iter. Inference: 0.0684 s/iter. Eval: 0.0030 s/iter. Total: 0.0725 s/iter. ETA=0:00:20
[04/19 04:38:10] d2.evaluation.evaluator INFO: Inference done 151/360. Dataloading: 0.0012 s/iter. Inference: 0.0676 s/iter. Eval: 0.0028 s/iter. Total: 0.0717 s/iter. ETA=0:00:14
[04/19 04:38:15] d2.evaluation.evaluator INFO: Inference done 221/360. Dataloading: 0.0012 s/iter. Inference: 0.0679 s/iter. Eval: 0.0028 s/iter. Total: 0.0719 s/iter. ETA=0:00:09
[04/19 04:38:20] d2.evaluation.evaluator INFO: Inference done 290/360. Dataloading: 0.0012 s/iter. Inference: 0.0681 s/iter. Eval: 0.0028 s/iter. Total: 0.0721 s/iter. ETA=0:00:05
[04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference time: 0:00:25.371791 (0.071470 s / iter per device, on 1 devices)
[04/19 04:38:25] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:23 (0.067332 s / iter per device, on 1 devices)
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Preparing results for COCO format ...
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Saving results to ./output/inference/steel_val/coco_instances_results.json
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluating predictions with unofficial COCO API...
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Evaluate annotation type bbox
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.04 seconds.
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
[04/19 04:38:25] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox:
|  AP   | AP50  | AP75  |  APs  |  APm  |  APl  |
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
[04/19 04:38:25] d2.evaluation.coco_evaluation INFO: Per-category bbox AP:
|    category    |  AP   |    category     |  AP   | category  |  AP   |
|:--------------:|:-----:|:---------------:|:-----:|:---------:|:-----:|
|    crazing     | 0.000 |    inclusion    | 0.000 |  patches  | 0.000 |
| pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Evaluate annotation type segm
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 0.13 seconds.
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
[04/19 04:38:26] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 0.02 seconds.
[04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Evaluation results for segm:
|  AP   | AP50  | AP75  |  APs  |  APm  |  APl  |
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
[04/19 04:38:26] d2.evaluation.coco_evaluation INFO: Per-category segm AP:
|    category    |  AP   |    category     |  AP   | category  |  AP   |
|:--------------:|:-----:|:---------------:|:-----:|:---------:|:-----:|
|    crazing     | 0.000 |    inclusion    | 0.000 |  patches  | 0.000 |
| pitted_surface | 0.000 | rolled-in_scale | 0.000 | scratches | 0.000 |
[04/19 04:38:26] detectron2 INFO: Evaluation results for steel_val in csv format:
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: bbox
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: Task: segm
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[04/19 04:38:26] d2.evaluation.testing INFO: copypaste: 0.0000,0.0000,0.0000,0.0000,0.0000,0.0000
```

Expected behavior:

At the very least, there should be some non-zero result. I do not know what causes the problem.
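Given the warning about category ids mentioned above, one quick check is to inspect the ids in the annotation json directly. This is a hypothetical helper, not part of detectron2; the path in the usage comment is a placeholder for the dataset files in this issue:

```python
# Diagnostic sketch: report whether the category ids in a COCO-format
# annotation file are exactly the contiguous range [1, #categories],
# which is what the COCO API expects.
import json


def check_category_ids(coco):
    """Return (ids, ok): the sorted category ids and whether they are 1..#categories."""
    ids = sorted(c["id"] for c in coco["categories"])
    ok = ids == list(range(1, len(ids) + 1))
    return ids, ok


# usage (path is a placeholder):
# with open("../datasets/steel/annotations/steel_val.json") as f:
#     ids, ok = check_category_ids(json.load(f))
# print(ids, "OK" if ok else "NOT in [1, #categories]")
```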

Environment:

The environment was set up following the tutorial.

```
-------------------------------  -----------------------------------------------------------------------------------
sys.platform                     linux
Python                           3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy                            1.23.5
detectron2                       0.6 @/home/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 11.6
detectron2 arch flags            7.5
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          1.12.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  False
GPU available                    Yes
GPU 0                            NVIDIA GeForce RTX 2080 Ti (arch=7.5)
Driver version                   510.54
CUDA_HOME                        /usr/local/cuda
Pillow                           9.3.0
torchvision                      0.13.1+cu116 @/root/miniconda3/envs/myconda/lib/python3.8/site-packages/torchvision
torchvision arch flags           3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.6.0
-------------------------------  -----------------------------------------------------------------------------------
```
Huxwell commented 2 months ago

It's hard to tell why your model is not learning based on the limited info you provided. A good sanity check would be using the same train set and validation set during your first training run. You can even use 2-10 images instead of 360.

From http://karpathy.github.io/2019/04/25/recipe/:

> Overfit one batch. Overfit a single batch of only a few examples (e.g. as little as two). To do so we increase the capacity of our model (e.g. add layers or filters) and verify that we can reach the lowest achievable loss (e.g. zero). I also like to visualize in the same plot both the label and the prediction and ensure that they end up aligning perfectly once we reach the minimum loss. If they do not, there is a bug somewhere and we cannot continue to the next stage.
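For detectron2, that sanity check can be expressed in the config alone: point the test split at the training split, so a model that has learned anything at all should score well above zero AP. This is a sketch using the dataset names from this issue:

```yaml
# Sanity-check fragment: evaluate on the same split used for training.
DATASETS:
  TRAIN: ("steel_train",)
  TEST: ("steel_train",)   # same as TRAIN, only for the overfit check
```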

skylark-joe commented 2 months ago

Thanks for your advice; it is kind of you to provide an article here, which I will read. In fact, I think I have accidentally solved the problem, by changing the category_id values in my json file to be in [1, #categories].

The warning we got comes from detectron2/detectron2/data/datasets/coco.py line 104, saying that it will apply a mapping. However, when I read the code that follows, I found there is no operation on category_id as it claimed.

In addition, at line 437 in the same file, I see that the "id" field must start from 1 if we want to use the COCO API, and after changing it, it works.
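The fix described above can be scripted as a one-off remapping of the annotation file. This is a hypothetical sketch, not part of detectron2; it only assumes standard COCO json structure, and any file paths are placeholders:

```python
# One-off sketch: shift COCO category ids into the contiguous range
# [1, #categories] and rewrite annotations to match, since the COCO API
# expects category ids starting from 1.
import json


def remap_category_ids(coco):
    """Remap category ids in a COCO-format dict to 1..#categories (in sorted order)."""
    old_ids = sorted(c["id"] for c in coco["categories"])
    id_map = {old: new for new, old in enumerate(old_ids, start=1)}
    for cat in coco["categories"]:
        cat["id"] = id_map[cat["id"]]
    for ann in coco["annotations"]:
        ann["category_id"] = id_map[ann["category_id"]]
    return coco


# usage (paths are placeholders):
# with open("steel_train.json") as f:
#     coco = remap_category_ids(json.load(f))
# with open("steel_train_remapped.json", "w") as f:
#     json.dump(coco, f)
```

Note that ids are remapped in sorted order, so categories keep their relative order; run the same remapping on every split so train and val stay consistent.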