Training with yolov8m - Githubissues

ubada00 commented 1 month ago

Hello! Thank you for your excellent work. I have a question about an issue I encountered when training a model with YOLOv8m instead of Faster R-CNN. For example, when I enter this training command:

CUDA_VISIBLE_DEVICES=0 python tools/relation_train_net.py --task predcls --save-best --config-file "configs/IndoorVG/e2e_relation_yolov8.yaml" MODEL.ROI_RELATION_HEAD.PREDICTOR MotifPredictor SOLVER.IMS_PER_BATCH 12 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_EPOCH 20 MODEL.PRETRAINED_DETECTOR_CKPT ./checkpoints/yolov8m_indoorvg.pt OUTPUT_DIR ./checkpoints/Motif-precls-exmp TEST.INFORMATIVE False

The error message was:

2024-07-26 14:37:05.490 | INFO     | sgg_benchmark.utils.logger:setup_logger:31 - Using loguru logger with level: INFO
2024-07-26 14:37:05.490 | INFO     | __main__:main:444 - Using 1 GPUs
2024-07-26 14:37:05.490 | INFO     | sgg_benchmark.utils.logger:logger_step:15 - #################### Step 1: Collecting environment info... ####################
2024-07-26 14:37:06.595 | INFO     | __main__:main:457 - Saving config into: ./checkpoints/Motif-precls-exmp/config.yml
2024-07-26 14:37:06.607 | INFO     | sgg_benchmark.utils.logger:logger_step:15 - #################### Step 2: Building model... ####################
Overriding model.yaml nc=80 with nc=84
2024-07-26 14:37:06.857 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:30 - ----------------------------------------------------------------------------------------------------
2024-07-26 14:37:06.857 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:31 - get dataset statistics...
2024-07-26 14:37:06.857 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:46 - Unable to load data statistics from: ./checkpoints/Motif-precls-exmp/VG_indoor_filtered_train_statistics.cache
2024-07-26 14:37:08.508 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:71 - Save data statistics to: ./checkpoints/Motif-precls-exmp/VG_indoor_filtered_train_statistics.cache
2024-07-26 14:37:08.509 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:72 - ----------------------------------------------------------------------------------------------------
2024-07-26 14:37:12.601 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:30 - ----------------------------------------------------------------------------------------------------
2024-07-26 14:37:12.601 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:31 - get dataset statistics...
2024-07-26 14:37:12.601 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:42 - Loading data statistics from: ./checkpoints/Motif-precls-exmp/VG_indoor_filtered_train_statistics.cache
2024-07-26 14:37:12.602 | INFO     | sgg_benchmark.data.build:get_dataset_statistics:43 - ----------------------------------------------------------------------------------------------------
2024-07-26 14:37:15.058 | INFO     | sgg_benchmark.utils.logger:logger_step:15 - #################### Step 3: Building optimizer and scheduler... ####################
Transferred 475/475 items from pretrained weights
2024-07-26 14:37:15.201 | INFO     | sgg_benchmark.utils.logger:logger_step:15 - #################### Step 4: Loading Backbone weights from ./checkpoints/yolov8m_indoorvg.pt ####################
2024-07-26 14:37:15.201 | INFO     | sgg_benchmark.utils.logger:logger_step:15 - #################### Step 5: Building checkpointer ####################
2024-07-26 14:37:15.524 | INFO     | sgg_benchmark.utils.miscellaneous:save_labels:50 - Saving labels mapping into ./checkpoints/Motif-precls-exmp/labels.json
2024-07-26 14:37:15.772 | INFO     | sgg_benchmark.utils.logger:logger_step:15 - #################### Step 6: Building dataloader ####################
2024-07-26 14:37:15.772 | INFO     | __main__:train:203 - Validate before training
2024-07-26 14:37:15.773 | INFO     | sgg_benchmark.engine.inference:inference:265 - Start evaluation on VG_indoor_filtered_val dataset(733 images).
  0%|                                                                                                                                          | 0/733 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/uhcc/Desktop/SGG-Benchmark/tools/relation_train_net.py", line 491, in <module>
    main()
  File "/home/uhcc/Desktop/SGG-Benchmark/tools/relation_train_net.py", line 470, in main
    model, best_checkpoint = train(
                             ^^^^^^
  File "/home/uhcc/Desktop/SGG-Benchmark/tools/relation_train_net.py", line 204, in train
    run_val(cfg, model, val_data_loaders, args['distributed'], logger, device=device)
  File "/home/uhcc/Desktop/SGG-Benchmark/tools/relation_train_net.py", line 323, in run_val
    dataset_result = inference(
                     ^^^^^^^^^^
  File "/home/uhcc/Desktop/SGG-Benchmark/sgg_benchmark/engine/inference.py", line 278, in inference
    predictions, timings = compute_on_dataset(model, data_loader, device, synchronize_gather=cfg.TEST.RELATION.SYNC_GATHER, timer=inference_timer, silence=silence)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/Desktop/SGG-Benchmark/sgg_benchmark/engine/inference.py", line 42, in compute_on_dataset
    output = model(images.to(device), targets)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/Desktop/SGG-Benchmark/sgg_benchmark/modeling/detector/generalized_yolo.py", line 82, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets, logger, targets)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/Desktop/SGG-Benchmark/sgg_benchmark/modeling/roi_heads/roi_heads.py", line 53, in forward
    x, detections, loss_relation = self.relation(features, detections, targets, logger)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/Desktop/SGG-Benchmark/sgg_benchmark/modeling/roi_heads/relation_head/relation_head.py", line 100, in forward
    _, relation_logits, add_losses = self.predictor(proposals, rel_pair_idxs, rel_labels, rel_binarys, roi_features, union_features, logger)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/anaconda3/envs/sgg/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/uhcc/Desktop/SGG-Benchmark/sgg_benchmark/modeling/roi_heads/relation_head/predictors/default_predictors.py", line 259, in forward
    obj_preds, edge_ctx = self.context_layer(roi_features, proposals, logger)
    ^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

I'm encountering the same error message on the VG150, IndoorVG, and PSG datasets when using the pretrained YOLOv8m detector instead of Faster R-CNN. I haven't changed anything in the e2e_relation_yolov8.yaml file. How can I resolve this issue?
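For readers who hit the same traceback: this ValueError is what Python raises when a call returns more values than the left-hand side unpacks. The sketch below reproduces the pattern with a hypothetical `context_layer` stand-in that returns three values while the caller expects two. It is not the real SGG-Benchmark module, and what the real layer returns is not shown in this thread.

```python
# Hypothetical stand-in for illustration; names and values are assumptions.
def context_layer(roi_features, proposals):
    obj_dists = [0.1, 0.9]
    obj_preds = [1]
    edge_ctx = [0.5]
    return obj_dists, obj_preds, edge_ctx  # three values


def unpack_two():
    # Mirrors the failing pattern: two targets, three returned values.
    obj_preds, edge_ctx = context_layer(None, None)
    return obj_preds, edge_ctx


def unpack_tolerant():
    # Star-unpacking absorbs any extra leading values.
    *_, obj_preds, edge_ctx = context_layer(None, None)
    return obj_preds, edge_ctx


try:
    unpack_two()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
print(unpack_tolerant())
```

A mismatch like this usually means the caller and the callee come from different versions of the code, which would be consistent with the fix being to update the codebase.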

Maelic commented 1 month ago

Hi @ubada00,

Thanks for pointing out this issue. I have fixed it in the latest release, so please update the codebase (git pull) and re-install it (pip install .), and it should work. I also updated the config file configs/IndoorVG/e2e_relation_yolov8.yaml, which should work better now.

Best

ubada00 commented 1 month ago

Thank you for your quick reply!!

I now get this error message, like other people:

  File "/home/ubada00/Desktop/SGG-Benchmark/sgg_benchmark/data/datasets/visual_genome.py", line 241, in get_groundtruth
    target.add_field("informative_rels", self.informative_graphs[str(img_info['image_id'])])
                                         ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '2351750'

My solution was to change 'str' to 'int' on line 241:

target.add_field("informative_rels", self.informative_graphs[int(img_info['image_id'])])
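The KeyError above suggests that the keys of the mapping have a different type (int) from the key used in the lookup (str). As a minimal sketch of that mismatch, assuming numeric image IDs, the hypothetical helper `lookup_by_image_id` below tries both forms of the key. The helper and the toy dictionaries are illustrative assumptions, not part of SGG-Benchmark.

```python
def lookup_by_image_id(mapping, image_id):
    """Return the entry for image_id, trying its original, str and int forms.

    Assumes image_id is an int or a string of digits.
    """
    for key in (image_id, str(image_id), int(image_id)):
        if key in mapping:
            return mapping[key]
    raise KeyError(image_id)


# Toy data: one mapping keyed by int, one keyed by str.
int_keyed = {2351750: ["rel_a"]}
str_keyed = {"2351750": ["rel_b"]}

print(lookup_by_image_id(int_keyed, "2351750"))
print(lookup_by_image_id(str_keyed, 2351750))
```

Changing 'str' to 'int' on one line works when the loaded file uses int keys. A tolerant lookup like this one is only an alternative if the key type may differ between annotation files.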

With this change I could train the model, and the results were as follows.

2024-07-29 14:58:33.104 | INFO     | __main__:train:299 - Best Epoch is : 8.0000
2024-07-29 14:58:33.115 | INFO     | __main__:main:480 - Loading best checkpoint from ./checkpoints/Motif-precls-exmp/best_model_epoch_8.pth...
2024-07-29 14:58:33.115 | INFO     | sgg_benchmark.utils.checkpoint:load:65 - Loading checkpoint from ./checkpoints/Motif-precls-exmp/best_model_epoch_8.pth
2024-07-29 14:58:33.699 | INFO     | sgg_benchmark.engine.inference:inference:265 - Start evaluation on VG_indoor_filtered_test dataset(4403 images).
2024-07-29 14:58:33.867 | INFO     | sgg_benchmark.engine.inference:inference:276 - Loaded predictions from cache in ./checkpoints/Motif-precls-exmp/inference/VG_indoor_filtered_test/predictions.pth
creating index...
index created!
Loading and preparing results...
Converting ndarray to lists...
(14212, 7)
0/14212
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=1.52s).
Accumulating evaluation results...
DONE (t=0.33s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.964
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.887
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.970
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.668
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.974
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.976
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.891
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.981
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 1.000
2024-07-29 14:58:37.120 | INFO     | sgg_benchmark.data.datasets.evaluation.vg.vg_eval:do_vg_evaluation:172 - 
====================================================================================================
Detection evaluation mAp=0.9998
====================================================================================================
SGG eval:     R @ 20: 0.3621;     R @ 50: 0.4588;     R @ 100: 0.5012;  for mode=predcls, type=Recall (Main).
SGG eval:    mR @ 20: 0.1696;    mR @ 50: 0.2304;    mR @ 100: 0.2512;  for mode=predcls, type=Mean Recall.
----------------------- Details ------------------------
(above:0.4243) (against:0.5612) (at:0.5003) (attached to:0.0119) (behind:0.4675) (between:0.0000) (carrying:0.0000) (covering:0.2917) (cutting:0.0000) (drinking:0.0000) (eating:0.0000) (filled with:0.3333) (for:0.0000) (hanging from:0.6011) (has:0.5008) (holding:0.5845) (in:0.5512) (in front of:0.1821) (laying on:0.0000) (looking at:0.0909) (lying on:0.0000) (mounted on:0.1875) (near:0.2294) (of:0.2798) (on:0.5580) (playing with:0.0000) (reading:0.0000) (sitting at:0.2128) (sitting on:0.3106) (standing on:0.0000) (taking:0.0000) (talking on:0.0000) (under:0.4487) (using:0.6905) (watching:0.0000) (wearing:0.9078) (with:0.3674) 
--------------------------------------------------------
SGG eval:     F1 @ 20: 0.2310;     F1 @ 50: 0.3067;     F1 @ 100: 0.3346;  for mode=predcls, type=F1.
====================================================================================================

2024-07-29 14:58:37.165 | INFO     | __main__:main:487 - #################### END TRAINING ####################

I think these are poor results... Maybe I need to do some hyperparameter tuning? (I didn't change e2e_relation_yolov8.yaml after re-installing.)
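As a side note on reading the log above: the F1 values it reports are numerically consistent with the harmonic mean of R@K and mR@K. This is an observation from the printed numbers, not something confirmed in this thread. The check below uses the logged values, and `harmonic_mean` is a hypothetical helper name.

```python
def harmonic_mean(recall, mean_recall):
    """Harmonic mean of two scores; returns 0.0 when both are zero."""
    if recall + mean_recall == 0:
        return 0.0
    return 2 * recall * mean_recall / (recall + mean_recall)


# (R@K, mR@K, F1@K) copied from the evaluation log above.
reported = {
    20: (0.3621, 0.1696, 0.2310),
    50: (0.4588, 0.2304, 0.3067),
    100: (0.5012, 0.2512, 0.3346),
}

for k, (r, mr, f1_logged) in reported.items():
    print(f"K={k}: computed {harmonic_mean(r, mr):.4f}, logged {f1_logged:.4f}")
```

Small differences in the last digit are expected, because the logged R and mR are already rounded to four decimals.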

Maelic commented 1 month ago

Hi, I don't think this is a bad result for the Motifs model on the IndoorVG dataset. Motifs is an old model at this point and will have limited performance (you can use methods such as TDE with the CausalAnalysisPredictor if you want to further boost the performance of Motifs). I would suggest you try another model such as PENET or SQUAT, which are much better than Motifs.

ubada00 commented 1 month ago

Thank you! I will try another model.

I just wonder whether the SGG evaluation of my results on the predcls task (detector = YOLOv8m, predictor = Motifs) is normal?

SGG eval:     R @ 20: 0.3621;     R @ 50: 0.4588;     R @ 100: 0.5012;  for mode=predcls, type=Recall (Main).
SGG eval:    mR @ 20: 0.1696;    mR @ 50: 0.2304;    mR @ 100: 0.2512;  for mode=predcls, type=Mean Recall.

When I compare them with the results in MODEL_ZOO (in this repository) and in 'Unbiased Scene Graph Generation from Biased Training' (the official 2D SGG paper), my values are too low.

If the model is trained well, the values of R@K and mR@K should be greater than 10, not in decimal form, right?

Maelic commented 1 month ago

Yes, your values are normal. When we report metrics in MODEL_ZOO and in papers, we usually do it in %, so you need to multiply your results by 100 if you want to compare (the metrics printed here will always be between 0 and 1). However, results for the PredCLS task in MODEL_ZOO are reported only for Motifs-TDE (which is a different model from the Motifs you are using) and for another dataset (VG150, whereas from your config file it seems you are training on IndoorVG). If you want to train a model with YOLOv8 on the PredCLS task to compare with the results in MODEL_ZOO, you can use a command like this:

CUDA_VISIBLE_DEVICES=0 python tools/relation_train_net.py --task predcls --save-best --config-file "configs/VG150/e2e_relation_yolov8m.yaml" MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs SOLVER.IMS_PER_BATCH 8 TEST.IMS_PER_BATCH 1 DTYPE "float16" SOLVER.MAX_EPOCH 20 TEST.INFORMATIVE False MODEL.PRETRAINED_DETECTOR_CKPT ./checkpoints/VG150/backbone/yolov8m_vg150.pt OUTPUT_DIR ./checkpoints/VG150/causal-motifs-sgdet-exmp

This will train a Causal-type model with a Motifs context and TDE effect enabled, which in fact are the settings used in the paper Unbiased Scene Graph Generation from Biased Training and reported in MODEL_ZOO.
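To make the scale conversion mentioned above concrete, here is a small sketch that parses an "SGG eval" line from the log and converts the 0 to 1 values into percentages. The helper name `to_percent` and the regular expression are assumptions written for this log format; they are not part of SGG-Benchmark.

```python
import re


def to_percent(log_line):
    """Extract 'NAME @ K: value' entries and scale the values to percent."""
    pairs = re.findall(r"(\w+)\s*@\s*(\d+):\s*([0-9.]+)", log_line)
    return {f"{name}@{k}": round(float(value) * 100, 2) for name, k, value in pairs}


recall_line = (
    "SGG eval:     R @ 20: 0.3621;     R @ 50: 0.4588;     R @ 100: 0.5012;"
    "  for mode=predcls, type=Recall (Main)."
)
mean_recall_line = (
    "SGG eval:    mR @ 20: 0.1696;    mR @ 50: 0.2304;    mR @ 100: 0.2512;"
    "  for mode=predcls, type=Mean Recall."
)

print(to_percent(recall_line))
print(to_percent(mean_recall_line))
```

On this scale, the R@100 of 0.5012 from the run above reads as 50.12%, which is the form used in MODEL_ZOO and in papers.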

ubada00 commented 1 month ago

Thank you so much for your help!! By the way, that was a very silly confusion on my part... LOL

Maelic / SGG-Benchmark

Training with yolov8m #19