D641593 / MixNet

MIT License

training does not converge when using dense bounding boxes dataset #9

Closed: tairen99 closed this issue 8 months ago

tairen99 commented 10 months ago

Hi Zeng,

Thank you for your good work.

I was able to train on the MSRA-TD500 dataset and reproduce the results using your code. Wonderful work indeed!

But when I trained using multiple GPUs with my personal dataset that has more dense bounding boxes and bigger images, the training did not converge for some reason.

Can you please share some ideas?

Thank you in advance!

The configuration is attached below:

==========Options============
means: [0.485, 0.456, 0.406]
stds: [0.229, 0.224, 0.225]
gpu: 1
max_epoch: 400
start_epoch: 0
cuda: True
output_dir: output
input_size: 1376
max_annotation: 64
adj_num: 4
num_points: 20
use_hard: True
load_memory: True
scale: 1
grad_clip: 25
pos: False
dis_threshold: 0.35
cls_threshold: 0.875
approx_factor: 0.004
know: False
exp_name: TD500HUST_mid_convert
resume: None
num_workers: 21
mgpu: True
save_dir: ./model/
vis_dir: ./vis/
log_dir: ./logs/
loss: CrossEntropyLoss
pretrain: False
verbose: True
viz: False
lr: 0.001
lr_adjust: fix
stepvalues: []
weight_decay: 0.0
gamma: 0.1
momentum: 0.9
batch_size: 4
optim: Adam
save_freq: 1
display_freq: 10
viz_freq: 50
log_freq: 10000
val_freq: 1000
net: FSNet_M
mid: False
embed: False
onlybackbone: False
rescale: 255.0
test_size: [640, 960]
checkepoch: 1070
img_root: None
device: cuda
=============End=============
MixNet backbone parameter size: 29339968
load pretrain weight from /app/MixNet/pretrained_models/pre_trained_FSNet_M/triHRnet_Synth_weight.pth.
Start training MixNet.
Epoch: 0 : LR = [0.001] :

D641593 commented 9 months ago

Hello, thank you for testing.

If training does not converge, try removing the --mid flag. Training will then use the default head network design from TextBPN++.

On the other hand, if your dataset has many overlapping ground-truth regions, training may not converge well. In that case I would recommend switching to a different text detection method.
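To get a rough idea of how much the ground truth overlaps, something like the following sketch may help. It is not part of MixNet: the function names (`bbox`, `iou`, `overlap_ratio`) and the `iou_threshold` default of 0.1 are hypothetical choices for illustration, and it assumes each annotation is available as a list of `(x, y)` polygon points. It approximates each polygon by its axis-aligned bounding box, so it will overestimate overlap for rotated or curved text.

```python
from itertools import combinations


def bbox(polygon):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes are disjoint or only touch at an edge
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def overlap_ratio(polygons, iou_threshold=0.1):
    """Fraction of annotations whose box overlaps at least one other box.

    iou_threshold is an arbitrary illustrative value, not a MixNet setting.
    """
    boxes = [bbox(p) for p in polygons]
    overlapping = set()
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        if iou(a, b) > iou_threshold:
            overlapping.update((i, j))
    return len(overlapping) / len(boxes) if boxes else 0.0
```

Running `overlap_ratio` over the polygons of each training image and looking at the distribution should show whether heavy overlap is common in the dataset or limited to a few images.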