How can I fix MAE and MSE staying unchanged during training on a single GPU?

kexiaolong0121 commented 6 months ago

cnwxi commented 6 months ago

Same problem here.

cnwxi commented 6 months ago

I'm using 2× 2080 Ti, Python 3.6, and torch 1.8.0, with the learning rate changed to 1e-5:

export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
python -m torch.distributed.launch --nproc_per_node=2 --master_port 8218 train_distributed.py --gpu_id '0,1' \
--gray_aug --gray_p 0.1 --scale_aug --scale_type 1 --scale_p 0.3 --epochs 1500 --lr_step 1200 --lr 1e-5 \
--batch_size 4 --num_patch 1 --threshold 0.35 --num_queries 700 \
--dataset nwpu --crop_size 256 --pre /media/ubuntu/2.0TB/wxy/CLTR/save_file/log_file/20240427_004616/checkpoint.pth \
--test_per_epoch 1 --test_patch --save

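The test MAE and MSE start to change after a few epochs. In the first epoch every prediction is zero, and after a few more epochs the predictions for some images begin to change. I suggest training for 10 epochs before judging the results.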
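A note on why the numbers look frozen early on: if every predicted count is zero, the MAE is just the mean ground-truth count of the test set, so it cannot move until the predictions do. The sketch below illustrates this with made-up counts. The helper names `count_metrics` and `is_stuck` are hypothetical and not part of CLTR, and it assumes "MSE" means the root of the mean squared error, as crowd-counting code commonly reports it.

```python
import math


def count_metrics(pred_counts, gt_counts):
    """Return (MAE, MSE) over per-image counts.

    Assumption: "MSE" is the root of the mean squared error.
    """
    errors = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, mse


def is_stuck(history, tol=1e-6):
    """True when every recorded metric value equals the first one within tol."""
    return len(history) > 1 and all(abs(v - history[0]) <= tol for v in history)


# Made-up ground-truth counts for three test images.
gt = [120, 35, 480]

# All-zero predictions: the MAE equals the mean ground-truth count.
mae_zero, mse_zero = count_metrics([0, 0, 0], gt)

# Once some predictions become non-zero, the metrics start to move.
mae_later, mse_later = count_metrics([0, 30, 400], gt)

# MAE recorded after each of the first epochs (illustrative values).
stuck = is_stuck([mae_zero, mae_zero, mae_zero])
moving = is_stuck([mae_zero, mae_zero, mae_later])
```

If `is_stuck` still returns true for the MAE history after about 10 epochs, the run is probably not learning and the learning rate is worth revisiting.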

kexiaolong0121 commented 6 months ago

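I removed the distributed-training code, and after that it worked on the JHU dataset, but it still doesn't work on NWPU. Could you share your contact details so we can discuss this?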
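For anyone else stripping out the distributed code: one alternative is to keep it but skip it when only one process is running. This is a minimal sketch, assuming the script is started by `torch.distributed.launch`, which sets the `WORLD_SIZE` environment variable for each process. The function names are hypothetical and this is not how CLTR itself is organised.

```python
import os


def distributed_requested(env=None):
    """Return True when the launcher started more than one process."""
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1")) > 1


def describe_setup(env=None):
    """Report which code path a training script would take."""
    if distributed_requested(env):
        # A real script would call torch.distributed.init_process_group here
        # and wrap the model in DistributedDataParallel.
        return "distributed"
    # Single process: train the plain model on one GPU.
    return "single"


single = describe_setup({})
multi = describe_setup({"WORLD_SIZE": "2"})
```

With a guard like this the same script can run with or without the launcher, so the single-GPU and multi-GPU paths stay in one file.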

cnwxi commented 6 months ago

My email address is on my profile page.

kexiaolong0121 commented 6 months ago

Adjusting the learning rate really did fix it. Thank you, you're great!

cnwxi commented 6 months ago

No problem, we're all here to help and learn from each other.


dk-liang / CLTR

How can I fix MAE and MSE staying unchanged during training on a single GPU? #30