ViTAE-Transformer / DeepSolo

The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting"

Training Log #13

Closed HumanZhong closed 1 year ago

HumanZhong commented 1 year ago

Hi, I am using 4 V100 GPUs for pretraining to reproduce your results, but training seems slow: the estimated time is about 8 days. Could you release your training log for reference?

ymy-k commented 1 year ago

Hi, a reference is here. Pretraining a model with ResNet-50 for 375K iterations takes about 1 day and 14 hours on 4 A100 GPUs. log.txt
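For comparison, the two timings quoted so far can be converted into seconds per iteration. This is only a back-of-the-envelope sketch: it assumes the 8-day V100 estimate refers to the same 375K-iteration schedule, which the thread does not state explicitly, and the function name is made up for illustration.

```python
# Rough throughput comparison using only the figures quoted in this thread.
# Assumption: the 8-day estimate on 4x V100 covers the same 375K iterations.
ITERATIONS = 375_000


def seconds_per_iteration(days: float, hours: float, iterations: int = ITERATIONS) -> float:
    """Average wall-clock seconds spent on one training iteration."""
    total_seconds = (days * 24 + hours) * 3600
    return total_seconds / iterations


a100 = seconds_per_iteration(days=1, hours=14)  # 4x A100 reference run
v100 = seconds_per_iteration(days=8, hours=0)   # 4x V100 estimate from the first post

print(f"4x A100: {a100:.3f} s/iter")            # 0.365 s/iter
print(f"4x V100 estimate: {v100:.3f} s/iter")   # 1.843 s/iter
print(f"slowdown: {v100 / a100:.1f}x")          # 5.1x
```

A gap of roughly 5x is larger than the hardware difference alone would usually suggest, so comparing the per-iteration time in your own log against the reference log is a reasonable first check.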

ymy-k commented 1 year ago

By the way, the evaluation results in the log are for reference only: they use centerline matching, not the formal IoU matching evaluation.

Zalways commented 1 year ago

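Hello, during training the results at this step are all 0. Do you know what the likely cause is? Also, setting the worker parameter to 8 raises an error; it only runs when set to 0, which slows down training.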

ymy-k commented 1 year ago

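Which iteration is this test result from? Did you change any other parameters? Are the images and ground truth OK? Did you download them from this repo? With my default settings, the results should not be 0.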

Zalways commented 1 year ago

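Hello, I downloaded everything as the repo instructs. Originally, testing only ran once training reached 10,000 iterations, but with the worker count at 8 (as in the original code) the test raised an error at iteration 10,000. I changed the worker count to 1 and, to make debugging easier, set EVAL_PERIOD under TEST to 100, so a test runs after every 100 training iterations. That is when this problem appeared. My batch size is 8, and I am running on a server with 8 RTX 3090 GPUs.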

ymy-k commented 1 year ago

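OK, understood. With EVAL_PERIOD set to 100, the model has run only 100 iterations and seen 800 images. The network does not converge fast enough to give correct test results after seeing so few images. You will probably need to set it higher before you see non-zero results.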
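The arithmetic behind that figure, as a minimal sketch: the number of training images seen before an evaluation is the iteration count times the batch size. The batch size of 8 and the two evaluation periods are the values reported in this thread; the function name is made up for illustration.

```python
def images_seen(iterations: int, batch_size: int) -> int:
    """Training images processed after `iterations` steps at a fixed batch size."""
    return iterations * batch_size


BATCH_SIZE = 8  # batch size reported by Zalways

# Evaluation after 100 iterations (the debugging setting) versus after
# 10,000 iterations (when testing originally ran, per the post above).
print(images_seen(100, BATCH_SIZE))     # 800
print(images_seen(10_000, BATCH_SIZE))  # 80000
```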

Zalways commented 1 year ago

OK, I will give that a try. Thank you!

Mark-Laohu commented 1 year ago

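Hello, did you solve the training time problem? Training with a batch size of 8 on four RTX 3090 GPUs also takes nearly eight days for me, and I cannot figure out where the problem is.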

Mark-Laohu commented 1 year ago

Quoting ymy-k: "Hi, a reference is here. For pretraining a model with ResNet-50 for 375K iterations, it takes about 1 day and 14 hours on 4 A100 GPUs. log.txt"

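Hello, on 8 RTX 2080 GPUs my training time dropped to a little over 3 days, but the accuracy at various stages is quite a bit lower than what you reported. Do you know what might be causing this? out_pretrain_script.txt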