ViTAE-Transformer / DeepSolo

The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting"

Training Log #13

Closed HumanZhong closed 1 year ago

HumanZhong commented 1 year ago

Hi, I am using 4 V100 GPUs for pretraining to reproduce your results, but training seems slow: the estimated time is about 8 days. Could you release your training log for reference?

ymy-k commented 1 year ago

Hi, a reference is here. Pretraining a model with ResNet-50 for 375K iterations takes about 1 day and 14 hours on 4 A100 GPUs. log.txt
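To compare setups, it can help to convert a total training time into seconds per iteration. The sketch below does that arithmetic for the two figures quoted in this thread. The function name `seconds_per_iteration` is illustrative, and it assumes the 8-day estimate refers to the same 375K-iteration schedule, which the thread does not state explicitly.

```python
def seconds_per_iteration(days: int, hours: int, iterations: int) -> float:
    """Convert a wall-clock training time into average seconds per iteration."""
    total_seconds = (days * 24 + hours) * 3600
    return total_seconds / iterations


# Assumption: both runs use the 375K-iteration pretraining schedule.
ITERATIONS = 375_000

# Reference run from the thread: about 1 day 14 hours on 4 A100 GPUs.
a100_rate = seconds_per_iteration(1, 14, ITERATIONS)

# Reported estimate from the thread: about 8 days on 4 V100 GPUs.
v100_rate = seconds_per_iteration(8, 0, ITERATIONS)

print(f"4x A100: {a100_rate:.3f} s/iter")
print(f"4x V100: {v100_rate:.3f} s/iter")
print(f"slowdown: {v100_rate / a100_rate:.1f}x")
```

Comparing these per-iteration figures with the timing printed in your own training log is one way to check whether a slow run is slow from the start or degrades later.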

ymy-k commented 1 year ago

By the way, the evaluation results in the log are for reference only. They use centerline matching and are not the formal IoU-matching evaluation results.

Zalways commented 1 year ago

image

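Hello, during training the results at this step are all 0. Do you know what might cause this? Also, setting the worker count to 8 raises an error; it only runs when I set it to 0, which slows training down.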

ymy-k commented 1 year ago

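At which iteration was this evaluation run? Did you change any other parameters? Are the images and ground truth correct, and did you download them through this repo? With my default settings, all-zero results should not occur.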

Zalways commented 1 year ago

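Hello, I downloaded everything as the repo instructs. Previously, evaluation only ran once training reached 10,000 iterations, but with 8 workers (as in the original code) the test raised an error at that point. I changed the worker count to 1 and, to make debugging easier, set the TEST EVAL PERIOD to 100, so that evaluation runs every 100 iterations. That is when this problem appeared. My batch size is 8, and I am running on a server with eight 3090 GPUs.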

ymy-k commented 1 year ago

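OK, I see. With EVAL_PERIOD set to 100, the model has trained for only 100 iterations and seen 800 images. The network does not converge fast enough to give correct test results after seeing so few images. You will probably need to set it higher before you see non-zero results.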
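The 800-image figure follows from multiplying the number of iterations by the batch size. A minimal sketch of that arithmetic is below; the helper name `images_seen` is illustrative, the batch size of 8 is the one reported earlier in the thread, and the 375K-iteration total is assumed to be the pretraining schedule mentioned above.

```python
def images_seen(iterations: int, batch_size: int) -> int:
    """Number of training images processed after a given number of iterations."""
    return iterations * batch_size


BATCH_SIZE = 8            # batch size reported in the thread
EVAL_PERIOD = 100         # evaluation interval used for debugging
FULL_SCHEDULE = 375_000   # assumption: the 375K-iteration pretraining schedule

seen_at_first_eval = images_seen(EVAL_PERIOD, BATCH_SIZE)
seen_at_end = images_seen(FULL_SCHEDULE, BATCH_SIZE)

print(f"images seen at first evaluation: {seen_at_first_eval}")
print(f"images seen over the full schedule: {seen_at_end}")
print(f"fraction of schedule completed: {EVAL_PERIOD / FULL_SCHEDULE:.4%}")
```

Under these assumptions, the first evaluation happens after a very small fraction of the schedule, which is consistent with all-zero metrics that early in training.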

Zalways commented 1 year ago

OK, I'll try that. Thank you!

Mark-Laohu commented 1 year ago

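Hello, did you solve the training-time problem? Training on four 3090 GPUs with a batch size of 8 also takes me nearly eight days, and I cannot figure out where the problem is.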

Mark-Laohu commented 1 year ago

Hi, a reference is here. Pretraining a model with ResNet-50 for 375K iterations takes about 1 day and 14 hours on 4 A100 GPUs. log.txt

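Hello, when I train on eight 2080 GPUs, the training time drops to a little over three days, but the accuracy at different stages differs considerably from the accuracy you reported. Do you know what might be causing this? out_pretrain_script.txt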