Open argman opened 7 years ago
@argman Have you converted the checkpoints from the VGG16 FC-reduced caffemodel? I used the converted checkpoints and trained from scratch on ICDAR2015 with good results; the loss should converge to about 2.0. See #4 to download my checkpoints.
@BowieHsu, thanks! I will try it and post my results here.
@BowieHsu, by the way, can you share your trained model? I am using tf-1.3, so I need to check whether something changed in TF.
@BowieHsu, after 6 hours of training on 4 GPUs, the loss curve is
@BowieHsu, thanks for your model, I can get meaningful results now! The model is really hard to train.
Haha, that's really good news.
@BowieHsu, hi. I used the converted checkpoints and trained from scratch on ICDAR2015, but I got bad results. I set the learning rate in the JSON file like this:
"max_steps": 90000, "base_lr": 1e-4, "lr_breakpoints": [10000, 20000, 60000, 75000, 90000], "lr_decay": [0.64, 0.8, 1.0, 0.1, 0.01],
I guess the base_lr may be too small, or something else is wrong. Could you please share your training strategy and your good results? Thank you so much!
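For reference, here is a minimal sketch of how a staircase schedule like the one above could be evaluated. It assumes each entry of lr_decay is a multiplier on base_lr that applies until the matching entry of lr_breakpoints is reached; that interpretation and the name `staircase_lr` are assumptions, not taken from the SegLink code.

```python
import bisect


def staircase_lr(step, base_lr, breakpoints, decays):
    """Return the learning rate at `step` for a piecewise-constant schedule.

    Assumption: decays[i] applies while step < breakpoints[i]; past the last
    breakpoint, the last multiplier is kept.
    """
    # Count how many breakpoints have already been reached.
    idx = bisect.bisect_right(breakpoints, step)
    # Clamp so steps beyond the final breakpoint reuse the final multiplier.
    idx = min(idx, len(decays) - 1)
    return base_lr * decays[idx]


breakpoints = [10000, 20000, 60000, 75000, 90000]
decays = [0.64, 0.8, 1.0, 0.1, 0.01]
for step in (0, 10000, 60000, 80000):
    print(step, staircase_lr(step, 1e-4, breakpoints, decays))
```

Under this reading, base_lr = 1e-4 means the first 10000 steps run at about 6.4e-5, so a smaller base_lr scales every stage of the schedule down.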
@JiasiWang Hi Wang, I also trained the model with the default pretrain.json, which gives good results. What is your batch size? You could also check the loss value using TensorBoard.
@BowieHsu, I did not change the batch size; it is 32. I only changed base_lr to 1e-4. I will check it, thanks.
@JiasiWang Yep, the default learning rate should be 5e-4.
@JiasiWang By the way, the ICDAR2015 SegLink model should be pretrained on the SynthText dataset first and then fine-tuned on the ICDAR2015 training set if you want to reach 75% Hmean.
@BowieHsu Yeah, I know the SegLink model needs pretraining on SynthText; without pretraining I only get 58% Hmean. After that, I also pretrained the model as the paper describes and then fine-tuned it, using the default JSON file for both steps, but the loss did not seem to converge in the fine-tuning step.
May I ask how to use your model? I am not familiar with TensorFlow. I tried to load it in TensorFlow 1.4, but I got the following error. I searched around, but no solution worked for me.
I tried the following solutions:
change seglink/solver.py to use
model_loader.restore(sess, './data/VGG_ILSVRC_16_layers_ssd/VGG_ILSVRC_16_layers_ssd.ckpt.data-00000-of-00001')
create a folder named VGG_ILSVRC_16_layers_ssd and pass its path in the JSON
set the finetune_model value to VGG_ILSVRC_16_layers_ssd.ckpt, which is a copy of VGG_ILSVRC_16_layers_ssd.ckpt.data-00000-of-00001
Error log:
seglink/data/VGG_ILSVRC_16_layers_ssd/VGG_ILSVRC_16_layers_ssd.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Try "model_loader.restore(sess, './data/VGG_ILSVRC_16_layers_ssd/VGG_ILSVRC_16_layers_ssd.ckpt')" @Godricly
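Some background on why the shorter path is the right one: a TensorFlow checkpoint in the V2 format is a set of files (`.index`, `.data-00000-of-00001`, and usually `.meta`) that share one prefix, and `Saver.restore` expects that prefix rather than any single file. A small sketch of a helper that strips the suffix; `checkpoint_prefix` is a hypothetical name, not part of TensorFlow or this repo.

```python
import re

# Suffixes TensorFlow appends to the checkpoint prefix when saving.
_CKPT_SUFFIX = re.compile(r"\.(data-\d{5}-of-\d{5}|index|meta)$")


def checkpoint_prefix(path):
    """Return the prefix to pass to restore(), given any checkpoint file path."""
    return _CKPT_SUFFIX.sub("", path)


data_file = (
    "./data/VGG_ILSVRC_16_layers_ssd/"
    "VGG_ILSVRC_16_layers_ssd.ckpt.data-00000-of-00001"
)
print(checkpoint_prefix(data_file))
```

A path that is already a prefix passes through unchanged, so the helper can be applied to whatever the JSON config contains.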
Many thanks! That saved my ass. :+1:
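@Godricly You're welcome, friend.
@BowieHsu How can I use your pretrained model to skip the pretraining step? Is the model in exp/sgd/checkpoint the one produced during pretraining? When I put your model there, it says the format is wrong.
@tianzhuotao The pretrain JSON file is for training a model on the SynthText dataset. If you don't want to train that model and instead want to train directly on ICDAR2015: 1. Change checkpoint_path in exp/sgd/finetune_ic15.json to the location where you placed the VGG model.
@BowieHsu The finetune JSON file only contains a finetune_model entry, and it seems a checkpoint file has to exist in exp/sgd, but I skipped pretraining so I don't have one. Your model also seems to contain only 3 files. How can I solve this?
You can see two lines in finetune.json: "resume": "finetune", "finetune_model": "../exp/sgd/checkpoint". Replace /exp/sgd/checkpoint there with the location of the converted checkpoint I provided. Watch the log output: if TensorFlow finds the checkpoint but still reports an error, it is because the resume option is set to finetune and some variables do not exist in the VGG model, so you may also need to change "resume":"finetune" to "resume":"vgg16". Give it a try first.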
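A minimal sketch of that config change in Python, in case editing by hand is error-prone. The two keys and the "vgg16" value come from the comment above; the function name `use_vgg_checkpoint` and the checkpoint path are placeholders chosen for illustration.

```python
import json


def use_vgg_checkpoint(config, checkpoint_path):
    """Return a copy of the config that initializes from a converted VGG checkpoint."""
    updated = dict(config)
    # "vgg16" restores only the variables that exist in the VGG model.
    updated["resume"] = "vgg16"
    # Placeholder path: point this at wherever the converted checkpoint lives.
    updated["finetune_model"] = checkpoint_path
    return updated


original = json.loads(
    '{"resume": "finetune", "finetune_model": "../exp/sgd/checkpoint"}'
)
updated = use_vgg_checkpoint(original, "../path/to/VGG_ILSVRC_16_layers_ssd.ckpt")
print(json.dumps(updated, indent=2))
```

In practice the dict would be loaded from and written back to the finetune JSON file with `json.load` and `json.dump`.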
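@BowieHsu Thank you so much, you're a lifesaver! I also solved a few other problems (GPU-related and so on), and it is finally running.
@tianzhuotao Keep an eye on the training loss. If you fine-tune directly from the VGG model, you need to adjust the learning rate; just tune the hyperparameters gradually. You will of course also need to hack the code to fit your actual task. Good luck.
@BowieHsu Thanks! I am currently using the default parameters, but training is very slow: only 6% done after 7 hours. How long did your training take? On the cluster I currently have a 16-core CPU, 1 GPU, 32 GB of RAM, and 10 GB of disk.
Hello, I have also been working on multi-oriented text detection recently. Could we connect on QQ to discuss?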
@tianzhuotao @BowieHsu
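Hello, convert_caffemodel_to_ckpt.py contains import model_vgg16. How do I install model_vgg16, and where? Also, running run.sh gives a caffe error; online sources say it is a Python version problem and that I need to switch to Python 2.7, but your README says you use Python 3. Could you help clear this up?
@13230380356 I just solved the pretraining problem; see the tips I just wrote in #13.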
Try "model_loader.restore(sess, './data/VGG_ILSVRC_16_layers_ssd/VGG_ILSVRC_16_layers_ssd.ckpt')" @Godricly
Everything is OK until: 2018-11-23 04:53:37,597 [INFO ] Restoring parameters from ../premodel/ILVSR_VGG_16_FC_REDUCED/VGG_ILSVRC_16_layers_ssd.ckpt Segmentation fault (core dumped)
How do I debug this segmentation fault? Every comment is welcome.
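One general starting point, not specific to this repo: Python 3's standard `faulthandler` module can dump the Python traceback when the process receives a fatal signal such as SIGSEGV, which at least shows which Python call was active. A minimal sketch, assuming Python 3; the log location and the name `enable_crash_tracebacks` are arbitrary choices for illustration.

```python
import faulthandler
import os
import tempfile


def enable_crash_tracebacks(log_path):
    """Dump Python tracebacks for all threads to log_path on a fatal signal."""
    # The file must stay open for as long as the handler is enabled.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Call this at the very top of the training script, before the restore step.
crash_log_path = os.path.join(tempfile.gettempdir(), "seglink_crash.log")
crash_log = enable_crash_tracebacks(crash_log_path)
```

Running the script with `python -X faulthandler` enables the same handler without code changes and writes to stderr instead.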
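@BowieHsu @JiasiWang I built the TF files from the 40 GB SynthText dataset. After pretraining for 90000 steps, the default "finetune_model": "../exp/sgd/checkpoint" in finetune_ic15.json did not work, so I changed it to "finetune_model": "../exp/sgd/checkpoint-90000". After training for another 10000 steps, the result on the IC15 test set is only:
Recall | Precision | Hmean
59.56 % | 63.47 % | 61.45 %
Why didn't it reach 75%? Hoping for a reply, friends. Thanks a lot!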
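For anyone comparing numbers: Hmean is the harmonic mean of recall and precision (the F1 score). A short sketch that checks the figures reported above; the function name `hmean` is just for illustration.

```python
def hmean(recall, precision):
    """Harmonic mean of recall and precision, in the same units as the inputs."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# Recall and precision in percent, as reported above.
print(round(hmean(59.56, 63.47), 2))  # 61.45
```

Because the harmonic mean is pulled toward the lower of the two values, reaching 75% requires both recall and precision to be roughly in that range.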
After changing the batch size to 32, Hmean is still around 61%.
Testing with the pretrained model directly, without fine-tuning, gives an Hmean of 49%.
I get the same results as you, and for now I don't know how to improve them.
Thanks for the clean and elegant code! I tried to run training from scratch (using the vgg_16 model pretrained on ImageNet), and the training process looks weird.
Total Loss
And the corresponding loss for others.
The loss quickly converged to about 10+. I tested the model, but no text boxes are detected. How can I diagnose this?