Pay20Y / FOTS_TF

This is an implementation of FOTS with TensorFlow
GNU General Public License v3.0
181 stars 59 forks

why so slow? #50

Open likezjuisee opened 4 years ago

likezjuisee commented 4 years ago

Find 1 images
6055 text boxes before nms
test/screenshot.png : detect 3372ms, restore 5ms, nms 147ms, recog 4796ms
[timing] 8.196640491485596

screenshot

Pay20Y commented 4 years ago

There seem to be too many boxes before NMS. May I ask which dataset you used?

likezjuisee commented 4 years ago

The image mentioned above is my own test image, and the model is the one linked from your readme.md.

likezjuisee commented 4 years ago

I also found that the running time is not stable: it may be 10 s on one run and 3 s on the next.
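One way to separate one-off start-up cost from steady-state speed is to discard the first few calls before timing. The following is a minimal sketch in plain Python. `benchmark` and `summarize` are hypothetical helper names, and `fn` stands in for whatever runs one inference (for example a wrapper around a `sess.run` call). Whether warm-up explains the variation reported here is an assumption, not something confirmed in this thread.

```python
import statistics
import time


def benchmark(fn, warmup=1, runs=5):
    """Call fn() `warmup` times untimed, then return `runs` timings in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not measured
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return the median latency and the FPS it implies."""
    median_s = statistics.median(timings)
    fps = 1.0 / median_s if median_s > 0 else float("inf")
    return {"median_s": median_s, "fps": fps}
```

Reporting the median rather than a single measurement makes a 10 s outlier next to several 3 s runs easy to spot.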

Pay20Y commented 4 years ago

That's strange. NMS normally takes very little time. I think you should check your GPU first; the model may be running on the CPU. You can also debug the detection branch on its own first, using EAST.
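A quick way to check which devices TensorFlow can see is sketched below, assuming a TensorFlow 1.x install like the one used in this thread. `device_lib.list_local_devices()` lives in the `tensorflow.python.client` module. `gpu_device_names`, `list_tf_gpus`, and the `Device` tuple are hypothetical names introduced only for this illustration.

```python
from collections import namedtuple

# Stand-in for the objects TensorFlow returns, used only to try the filter
# without TensorFlow installed. Real entries also have .name and .device_type.
Device = namedtuple("Device", ["name", "device_type"])


def gpu_device_names(devices):
    """Return the names of all devices whose type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_tf_gpus():
    """Ask TensorFlow which GPUs it can see; return None if TF is not installed."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_device_names(device_lib.list_local_devices())
```

An empty list from `list_tf_gpus()` would suggest the graph is being placed on the CPU. A non-empty list only shows that a GPU is visible, not that every op runs on it.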

likezjuisee commented 4 years ago

(fots) root@test-desktop:~/like/fots/FOTS_TF# python3.5 main_test.py --gpu_list='1' --test_data_path=test/ --checkpoint_path=checkpoints/SynthText_6_epochs/
make: Entering directory '/home/test/like/fots/FOTS_TF/lanms'
make: 'adaptor.so' is up to date.
make: Leaving directory '/home/test/like/fots/FOTS_TF/lanms'
resnet_v1_50/block1 (?, ?, ?, 256)
resnet_v1_50/block2 (?, ?, ?, 512)
resnet_v1_50/block3 (?, ?, ?, 1024)
resnet_v1_50/block4 (?, ?, ?, 2048)
Shape of f_0 (?, ?, ?, 2048)
Shape of f_1 (?, ?, ?, 512)
Shape of f_2 (?, ?, ?, 256)
Shape of f_3 (?, ?, ?, 64)
Shape of h_0 (?, ?, ?, 2048), g_0 (?, ?, ?, 2048)
Shape of h_1 (?, ?, ?, 128), g_1 (?, ?, ?, 128)
Shape of h_2 (?, ?, ?, 64), g_2 (?, ?, ?, 64)
Shape of h_3 (?, ?, ?, 32), g_3 (?, ?, ?, 32)
pad_rois shape: Tensor("RoIrotate/TensorArrayStack/TensorArrayGatherV3:0", shape=(?, 8, ?, 32), dtype=float32)
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/sparse_ops.py:1165: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
2020-05-28 11:38:23.281118: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-05-28 11:38:23.480434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:65:00.0
totalMemory: 10.76GiB freeMemory: 10.45GiB
2020-05-28 11:38:23.480467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2020-05-28 11:38:23.787260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 11:38:23.787300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2020-05-28 11:38:23.787306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2020-05-28 11:38:23.787402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10086 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:65:00.0, compute capability: 7.5)
Restore from checkpoints/SynthText_6_epochs/model.ckpt-733268
Find 2 images
5084 text boxes before nms
test/006.jpg : detect 1903ms, restore 2ms, nms 23ms, recog 0ms
[timing] 1.9098529815673828
6055 text boxes before nms
test/screenshot.png : detect 758ms, restore 5ms, nms 77ms, recog 0ms
[timing] 0.7702224254608154

The running time is reduced, but the FPS is still lower than what the paper reports.
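To turn the per-image lines in the log above into an FPS figure, the following sketch parses the stage times and the `[timing]` totals with regular expressions. `parse_timings` and `frames_per_second` are hypothetical helper names, and the patterns assume the log keeps exactly the shape shown in this thread.

```python
import re

# One record: "<image path> : <stage timings> [timing] <seconds>"
RECORD_RE = re.compile(r"(\S+) : (.+?)\s+\[timing\] ([0-9.]+)", re.S)
# One stage inside a record, e.g. "detect 758ms"
STAGE_RE = re.compile(r"(\w+) (\d+)ms")


def parse_timings(log_text):
    """Return one dict per image with its stage times (ms) and total time (s)."""
    records = []
    for path, stages, total in RECORD_RE.findall(log_text):
        records.append({
            "image": path,
            "stages_ms": {name: int(ms) for name, ms in STAGE_RE.findall(stages)},
            "total_s": float(total),
        })
    return records


def frames_per_second(records):
    """Average FPS over all parsed images, based on the [timing] totals."""
    total_s = sum(r["total_s"] for r in records)
    return len(records) / total_s if total_s > 0 else float("inf")


log = (
    "5084 text boxes before nms\n"
    "test/006.jpg : detect 1903ms, restore 2ms, nms 23ms, recog 0ms\n"
    "[timing] 1.9098529815673828\n"
    "6055 text boxes before nms\n"
    "test/screenshot.png : detect 758ms, restore 5ms, nms 77ms, recog 0ms\n"
    "[timing] 0.7702224254608154\n"
)
records = parse_timings(log)
print(frames_per_second(records))
```

With only two images, and the first one possibly including start-up cost, this number is a rough indication rather than a benchmark comparable to the paper's.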

The EAST result:

(east) root@test-desktop:~/like/east/EAST# python eval.py --test_data_path=/home/test/like/fots/FOTS_TF/test/ --gpu_list=0 --checkpoint_path=east_icdar2015_resnet_v1_50_rbox/ --output_dir=.
make: Entering directory '/home/test/like/east/EAST/lanms'
make: 'adaptor.so' is up to date.
make: Leaving directory '/home/test/like/east/EAST/lanms'
resnet_v1_50/block1 (?, ?, ?, 256)
resnet_v1_50/block2 (?, ?, ?, 512)
resnet_v1_50/block3 (?, ?, ?, 1024)
resnet_v1_50/block4 (?, ?, ?, 2048)
Shape of f_0 (?, ?, ?, 2048)
Shape of f_1 (?, ?, ?, 512)
Shape of f_2 (?, ?, ?, 256)
Shape of f_3 (?, ?, ?, 64)
Shape of h_0 (?, ?, ?, 2048), g_0 (?, ?, ?, 2048)
Shape of h_1 (?, ?, ?, 128), g_1 (?, ?, ?, 128)
Shape of h_2 (?, ?, ?, 64), g_2 (?, ?, ?, 64)
Shape of h_3 (?, ?, ?, 32), g_3 (?, ?, ?, 32)
2020-05-28 11:41:11.735064: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-05-28 11:41:11.830571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:17:00.0
totalMemory: 10.76GiB freeMemory: 10.45GiB
2020-05-28 11:41:11.830603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-05-28 11:41:12.134395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 11:41:12.134435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2020-05-28 11:41:12.134445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2020-05-28 11:41:12.134540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10081 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:17:00.0, compute capability: 7.5)
Restore from east_icdar2015_resnet_v1_50_rbox/model.ckpt-49491
Find 2 images
4697 text boxes before nms
/home/test/like/fots/FOTS_TF/test/006.jpg : net 1876ms, restore 7ms, nms 21ms
[timing] 1.911694049835205
6116 text boxes before nms
/home/test/like/fots/FOTS_TF/test/screenshot.png : net 625ms, restore 9ms, nms 60ms
[timing] 0.7100484371185303

Is this because ResNet50 is too complex?

likezjuisee commented 4 years ago

Here are the recognition results on Chinese text. Why are they all letters and digits?

404,407,710,414,709,472,402,464,h-1y51e7hi5
250,518,567,525,566,564,249,557,iric/-n?
220,594,488,585,489,604,220,613,M-senl
194,984,433,992,431,1027,193,1018,i2EJ1c-
284,807,587,819,585,853,282,842,isvE
73,1523,401,1536,399,1579,71,1566,Mes5E-F2t
63,1597,152,1599,148,1718,59,1715,a
242,1265,518,1255,519,1293,243,1303,-3he-7te
621,1416,870,1407,872,1459,623,1469,Netzxt
31,94,110,93,111,155,32,156,fow
965,1454,1032,1457,1030,1486,964,1483,2743
142,1584,428,1589,427,1625,142,1620,"nFhatElzsae
1025,1026,1077,1023,1079,1050,1027,1053,Xe
29,163,116,165,115,189,28,187,az
377,272,474,269,475,291,377,293,ivm
695,1254,834,1249,836,1285,697,1291,ges
575,1533,790,1528,791,1569,575,1574,cIEER-T
279,128,438,121,439,156,281,163,Heer
-3,23,169,30,167,61,-4,54,Xhn
961,25,1061,20,1062,52,962,57,09:22
553,1590,687,1585,688,1617,555,1622,Frk
111,871,280,876,279,914,110,909,KL/E"
267,873,462,878,461,914,266,909,"FeEr'"
260,1420,408,1416,409,1463,261,1467,Ths
34,245,138,248,136,301,32,298,Jets
366,688,498,684,499,716,367,720,ey
722,1731,784,1729,785,1757,722,1759,ee
715,1115,792,1118,791,1151,714,1148,tit
763,240,863,242,862,293,762,291,Bgy
245,1119,369,1116,371,1158,246,1161,>A(
720,875,863,879,862,912,719,908,ae
421,1253,665,1256,664,1292,421,1288,thigeit.
544,240,692,241,691,294,543,292,egEI
479,872,639,881,637,916,477,907,Ree
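Each result above appears to consist of eight comma-separated integers (the x,y coordinates of four box corners) followed by the recognized text. Under that assumption, a single result can be split as sketched below. `parse_result_line` is a hypothetical helper, not part of the repository.

```python
def parse_result_line(line):
    """Split 'x1,y1,x2,y2,x3,y3,x4,y4,text' into (corner points, text)."""
    # Split at most 8 times so commas inside the recognized text survive.
    parts = line.strip().split(",", 8)
    if len(parts) != 9:
        raise ValueError("expected 8 coordinates followed by text: %r" % line)
    coords = [int(value) for value in parts[:8]]
    polygon = list(zip(coords[0::2], coords[1::2]))
    return polygon, parts[8]


polygon, text = parse_result_line("961,25,1061,20,1062,52,962,57,09:22")
```

Inspecting the parsed text alongside the box makes it easier to see which regions of the image produced Latin characters in place of Chinese ones.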

Pay20Y commented 4 years ago

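Hello. The model was trained on English datasets, so it cannot be used directly on Chinese data. You can modify the code and then fine-tune it. As for not reaching the FPS reported in the paper, that may be because my own ability is limited and the code is not that well optimized, or it may have something to do with the hardware.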

likezjuisee commented 4 years ago

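Understood, it is already great. If what I want to do is text recognition on software interfaces, where the text angle is generally 0 degrees, is there a fast method you would recommend?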

Pay20Y commented 4 years ago

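Do you mean there is only horizontal text (nothing slanted)? Then you could try a CTPN+CTC architecture. There are many implementations online, such as this one.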

likezjuisee commented 4 years ago

Yes, as in the software interface above, the text is basically all horizontal. Thanks, I'll take a look.

likezjuisee commented 4 years ago

I tried it. The image above takes about 2.5 seconds. Also, this kind of two-stage model needs two GPUs, which is rather expensive. Are there any other approaches?

Pay20Y commented 4 years ago

CTPN itself is probably the slow part. You could try this one or this one.

likezjuisee commented 4 years ago

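https://github.com/ouyanghuiyu/chineseocr_lite I tried this one. It is indeed fast, but the accuracy drops a lot, so it can hardly meet my needs. As I understand it, the speed-up comes from simplifying the model. Will there be a FOTS_lite version of FOTS?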

Pay20Y commented 4 years ago

Sorry, I have no such plan at the moment. You could look at other reimplementations of FOTS.

likezjuisee commented 4 years ago

Understood.

SkrDrag commented 2 years ago

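Quoting Pay20Y: "Hello. The model was trained on English datasets, so it cannot be used directly on Chinese data. You can modify the code and then fine-tune it. As for not reaching the FPS reported in the paper, that may be because my own ability is limited and the code is not that well optimized, or it may have something to do with the hardware."

Hello, I'd like to ask a question. I tried to fine-tune your released pretrained model on a Chinese dataset, but I got an error saying the model could not be loaded. How should I modify the code?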
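The error message is not shown, so the following is only a guess at one common cause: after switching to a Chinese character set, the recognition output layer has a different number of classes, so its shape no longer matches the checkpoint. A usual workaround is to restore only the variables whose name and shape match and to train the rest from scratch. The sketch below does the matching in plain Python. `split_restorable` and the variable names in the example are hypothetical, and the character-set sizes are made up for illustration.

```python
def split_restorable(model_shapes, checkpoint_shapes):
    """Return (restorable, skipped) variable names, compared by name and shape.

    Both arguments map variable name -> shape (list or tuple of ints).
    """
    restorable, skipped = [], []
    for name in sorted(model_shapes):
        in_ckpt = name in checkpoint_shapes
        if in_ckpt and tuple(checkpoint_shapes[name]) == tuple(model_shapes[name]):
            restorable.append(name)
        else:
            skipped.append(name)
    return restorable, skipped


# In TensorFlow 1.x the two dicts could be built roughly like this (assumption):
#   checkpoint_shapes = dict(tf.train.list_variables(checkpoint_path))
#   model_shapes = {v.op.name: v.shape.as_list() for v in tf.global_variables()}
# and the matching subset restored with tf.train.Saver(var_list=...).
model = {"backbone/conv1/weights": (7, 7, 3, 64), "recog/logits/weights": (512, 5000)}
ckpt = {"backbone/conv1/weights": [7, 7, 3, 64], "recog/logits/weights": [512, 95]}
restorable, skipped = split_restorable(model, ckpt)
```

If the skipped list contains only the final recognition layer, that would support the shape-mismatch explanation. If it contains many more names, the cause is probably something else, such as a changed variable scope.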