MaybeShewill-CV / CRNN_Tensorflow

Convolutional Recurrent Neural Networks(CRNN) for Scene Text Recognition
MIT License

write_tfrecords becomes very slow on large datasets #396

Closed kiss90 closed 4 years ago

kiss90 commented 4 years ago

I added another 1 million images on top of the 3.6-million-image Chinese dataset the author mentioned. When running write_tfrecords, it is fairly fast at first but gets slower and slower. The log at 22% progress is below. Is there any way to optimize this? Does everyone see the same behavior when processing the 3.6-million-image Chinese dataset? Any guidance would be appreciated.

22%|██▏ | 1037052/4662230 [3:42:45<82:02:44, 12.27it/s]
22%|██▏ | 1037054/4662230 [3:42:45<87:12:46, 11.55it/s]
22%|██▏ | 1037056/4662230 [3:42:46<85:39:20, 11.76it/s]
22%|██▏ | 1037058/4662230 [3:42:46<80:03:03, 12.58it/s]
22%|██▏ | 1037060/4662230 [3:42:46<83:22:32, 12.08it/s]
22%|██▏ | 1037062/4662230 [3:42:46<85:06:41, 11.83it/s]
22%|██▏ | 1037064/4662230 [3:42:47<254:44:21, 3.95it/s]
22%|██▏ | 1037065/4662230 [3:42:47<214:49:10, 4.69it/s]
22%|██▏ | 1037067/4662230 [3:42:48<180:26:57, 5.58it/s]
22%|██▏ | 1037068/4662230 [3:42:48<156:32:18, 6.43it/s]
22%|██▏ | 1037070/4662230 [3:42:48<130:47:11, 7.70it/s]
22%|██▏ | 1037072/4662230 [3:42:48<114:00:36, 8.83it/s]
22%|██▏ | 1037075/4662230 [3:42:48<96:25:09, 10.44it/s]
22%|██▏ | 1037077/4662230 [3:42:48<104:57:08, 9.59it/s]
22%|██▏ | 1037079/4662230 [3:42:49<132:01:25, 7.63it/s]
22%|██▏ | 1037080/4662230 [3:42:49<157:03:03, 6.41it/s]
22%|██▏ | 1037081/4662230 [3:42:49<174:53:19, 5.76it/s]
22%|██▏ | 1037082/4662230 [3:42:50<204:32:03, 4.92it/s]

MaybeShewill-CV commented 4 years ago

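@kiss90 All of the data is read into memory first, so the process uses a lot of memory. If the dataset is very large, you can split it into smaller subsets and generate tfrecords for each subset separately :)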
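As a rough illustration of the splitting suggestion, here is a minimal sketch. It assumes the annotations live in a plain text file with one sample per line; the function name `split_annotation_file`, the shard file names, and the default shard size are all hypothetical and not part of this repository. It streams the file, so the full annotation list is never held in memory.

```python
from pathlib import Path


def split_annotation_file(annotation_path, output_dir, lines_per_shard=500_000):
    """Split a one-sample-per-line annotation file into smaller files.

    Hypothetical helper: the file layout and naming are assumptions.
    Returns the list of shard paths that were written.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    shard_paths = []
    shard_file = None
    with open(annotation_path, 'r', encoding='utf-8') as src:
        for line_index, line in enumerate(src):
            # Start a new shard every `lines_per_shard` lines.
            if line_index % lines_per_shard == 0:
                if shard_file is not None:
                    shard_file.close()
                shard_path = output_dir / 'annotation_part_{:04d}.txt'.format(len(shard_paths))
                shard_paths.append(shard_path)
                shard_file = open(shard_path, 'w', encoding='utf-8')
            shard_file.write(line)
    if shard_file is not None:
        shard_file.close()
    return shard_paths
```

Each resulting file could then be passed to the tfrecords writing step on its own, so that only one subset is loaded at a time.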

kiss90 commented 4 years ago

Found the cause. It is probably related to my execution environment: I submit the command to run on a remote image, so the data sits on one machine while the command runs on a different image. In that setup, checking whether a file exists is much more expensive than it is on a local machine. After I commented out the image-existence check in shadownet_data_feed_pipline.py, it ran very fast.

```
# if not ops.exists(image_path):
#     raise ValueError('Example image {:s} not exist'.format(image_path))
```

96%|█████████▌| 4477932/4662230 [09:29<00:11, 15595.05it/s]
96%|█████████▌| 4479493/4662230 [09:29<00:11, 15427.97it/s]
96%|█████████▌| 4481061/4662230 [09:29<00:11, 15499.85it/s]
96%|█████████▌| 4482625/4662230 [09:29<00:11, 15540.28it/s]
96%|█████████▌| 4484180/4662230 [09:30<00:11, 15523.04it/s]
96%|█████████▌| 4485733/4662230 [09:30<00:11, 15514.31it/s]
96%|█████████▌| 4487295/4662230 [09:30<00:11, 15543.50it/s]
96%|█████████▋| 4488850/4662230 [09:30<00:11, 15474.03it/s]
96%|█████████▋| 4490398/4662230 [09:30<00:21, 7960.22it/s]
96%|█████████▋| 4491594/4662230 [09:30<00:20, 8322.40it/s]
96%|█████████▋| 4492711/4662230 [09:31<00:34, 4906.56it/s]
96%|█████████▋| 4493568/4662230 [09:31<00:40, 4135.60it/s]
96%|█████████▋| 4494259/4662230 [09:31<00:36, 4659.53it/s]
96%|█████████▋| 4494941/4662230 [09:32<00:53, 3128.18it/s]

Thanks a lot for the pointers :)
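For anyone who wants to keep some validation instead of dropping the existence check entirely, one possible alternative is to list the image directory once and test membership in a set, rather than querying the filesystem once per image. This is only a sketch: the helper names `build_existing_file_set` and `find_missing_images` are hypothetical, and whether a single directory walk is actually cheaper on a given remote filesystem is an assumption that would need to be measured.

```python
import os


def build_existing_file_set(image_dir):
    """Walk image_dir once and return the set of absolute file paths under it."""
    existing = set()
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            existing.add(os.path.abspath(os.path.join(root, name)))
    return existing


def find_missing_images(image_paths, existing):
    """Return the paths from image_paths that are not in the prebuilt set.

    os.path.abspath only normalizes the string, so this loop does not
    touch the filesystem.
    """
    return [p for p in image_paths if os.path.abspath(p) not in existing]
```

Missing files can then be reported once up front, before the tfrecords are written.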

MaybeShewill-CV commented 4 years ago

@kiss90 You're welcome :)