Error when loading the label files under the Chinese_dataset training set

GlassyWing / text-detection-ocr

Chinese text detection and recognition based on CTPN + DenseNet, using Keras and TensorFlow

Apache License 2.0

284 stars 116 forks

Error when loading the label files under the Chinese_dataset training set #8

Open chenzhuo2016 opened 5 years ago

chenzhuo2016 commented 5 years ago

The error is shown below, and each run points to a different label txt file. Is there any way to fix this?

  File "F:\Python Project\text-detection-ocr\dlocr\densenet\data_loader.py", line 111, in
    for img, label_len, input_len, label in executor.map(lambda t: load_single_example(*t), image_labels):
  File "F:\Python Project\text-detection-ocr\dlocr\densenet\data_loader.py", line 96, in load_single_example
    label[0: len(image_label)] = [int(i) - 1 for i in image_label] # int changed to float, by chenz
  File "F:\Python Project\text-detection-ocr\dlocr\densenet\data_loader.py", line 96, in
    label[0: len(image_label)] = [int(i) - 1 for i in image_label] # int changed to float, by chenz
ValueError: invalid literal for int() with base 10: '也受到了牵连，老是嘟着嘴，无'

GlassyWing commented 5 years ago

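The dataset comes from another open-source project, so its original format is kept here. Each line has the form: image filename \t positions of the label's characters in the dictionary. For example, if the image file is a.jpg and its label is 你好, the line is not written as a.jpg 你好. If "你" and "好" are at positions 11 and 13 in the dictionary, the line should be a.jpg 11 13.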
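As a rough illustration of that format, the sketch below parses label lines and reports the ones that would trigger the ValueError from the traceback. The function names are hypothetical and not part of dlocr, and the sketch assumes the fields are separated by whitespace (tab or space).

```python
def parse_label_line(line):
    """Split one label line into (filename, list of integer positions).

    Assumption: fields are whitespace-separated, e.g. "a.jpg 11 13".
    """
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected a filename and at least one position: {line!r}")
    filename, tokens = parts[0], parts[1:]
    try:
        positions = [int(t) for t in tokens]
    except ValueError:
        # Raw text such as "a.jpg 你好" ends up here.
        raise ValueError(
            f"{filename}: label is not a list of integers: {' '.join(tokens)!r}"
        ) from None
    return filename, positions


def find_bad_lines(lines):
    """Return (line number, message) for every line that does not parse."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        try:
            parse_label_line(line)
        except ValueError as exc:
            bad.append((lineno, str(exc)))
    return bad


if __name__ == "__main__":
    sample = ["a.jpg 11 13", "b.jpg 你好"]
    for lineno, message in find_bad_lines(sample):
        print(lineno, message)
```

Running a check like this over each label file before training lists every offending line in one pass, rather than one error per run.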

Ianwtg commented 4 years ago

Is there any tool that can quickly convert a dataset into this format?
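No ready-made tool is named in this thread, but a conversion along these lines is short to write. The sketch below is a minimal, unofficial example: `build_char_index` and `convert_line` are hypothetical names, the dictionary is passed in as an already loaded sequence of characters (how to read it depends on the dictionary file layout), and positions are assumed to start at 1 because the loader in the traceback computes `int(i) - 1`. Adjust `first_index` if your dictionary is numbered differently.

```python
def build_char_index(chars, first_index=1):
    """Map each dictionary character to its position.

    Assumption: positions start at `first_index` (1 by default).
    """
    return {ch: i for i, ch in enumerate(chars, start=first_index)}


def convert_line(line, char_index):
    """Turn "a.jpg 你好" into "a.jpg <pos> <pos>".

    Assumption: the filename contains no whitespace and is followed
    by the raw label text.
    """
    filename, text = line.rstrip("\n").split(maxsplit=1)
    missing = [ch for ch in text if ch not in char_index]
    if missing:
        raise KeyError(f"{filename}: characters not in dictionary: {missing!r}")
    positions = [str(char_index[ch]) for ch in text]
    return " ".join([filename] + positions)


if __name__ == "__main__":
    # Toy three-character dictionary, for illustration only.
    index = build_char_index(["你", "们", "好"])
    print(convert_line("a.jpg 你好", index))
```

Characters missing from the dictionary raise an error instead of being skipped, so a mismatch between the labels and the dictionary shows up during conversion and not later in training.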