PatrickLib / captcha_recognize

Image Recognition captcha without image segmentation
Apache License 2.0
557 stars 174 forks

The image-loading code loads each image twice #6

Open corberan opened 6 years ago

corberan commented 6 years ago
extensions = ['jpg', 'JPG', 'jpeg', 'JPEG', 'png', 'PNG']
# ...
for extension in extensions:
    file_glob = os.path.join(image_dir, '*.' + extension)
    file_list.extend(gfile.Glob(file_glob))

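On Windows, file extensions are matched case-insensitively, so both the `*.png` and `*.PNG` patterns match the same file and each image is loaded twice.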

eli95 commented 6 years ago

Wow, you're really observant!

file_list = []
for extension in extensions:
    file_glob = os.path.join('./tttt', '*.' + extension)
    file_list.extend(gfile.Glob(file_glob))

file_list
Out[11]:
['.\tttt\0018_num6235.png',
 '.\tttt\0023_num5141.png',
 '.\tttt\0023_num8005.png',
 '.\tttt\0018_num6235.png',
 '.\tttt\0023_num5141.png',
 '.\tttt\0023_num8005.png']

This might fix the duplication: `for file_name in list(set(file_list)):`
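One caveat with `set` is that it does not keep the original order of the list. A minimal sketch of an order-preserving alternative follows. The helper name `dedupe_paths` and the sample paths are hypothetical, not from the repository, and `os.path.normcase` only folds case on Windows:

```python
import os


def dedupe_paths(paths):
    """Drop duplicate paths while keeping first-seen order (hypothetical helper)."""
    seen = set()
    unique = []
    for path in paths:
        # normcase lowercases and unifies slashes on Windows; it is a no-op on POSIX.
        key = os.path.normcase(os.path.normpath(path))
        if key not in seen:
            seen.add(key)
            unique.append(path)
    return unique


# Made-up sample data standing in for the duplicated glob results.
file_list = ['a/1.png', 'a/2.png', 'a/1.png', 'a/2.png']
unique_files = dedupe_paths(file_list)
```

The loop in the training script could then iterate over `unique_files` instead of `list(set(file_list))`.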

PatrickLib commented 6 years ago

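Yes. On Windows it is enough to keep only the lowercase (or only the uppercase) extensions, and filtering with `set` works well too. You can also just leave it as is; the duplicates don't affect training.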
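A platform-independent way to avoid the duplicates is to list the directory once and compare extensions case-insensitively. The sketch below swaps `gfile.Glob` for the standard library's `os.listdir`; this is an assumption for illustration, not what the repository does, and the function name `list_images` is hypothetical:

```python
import os

# Lowercase only; matching is done case-insensitively below.
EXTENSIONS = {'jpg', 'jpeg', 'png'}


def list_images(image_dir):
    """Return each image file in image_dir exactly once (hypothetical helper)."""
    file_list = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        # Take the extension without the dot and fold it to lowercase.
        ext = os.path.splitext(name)[1].lstrip('.').lower()
        if ext in EXTENSIONS and os.path.isfile(path):
            file_list.append(path)
    return file_list
```

Because every file is visited once, the result contains no duplicates on either case-sensitive or case-insensitive file systems.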