cwerner / fastclass

Little tools to download and then weed through images, delete and classify them into groups for building deep learning image datasets (based on crawler and tkinter)
Apache License 2.0
133 stars, 25 forks

UnicodeEncodeError while writing to log file #18

Closed: H4dr1en closed this issue 5 years ago

H4dr1en commented 5 years ago

The following error occurred when writing to the log file:

Searching: >> Nike Moon Racer <<
(1) Crawling ...
    -> GOOGLE
    -> BING
    Number of duplicate image files: 1. Removing...
(2) Resizing images to (299, 299)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:01<00:00, 32.20it/s]                
Traceback (most recent call last):
  File "...\envs\py3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "...\envs\py3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "...\envs\py3\Scripts\fcd.exe\__main__.py", line 9, in <module>
  File "...\envs\py3\lib\site-packages\click\core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "...\envs\py3\lib\site-packages\click\core.py", line 717, in main
    rv = self.invoke(ctx)
  File "...\envs\py3\lib\site-packages\click\core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "...\envs\py3\lib\site-packages\click\core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "...\envs\py3\lib\site-packages\fastclass\fc_download.py", line 163, in cli
    main(infile, size, crawler, keep, maxnum, outpath)
  File "...\envs\py3\lib\site-packages\fastclass\fc_download.py", line 132, in main
    log.write(','.join([item, source_urls[item]]) + '\n')
  File "...\envs\py3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 102-106: character maps to <undefined>

The error comes from the log.write call in fc_download.py (line 132, in main, in the traceback above).
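The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch, not code from fastclass: the file name and URL are hypothetical stand-ins (the URL from the actual crawl is not shown in this report), chosen only because the URL contains a character that cp1252 cannot represent.

```python
# Hypothetical log entry; the real URL from the crawl is not known.
item = "img_001.jpg"
url = "https://example.com/images/nike-\u6708-racer.jpg"  # U+6708 is not in cp1252

# Same line format as the log.write call shown in the traceback.
line = ",".join([item, url]) + "\n"

try:
    # A file opened without an explicit encoding on this Windows setup
    # encodes with cp1252, which is what fails here.
    line.encode("cp1252")
    failed = False
except UnicodeEncodeError as err:
    failed = True
    reason = err.reason
    print("UnicodeEncodeError:", reason)
```

Encoding the same line with "utf-8" instead succeeds, since UTF-8 can represent every Unicode code point.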

To reproduce the bug, put the following in a CSV file:

searchterm,exclude
Nike Moon Racer, "Nike"

and run fcd as follows:

fcd -m 25 label.csv

The error is caused by characters in the address (URL) of one of the crawled pictures that the cp1252 codec, used for the log file here, cannot encode.
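One common way to avoid this class of error is to open the log file with an explicit UTF-8 encoding instead of relying on the platform default. The sketch below is an assumption about how such a fix could look, not the fix attached to #16, which is not shown in this thread; write_log, the file names and the URL are hypothetical.

```python
import os
import tempfile


def write_log(path, source_urls):
    """Write 'filename,url' lines as UTF-8, independent of the platform default."""
    with open(path, "w", encoding="utf-8") as log:
        for item in source_urls:
            log.write(",".join([item, source_urls[item]]) + "\n")


# Hypothetical data: a URL containing a character outside cp1252.
source_urls = {"img_001.jpg": "https://example.com/images/nike-\u6708-racer.jpg"}

log_path = os.path.join(tempfile.mkdtemp(), "log.csv")
write_log(log_path, source_urls)

# Read the file back with the same encoding to confirm the round trip.
with open(log_path, encoding="utf-8") as fh:
    written = fh.read()
```

Passing errors="replace" to open() would also prevent the exception while keeping cp1252, but it silently alters the logged URLs, so an explicit UTF-8 encoding is usually the safer choice.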

I fixed it and will attach the fix to #16