benetech / VideoDeduplication

GNU General Public License v3.0

Windows install (non-Docker) fails to extract features; file encoding needs to be defined #5

Open johnhbenetech opened 4 years ago

johnhbenetech commented 4 years ago
Traceback (most recent call last):
  File "extract_features.py", line 59, in <module>
    f.write("%s\n" % item)
  File "D:\Conda\envs\winnow\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 24-29: character maps to <undefined>

Also:

Traceback (most recent call last):
  File "extract_features.py", line 70, in <module>
    extractor.start(batch_size=16,cores=4)
  File "D:\CCSIDev\VideoDeduplication\winnow\feature_extraction\intermediate_cnn.py", line 12, in start
    start_video_extraction(self.video_src,self.output_path,batch_sz=batch_size,cores=cores)
  File "D:\CCSIDev\VideoDeduplication\winnow\feature_extraction\extraction_routine.py", line 118, in start_video_extraction
    feature_extraction_videos(model, cores, batch_sz, video_list, output_path)
  File "D:\CCSIDev\VideoDeduplication\winnow\feature_extraction\extraction_routine.py", line 66, in feature_extraction_videos
    video_list = {i: video.strip() for i, video in enumerate(open(video_list).readlines())}
  File "D:\Conda\envs\winnow\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 686: character maps to <undefined>
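
Both tracebacks can be reproduced outside the project. The snippet below is only a sketch: the file name is hypothetical, chosen because its UTF-8 bytes include 0x81, which cp1252 leaves undefined. It assumes the failure comes from `open()` falling back to cp1252, as the `cp1252.py` frames in the tracebacks suggest.

```python
# Hypothetical video file name containing a non-Latin character.
# "あ" (U+3042) encodes to the UTF-8 bytes E3 81 82.
name = "あ.mp4"

errors = []

# Writing: cp1252 has no mapping for this character, so encoding fails.
try:
    name.encode("cp1252")
except UnicodeEncodeError as exc:
    errors.append(type(exc).__name__)

# Reading: the UTF-8 bytes contain 0x81, which cp1252 cannot decode.
try:
    name.encode("utf-8").decode("cp1252")
except UnicodeDecodeError as exc:
    errors.append(type(exc).__name__)

print(errors)
```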

johnhbenetech commented 4 years ago

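Both errors seem to be resolved by opening the video list file with an explicit UTF-8 encoding in both places, instead of relying on the Windows default (cp1252, per the tracebacks above):

extract_features.py line 57:

with open(VIDEO_LIST_TXT, 'w', encoding='utf-8') as f:

extraction_routine.py line 66:

video_list = {i: video.strip() for i, video in enumerate(open(video_list, encoding='utf-8').readlines())}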
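
As a rough, self-contained illustration of the two changes together, the sketch below writes and re-reads a video list with `encoding='utf-8'`. The helper names `write_video_list` and `read_video_list`, the sample paths, and the temporary file are assumptions made for the example and are not part of the repository. It also reads the file inside a `with` block so the handle is closed, which the one-line version above does not do.

```python
import os
import tempfile


def write_video_list(paths, list_path):
    """Write one path per line, always as UTF-8 (hypothetical helper)."""
    with open(list_path, "w", encoding="utf-8") as f:
        for item in paths:
            f.write("%s\n" % item)


def read_video_list(list_path):
    """Read the list back as {index: path}, always as UTF-8 (hypothetical helper)."""
    with open(list_path, encoding="utf-8") as f:
        return {i: line.strip() for i, line in enumerate(f)}


# Hypothetical sample paths, one with a non-Latin character.
videos = ["clips/あ.mp4", "clips/plain.mp4"]

with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "video_list.txt")
    write_video_list(videos, list_path)
    loaded = read_video_list(list_path)

print(loaded)
```

Because the writer and the reader name the same encoding, the round trip should not depend on the platform's locale settings.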