chiphuyen / stanford-tensorflow-tutorials

This repository contains code examples for Stanford's course TensorFlow for Deep Learning Research.
http://cs20.stanford.edu
MIT License

Lecture 11: 11_char_rnn ... 'charmap' codec can't decode byte 0x81 in position 170 ... #122

Open: terminsen opened this issue 6 years ago

terminsen commented 6 years ago

Hi -

I get the traceback below. Can you help with this one, please?

Kind regards, Jesper.


UnicodeDecodeError                        Traceback (most recent call last)
in <module>()
    148
    149 if __name__ == '__main__':
--> 150     main()

in main()
    145     lm = CharRNN(model)
    146     lm.create_model()
--> 147     lm.train()
    148
    149 if __name__ == '__main__':

in train(self)
    106         data = read_batch(stream, self.batch_size)
    107         while True:
--> 108             batch = next(data)
    109
    110             # for batch in read_batch(read_data(DATA_PATH, vocab)):

in read_batch(stream, batch_size)
     38 def read_batch(stream, batch_size):
     39     batch = []
---> 40     for element in stream:
     41         batch.append(element)
     42         if len(batch) == batch_size:

in read_data(filename, vocab, window, overlap)
     25
     26 def read_data(filename, vocab, window, overlap):
---> 27     lines = [line.strip() for line in open(filename, 'r').readlines()]
     28     while True:
     29         random.shuffle(lines)

~\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 170: character maps to <undefined>

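A minimal sketch of why the last frame fails, assuming the data file holds UTF-8 text: in UTF-8 many accented characters contain the byte 0x81 (for example "Á" is stored as 0xC3 0x81), and cp1252 leaves 0x81 unassigned, so decoding those bytes as cp1252 raises the same "character maps to <undefined>" error. The character "Á" is only an illustrative choice, not taken from the actual data file.

```python
# "Á" (U+00C1) is stored in UTF-8 as the two bytes 0xC3 0x81.
raw = "Á".encode("utf-8")
print(raw)  # b'\xc3\x81'

# Decoding with cp1252 (the codec used in the traceback) fails on 0x81,
# because cp1252 has no character assigned to that byte.
message = None
try:
    raw.decode("cp1252")
except UnicodeDecodeError as err:
    message = str(err)
print(message)

# Decoding with the encoding the bytes were actually written in works.
print(raw.decode("utf-8"))  # Á
```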
goddice commented 6 years ago

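This is a file encoding issue: without an encoding argument, open() falls back to the platform's default encoding, which here is cp1252 (note encodings\cp1252.py in the traceback), while the data file is UTF-8. Change line 27 to:

lines = [line.strip() for line in open(filename, 'r', encoding="utf-8").readlines()]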
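A slightly more defensive variant of the same fix, sketched as a standalone function: `read_lines` is a hypothetical helper, not part of the course repository, and it assumes the file is UTF-8 just like the one-line fix. It passes the encoding explicitly and uses a `with` block so the file handle is closed; `locale.getpreferredencoding(False)` shows which default open() would otherwise use.

```python
import locale
import os
import tempfile


def read_lines(filename, encoding="utf-8"):
    """Hypothetical helper: read and strip every line of a text file,
    decoding it explicitly instead of relying on the platform default."""
    with open(filename, "r", encoding=encoding) as f:
        return [line.strip() for line in f]


# The encoding open() uses when none is given (cp1252 on many Windows setups).
print(locale.getpreferredencoding(False))

# Demo: write a small UTF-8 file containing the 0xC3 0x81 byte pair, then read it back.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("Ágúst\nplain ascii\n")
    path = tmp.name

lines = read_lines(path)
os.remove(path)
print(lines)  # ['Ágúst', 'plain ascii']
```

Iterating over the file object gives the same lines as readlines() without building the intermediate list, and the result no longer depends on the machine's locale.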

terminsen commented 6 years ago

Thank you