alex-berard / seq2seq

Attention-based sequence to sequence learning
Apache License 2.0

Training a new model: Ascii codec can't decode byte 0xc3 #31

Open Steven1791 opened 4 years ago

Steven1791 commented 4 years ago

There seems to be an encoding problem when training a new model (en-fr). I'm pretty sure it's caused by the accented characters in the French alphabet (e.g. é, è ...). To make sure the problem wasn't caused by me, I followed the instructions provided. However, I was running the code on a GPU cluster in a Docker container. See the attached files for a complete list of the apt, pip and pip3 packages available in my container (I provided both Python 2 and 3, since I wasn't sure whether Python 2 is still needed).

I downloaded the LibriSpeech dataset and used the LibriSpeech AST config file to train a new model. The error occurred within 30 s of starting training.
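The error in the title is consistent with UTF-8 text being decoded as ASCII: in UTF-8, é is stored as two bytes, and the first is 0xc3. Below is a minimal sketch of that failure mode. It assumes the training data is UTF-8 encoded and does not use any code from this repository; `try_decode` is a hypothetical helper written only for this illustration.

```python
import locale

# In UTF-8, "é" is encoded as two bytes; the first one is 0xc3.
raw = "é".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)


# ASCII only covers bytes 0x00-0x7f, so 0xc3 cannot be decoded.
ascii_result = try_decode(raw, "ascii")
# Decoding with the matching codec recovers the original character.
utf8_result = try_decode(raw, "utf-8")

print(ascii_result)
print(utf8_result)

# Encoding that open() falls back to when none is passed explicitly.
# Assumption: in a minimal container without a UTF-8 locale this may
# report an ASCII-based encoding, which would explain the error.
print(locale.getpreferredencoding(False))
```

If the last line reports an ASCII-based encoding inside the container, opening the data files with an explicit `encoding="utf-8"` argument, or configuring a UTF-8 locale in the container, would be the usual things to try. Whether that applies here is an assumption until checked against the actual traceback.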


installed-software.txt packages_pip.txt packages_pip3.txt