RiTUAL-UH / Font_LDL_2020

This is a repository for the ACL 2020 paper: "Let Me Choose: From Verbal Context to Font Selection"

UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 85: ordinal not in range(128) on Ubuntu 18.04 #5

Open Franck-Dernoncourt opened 4 years ago

Franck-Dernoncourt commented 4 years ago

Running python3 main.py on Ubuntu 18.04 fails with UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 85: ordinal not in range(128).

Full error message:

[...]
tensorflow:From /Font_LDL_2020/Font_LDL/logger.py:15: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Running on CPU
[LOG] running . . . bert_seq_classification_V1
***********************printing status*************************
Train dataset: 916
Dev dataset: 131
Test dataset: 262
Traceback (most recent call last):
  File "main.py", line 40, in <module>
    corpus = Corpus.get_corpus(corpus_dir, corpus_pkl)
  File "/Font_LDL_2020/Font_LDL/data.py", line 379, in get_corpus
    corpus.print_stats()
  File "/Font_LDL_2020/Font_LDL/data.py", line 397, in print_stats
    print("Train dataset words: {}".format(self.train.X[:2]))
UnicodeEncodeError: 'ascii' codec can't encode character '\u2022' in position 85: ordinal not in range(128)
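
The traceback shows print() failing while encoding the bullet character '\u2022', which means the output stream is using the ASCII codec. A minimal sketch of the same failure outside the repository, assuming stdout is ASCII-encoded (likely under the default C/POSIX locale of a bare ubuntu:18.04 container with Python 3.6, though the locale is not shown in the log above). The helper print_with_encoding and the sample string are hypothetical, not part of Font_LDL_2020:

```python
import io


def print_with_encoding(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    This mimics print() writing to a sys.stdout with that encoding
    and returns the bytes that would have been written.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


# Hypothetical sample text containing the bullet character U+2022.
bullet_text = "Train dataset words: \u2022 example"

# With an ASCII stream, print() raises the error seen in the traceback.
try:
    print_with_encoding(bullet_text, "ascii")
    ascii_failed = False
    ascii_error = ""
except UnicodeEncodeError as err:
    ascii_failed = True
    ascii_error = str(err)

# With a UTF-8 stream, the same text is written without error.
utf8_bytes = print_with_encoding(bullet_text, "utf-8")
```

If this sketch matches the situation, any fix that makes stdout use UTF-8 should remove the error.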

To reproduce using Docker (this assumes that PR https://github.com/RiTUAL-UH/Font_LDL_2020/pull/4 was merged first):

docker run --interactive --tty ubuntu:18.04 bash
apt update
apt install -y git nano wget htop python3 python3-pip unzip
git clone https://github.com/RiTUAL-UH/Font_LDL_2020
cd Font_LDL_2020
pip3 install -r requirements.txt
cd Font_LDL
git clone https://github.com/RiTUAL-UH/Font-prediction-dataset
mkdir -p DATA/font
cp -r Font-prediction-dataset/DATA/* DATA/font

# Train and test
python3 main.py

Franck-Dernoncourt commented 4 years ago

As a workaround, run

PYTHONIOENCODING=utf8 python3 main.py

which avoids the error. It would be cleaner if the code did not require setting PYTHONIOENCODING=utf8, though.
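
One way to avoid the environment variable might be to switch stdout to UTF-8 from inside the program before anything is printed. The following is only a sketch under assumptions: force_utf8 is a hypothetical helper, not something in the repository, and placing the call near the top of main.py is a suggestion that has not been tested against Font_LDL_2020. It uses TextIOWrapper.reconfigure where available (Python 3.7+) and falls back to re-wrapping the underlying binary buffer on older versions such as Python 3.6.

```python
import io
import sys


def force_utf8(stream):
    """Return a text stream that encodes output as UTF-8.

    Hypothetical helper: on Python 3.7+ the stream is reconfigured in
    place; on older versions its binary buffer is wrapped again.
    """
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: change the encoding of the existing stream.
        stream.reconfigure(encoding="utf-8")
        return stream
    buffer = getattr(stream, "buffer", None)
    if buffer is None:
        # Streams without a binary buffer (e.g. io.StringIO) do not
        # encode anything, so they are returned unchanged.
        return stream
    # Python 3.6: wrap the same binary buffer with a UTF-8 encoder.
    return io.TextIOWrapper(buffer, encoding="utf-8", line_buffering=True)


# Assumed placement: before the first print() call in main.py.
sys.stdout = force_utf8(sys.stdout)
```

With something like this in place, plain python3 main.py would be expected to print the bullet character without PYTHONIOENCODING being set. Setting a UTF-8 locale in the container is another option that leaves the code untouched.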