DeepGraphLearning / KnowledgeGraphEmbedding

MIT License

Bugfix unicode compatibility #30

Open wradstok opened 4 years ago

wradstok commented 4 years ago

Hi,

When operating on my data set I was getting the following error:

> Traceback (most recent call last):
>   File "codes/run.py", line 361, in <module>
>     main(parse_args())
>   File "codes/run.py", line 211, in main
>     train_triples = read_triple(os.path.join(args.data_path, 'train.txt'), entity2id, relation2id)
>   File "codes/run.py", line 127, in read_triple
>     triples.append((entity2id[h], relation2id[r], entity2id[t]))
> KeyError: 'Găgăuzia\xa0'

To fix the issue, I added calls to unicodedata.normalize() before loading triples, so that unusual characters such as non-breaking spaces (the `\xa0` in the error above) are normalized. I also fixed two minor grammatical mistakes in error messages.
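For reference, a minimal sketch of what unicodedata.normalize() does to a name like the one in the traceback. The comment does not say which normalization form was used, so NFKC here is an assumption, and the variable names are purely illustrative.

```python
import unicodedata

# Entity name followed by a non-breaking space (U+00A0), as in the KeyError.
raw = "G\u0103g\u0103uzia\xa0"

# Assumption: NFKC. This form maps U+00A0 to an ordinary space (U+0020).
nfkc = unicodedata.normalize("NFKC", raw)

# NFC, by contrast, leaves the non-breaking space untouched.
nfc = unicodedata.normalize("NFC", raw)

print(repr(nfkc))
print(repr(nfc))
```

Note that under NFKC the trailing character becomes a regular space rather than disappearing, so the key still carries trailing whitespace unless it is also stripped.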

wradstok commented 4 years ago

I mistakenly thought I was dealing with a Unicode issue because of the error I received. On closer investigation, I realized that when the entities/relations are loaded, the entire line is stripped before it is split, so trailing whitespace is removed from the names because they sit at the end of the line. When triples are loaded, however, this only happens for the tail entities; heads and relations keep any trailing whitespace. Fixed by mapping str.strip() over all components when loading triples.
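A minimal sketch of the mismatch and the fix described above, stripping each component with str.strip(). It is not the code from this PR: the tab-separated `id<TAB>name` and `head<TAB>relation<TAB>tail` layouts, the helper names `load_dict` and `read_triples`, and the sample data are all assumptions made for illustration.

```python
def load_dict(lines):
    """Build a name -> id map. The whole line is stripped first, so the
    name (last field) loses any trailing whitespace, including U+00A0."""
    mapping = {}
    for line in lines:
        idx, name = line.strip().split("\t")
        mapping[name] = int(idx)
    return mapping


def read_triples(lines, entity2id, relation2id):
    """Split first, then strip every component, so head, relation, and
    tail are all cleaned the same way the dictionary names were."""
    triples = []
    for line in lines:
        h, r, t = map(str.strip, line.split("\t"))
        triples.append((entity2id[h], relation2id[r], entity2id[t]))
    return triples


# Hypothetical data: the head entity carries a trailing non-breaking space.
entity_lines = ["0\tG\u0103g\u0103uzia\xa0\n", "1\tMoldova\n"]
relation_lines = ["0\tpart_of\n"]
triple_lines = ["G\u0103g\u0103uzia\xa0\tpart_of\tMoldova\n"]

entity2id = load_dict(entity_lines)
relation2id = load_dict(relation_lines)

# The unstripped head is not a key, which is what raised the KeyError.
print("G\u0103g\u0103uzia\xa0" in entity2id)

triples = read_triples(triple_lines, entity2id, relation2id)
print(triples)
```

Python's str.strip() with no arguments removes Unicode whitespace, which includes U+00A0, so no separate normalization step is needed for this particular case.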