chbrown / liwc-python

Linguistic Inquiry and Word Count (LIWC) analyzer
MIT License

KeyError: 8 German 2001 .dic #16

Open vicru opened 3 years ago

vicru commented 3 years ago

Hi, has anybody used the German 2001 .dic with this library? I am getting "KeyError: 8" when I try to load the lexicon.

I've managed to solve this issue myself by opening the liwc/liwc/dic.py file and changing the read_dic(filepath) function to read my file using the utf-8 encoding.

Thank you for sharing your solution. I would appreciate it if you could provide the details. I just added encoding='utf-8' to the open() call in liwc/liwc/dic.py, but apparently that is not right, because I keep getting "KeyError: '8'" with this line:

with open(filepath, encoding='utf-8') as lines:

By the way, I am trying to use this library with a German version (from 2001), not a Spanish one. That is why I think your suggested solution might also make this library work with the German 2001 .dic.

Originally posted by @vicru in https://github.com/chbrown/liwc-python/issues/15#issuecomment-913695911
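For anyone who wants to check the encoding side separately from the library, here is a minimal sketch that reads a file with a list of candidate encodings and reports which one decoded cleanly. The helper name read_lines_with_fallback and the candidate list ("utf-8", then "latin-1") are assumptions for illustration, not part of liwc-python. It only rules decoding in or out as a factor and does not by itself explain the KeyError.

```python
def read_lines_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return (lines, encoding) for the first candidate encoding that decodes the file.

    Hypothetical helper, not part of liwc-python. The candidate encodings are
    an assumption; adjust them to whatever your .dic file might use.
    """
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                return handle.readlines(), encoding
        except UnicodeDecodeError:
            # This encoding cannot decode the file; try the next candidate.
            continue
    raise ValueError(f"none of {encodings!r} could decode {path!r}")
```

Note that latin-1 can decode any byte sequence, so it never fails. Keep it last in the list and treat a latin-1 result as "not valid UTF-8" rather than as proof of the real encoding.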

Hadjimina commented 2 years ago

Hi

You need to make sure that, in the .dic file you provide, the gap between each expression and its group is a single tab, not spaces. The same goes for an expression assigned to multiple groups: each group must be separated by a single tab. You can use an editor like VS Code to see where you have spaces and where you have tabs; you cannot tell them apart just by looking, since both are whitespace.

Right: image

Wrong: image

Basically, try not to have any spaces anywhere in your .dic file, and use only single tabs. Let me know if this works, though I just realised that this is a super old issue 😅.
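If checking by eye in an editor is tedious, the tab check described above can be scripted. The following is a minimal sketch, not part of liwc-python: the helper names find_suspect_lines and normalize_line are hypothetical, and the sample entries are made up. It flags any line that contains a space or two consecutive tabs, and it can collapse each run of spaces or tabs into one tab. It assumes no entry legitimately contains a space; if your dictionary has multi-word expressions, normalize_line would split them, so review the flagged lines by hand instead.

```python
import re


def find_suspect_lines(lines):
    """Return 1-based numbers of lines containing a space or consecutive tabs."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        content = line.rstrip("\r\n")
        if " " in content or "\t\t" in content:
            suspects.append(number)
    return suspects


def normalize_line(line):
    """Strip the line and collapse every run of spaces/tabs into a single tab.

    Assumption: no field is supposed to contain a space.
    """
    return re.sub(r"[ \t]+", "\t", line.strip())


# Made-up sample entries: the first is tab-separated, the second uses spaces.
sample = ["wort*\t1\t2\n", "anderes  8\n"]
print(find_suspect_lines(sample))
print([normalize_line(entry) for entry in sample])
```

Running find_suspect_lines over the lines of the real file first, and only rewriting the file after looking at the flagged lines, keeps the original intact if the assumption about spaces turns out not to hold.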