declare-lab / MELD

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
GNU General Public License v3.0

Convert Microsoft cp1252 hex codes to utf-8 format in the MELD csv's #28

Closed tae898 closed 3 years ago

tae898 commented 3 years ago

There were a total of 8 Microsoft cp1252 hex codes in the six original CSVs. I've converted them to UTF-8 so that the files can be parsed on other software platforms.

The conversion is as follows: cp1252_to_utf8 = { '\x85': "…", '\x91': "‘", '\x92': "’", '\x93': "“", '\x94': "”", '\x96': "–", '\x97': "—", '\xa0': " "}
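As a rough illustration of how the mapping above could be applied to already-loaded text, here is a minimal sketch. The helper name `fix_cp1252_chars` is hypothetical and not part of the MELD repository, and the sketch assumes the stray codes show up in Python strings as the single characters `'\x85'`, `'\x91'`, and so on, and that `'\xa0'` is meant to become a plain space.

```python
# Mapping from the comment above, written with Unicode escapes so the
# target characters are unambiguous.
cp1252_to_utf8 = {
    "\x85": "\u2026",  # horizontal ellipsis
    "\x91": "\u2018",  # left single quotation mark
    "\x92": "\u2019",  # right single quotation mark
    "\x93": "\u201c",  # left double quotation mark
    "\x94": "\u201d",  # right double quotation mark
    "\x96": "\u2013",  # en dash
    "\x97": "\u2014",  # em dash
    "\xa0": " ",       # non-breaking space -> plain space (assumption)
}

# Build the translation table once; str.maketrans accepts a dict whose
# keys are single characters.
_TABLE = str.maketrans(cp1252_to_utf8)


def fix_cp1252_chars(text: str) -> str:
    """Replace the eight stray cp1252 codes with their Unicode equivalents."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "It\x92s fine\x85 \x93really\x94"
    print(fix_cp1252_chars(sample))
```

Text that contains none of the eight codes passes through unchanged, so under these assumptions the helper can be applied to every utterance in each CSV.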

Tae