cdli-gh / data

This is a copy of the daily dump of catalogue and ATF data from the Cuneiform Digital Library Initiative (http://cdli.ucla.edu)
http://cdli.ucla.edu/bulk_data

Some catalogue fields have incorrect text encodings #38

Closed rillian closed 5 years ago

rillian commented 5 years ago

Some fields in the catalogue CSV file contain data in encodings other than UTF-8. This is confusing for readers and also causes text to display incorrectly on the object webpage.

For example, in P222716, Frühdyn. Beterstatuetten displays as Fr√ºhdyn. Beterstatuetten in the secondary publications field.

It's common in the CDLI comments field as well. For example, in P282483, Fs. Košak displays as Fs Ko√∂ak.
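
Both garbled strings are consistent with UTF-8 bytes being decoded as Mac OS Roman at some point. This is only a guess based on the two examples above, not something confirmed against the CDLI export pipeline. The sketch below reproduces the pattern with Python's built-in `mac_roman` codec; the function names are hypothetical, and the Windows-1252 step for the second example is a further assumption.

```python
def misread_utf8_as_mac_roman(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Mac OS Roman."""
    return text.encode("utf-8").decode("mac_roman")


def repair(text: str) -> str:
    """Reverse the step above.

    Raises UnicodeError if the text does not fit the assumed pattern.
    """
    return text.encode("mac_roman").decode("utf-8")


# First example: a single misreading is enough.
garbled = misread_utf8_as_mac_roman("Frühdyn. Beterstatuetten")
print(garbled)          # Fr√ºhdyn. Beterstatuetten
print(repair(garbled))  # Frühdyn. Beterstatuetten

# Second example: the š appears to have been stored as Windows-1252 first
# (an assumption), read as Mac OS Roman, and then misread once more.
step1 = "Košak".encode("cp1252").decode("mac_roman")
print(step1)                             # Koöak
print(misread_utf8_as_mac_roman(step1))  # Ko√∂ak
```

If this reading holds, affected fields could be repaired by reversing the same steps, but any automated fix would need checking against the source records, since legitimately non-ASCII text must not be converted.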