digital-preservation / droid

DROID (Digital Record and Object Identification)
BSD 3-Clause "New" or "Revised" License

Special characters in DROID report #1132

Open allieg82 opened 1 month ago

allieg82 commented 1 month ago

Hi,

I am getting the following characters in my DROID report export (UTF-8 encoding selected): ’ �. They appear in place of some, but not all, of the ' ' characters, and sometimes in place of spaces, in my file titles. The titles display correctly in the GUI view.

Can anyone help me figure out why this is happening so I can fix it, or suggest a workaround?

Thanks,
A

kathaurielle commented 1 month ago

Hello, just to add: I tried this with the file names and WAS able to replicate it. I think we last saw this issue many years ago and fixed it manually, i.e. by changing file names and CSVs, but I wonder if the devs have a better fix. Reports are exported as UTF-8. Kathryn.
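To confirm that a particular export really is UTF-8, one option is to decode the whole file strictly and list the non-ASCII characters it contains. This is a minimal Python sketch, assuming the CSV is on local disk; check_utf8 is a hypothetical helper, not part of DROID. If the file decodes without error, the bytes DROID wrote are valid UTF-8 and the garbling is happening in whatever program displays them.

```python
import sys
import unicodedata


def check_utf8(path):
    """Decode a file strictly as UTF-8 and count its non-ASCII characters.

    "utf-8-sig" accepts files with or without a leading byte order mark
    and strips it; any invalid byte sequence raises UnicodeDecodeError.
    """
    with open(path, "rb") as f:
        text = f.read().decode("utf-8-sig")
    counts = {}
    for ch in text:
        if ord(ch) > 127:
            counts[ch] = counts.get(ch, 0) + 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_utf8.py droid_export.csv  (hypothetical file name)
    for ch, n in sorted(check_utf8(sys.argv[1]).items()):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}: {n}")
```

Seeing entries such as U+2019 RIGHT SINGLE QUOTATION MARK in the output means the curly apostrophes are stored correctly in the file.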

DavidUnderdown commented 1 month ago

It will depend on what the CSV file is being opened in. Excel still assumes everything uses the Windows encoding set in the locale, so it will misinterpret some characters (it looks like UTF-8 em dashes are one of the things it gets wrong), and it's extremely hard to tell it to use a different encoding. If you open in LibreOffice instead where you can control the encoding used it will probably show everything fine if UTF-8 is selected at launch.
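The misreading David describes can be reproduced in a few lines. This is a minimal Python sketch, assuming the viewer falls back to Windows-1252 (the usual code page for US and Western European Windows locales); misread_as_cp1252 is a hypothetical helper. It shows a curly apostrophe turning into the ’ sequence from the report, a non-breaking space turning into Â followed by a space, and a closing curly double quote ending in �, because its last UTF-8 byte (0x9D) has no Windows-1252 mapping.

```python
def misread_as_cp1252(text):
    """Return what a viewer shows if it decodes UTF-8 bytes as Windows-1252.

    Bytes with no Windows-1252 mapping (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    are replaced with U+FFFD, displayed as the replacement character.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


samples = [
    "It’s",        # right single quotation mark, U+2019
    "A — B",       # em dash, U+2014
    "a\u00a0b",    # non-breaking space, U+00A0
    "“quoted”",    # curly double quotes, U+201C / U+201D
]
for s in samples:
    print(repr(s), "->", repr(misread_as_cp1252(s)))
```

While no byte has been replaced, text garbled this way can be recovered with `s.encode("cp1252").decode("utf-8")`; once a � appears, the original byte is no longer in the displayed text.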

ross-spencer commented 1 month ago

If you open in LibreOffice instead where you can control the encoding used it will probably show everything fine if UTF-8 is selected at launch.

To add, this does look very much like Excel's mishandling of the file. In Excel you can't open it correctly as a CSV via File > Open; you need to select one of the other tabs, which I believe is Import or Import Data Set, then From CSV. During the import process you can select the character encoding and it will import "more correctly" (I'd say correctly, but Excel is not a suitable tool for digital preservation).
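An alternative to the import route, assuming your Excel version checks for a UTF-8 byte order mark (BOM) at the start of a CSV (recent Windows versions generally do, but test with your own copy), is to leave the original DROID export untouched and make a working copy that starts with a BOM. This is a minimal Python sketch; add_utf8_bom is a hypothetical helper, and it copies the bytes unchanged apart from the three-byte prefix.

```python
import codecs
import sys


def add_utf8_bom(src, dst):
    """Copy a UTF-8 CSV byte-for-byte, prefixing a UTF-8 byte order mark.

    The content is checked first: a file that is not valid UTF-8 raises
    UnicodeDecodeError instead of being copied. A file that already starts
    with a BOM is copied unchanged.
    """
    with open(src, "rb") as f:
        data = f.read()
    data.decode("utf-8")  # validation only; the bytes themselves are not altered
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    with open(dst, "wb") as f:
        f.write(data)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python add_bom.py export.csv export_excel.csv  (hypothetical names)
    add_utf8_bom(sys.argv[1], sys.argv[2])
```

Writing to a separate file keeps the export DROID produced as the reference copy, in line with the caveat about Excel above.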