OpenRefine / OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it
https://openrefine.org/
BSD 3-Clause "New" or "Revised" License

German characters in CSV filename are corrupted (Unicode converted to ASCII?) #6082

Open pediRAM opened 11 months ago

pediRAM commented 11 months ago

Importing images from the directory "Gemälde" produces a CSV filename with corrupted special characters (it seems that the Unicode characters in the Windows 11 directory name were not processed correctly).

To Reproduce

Steps to reproduce the behavior:

  1. First, create a directory called "Gemälde" and put some images into it
  2. Then, run OpenRefine on Windows 11, open the directory, and select all files
  3. Finally, you will see the corrupted name, as in the screenshot

[Screenshot: 20230930_120447]
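For readers without access to the screenshot, here is a minimal sketch of two generic ways a name such as "Gemälde" can end up mangled. It is not taken from OpenRefine's code, and whether either mechanism is what actually happens in this report is an assumption; the function names are hypothetical and exist only for illustration.

```python
# Sketch of two generic corruption mechanisms (assumed, not confirmed for this bug).

# "Gemälde", written with an escape so the source file's own encoding cannot interfere.
NAME = "Gem\u00e4lde"


def decode_utf8_as_latin1(text: str) -> str:
    """Encode the text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def force_ascii(text: str) -> str:
    """Encode the text as ASCII, replacing every unmappable character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")
```

Calling `decode_utf8_as_latin1(NAME)` yields "GemÃ¤lde" (one character becomes two), while `force_ascii(NAME)` yields "Gem?lde". Comparing the corrupted name against these two patterns can hint at whether a wrong decoding step or a lossy conversion to ASCII is involved.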

tfmorris commented 9 months ago

@pediRAM sorry for the delay in triaging this bug report.

Can you please post the output of the chcp command on your system so that we know what code page (i.e. character encoding) is being used?

tfmorris commented 9 months ago

Also, is this base OpenRefine or are you using an extension?

pediRAM commented 8 months ago

> @pediRAM sorry for the delay in triaging this bug report.
>
> Can you please post the output of the chcp command on your system so that we know what code page (i.e. character encoding) is being used?

I installed it on Ubuntu (Linux) 22.04 LTS (64-bit). As far as I know, chcp is a well-known Windows tool, so I had to install it manually on my Ubuntu system. The output of chcp was:

chcp: too few arguments

But if you want to know which language, location and encoding Ubuntu is using (and was using during the installation of OpenRefine): Language: de_AT.UTF-8, Encoding: UTF-8
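On Linux there is no direct counterpart to chcp; the `locale` command prints the LANG and LC_* settings instead. As a rough cross-check, the sketch below reports which encodings a running process actually picks up from that environment. It inspects a Python process, not the JVM that runs OpenRefine, so it only approximates what OpenRefine sees; the JVM counterparts would be the `file.encoding` and `sun.jnu.encoding` system properties, and treating those as relevant to this bug is an assumption. The function name `describe_encodings` is hypothetical.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings this Python process uses for text and file names."""
    return {
        # Encoding implied by the user's locale settings (e.g. LANG=de_AT.UTF-8).
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used to convert file and directory names to and from bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding of the terminal or pipe that output is written to, if known.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

If any of these values is not a UTF-8 variant even though the locale reports de_AT.UTF-8, the process was probably started with a different environment than the interactive shell, which would be worth ruling out before looking at the importer itself.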

> Also, is this base OpenRefine or are you using an extension?

No, I have not installed any extensions so far.

Here is the output of OpenRefine during startup:

------------------------------------------------------------------------------------------------
You have 5867M of free memory.
Your current configuration is set to use 1400M of memory.
OpenRefine can run better when given more memory. Read our FAQ on how to allocate more memory here:
https://docs.openrefine.org/manual/installing#increasing-memory-allocation
-------------------------------------------------------------------------------------------------

log4j:WARN No appenders could be found for logger (org.eclipse.jetty.util.log).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Gtk-Message: 17:53:39.204: Failed to load module "canberra-gtk-module"
update.go:85: cannot change mount namespace according to change mount (/var/lib/snapd/hostfs/usr/share/gimp/2.0/help /usr/share/gimp/2.0/help none bind,ro 0 0): cannot open directory "/var/lib": permission denied
update.go:85: cannot change mount namespace according to change mount (/var/lib/snapd/hostfs/usr/share/xubuntu-docs /usr/share/xubuntu-docs none bind,ro 0 0): cannot open directory "/var/lib": permission denied

(firefox:5362): Gtk-WARNING **: 17:53:39.948: GTK+ module /snap/firefox/3358/gnome-platform/usr/lib/gtk-2.0/modules/libcanberra-gtk-module.so cannot be loaded.
GTK+ 2.x symbols detected. Using GTK+ 2.x and GTK+ 3 in the same process is not supported.
Gtk-Message: 17:53:39.948: Failed to load module "canberra-gtk-module"

(firefox:5362): Gtk-WARNING **: 17:53:39.951: GTK+ module /snap/firefox/3358/gnome-platform/usr/lib/gtk-2.0/modules/libcanberra-gtk-module.so cannot be loaded.
GTK+ 2.x symbols detected. Using GTK+ 2.x and GTK+ 3 in the same process is not supported.
Gtk-Message: 17:53:39.951: Failed to load module "canberra-gtk-module"
ATTENTION: default value of option mesa_glthread overridden by environment.
ATTENTION: default value of option mesa_glthread overridden by environment.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/share/openrefine/webapp/WEB-INF/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/share/java/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (refine).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.