ddavisqa / google-refine

Automatically exported from code.google.com/p/google-refine

Accented characters have an issue #250

GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Import a file containing accented characters such as é

What is the expected output? What do you see instead?
strange characters

What version of the product are you using? On what operating system?
2.0

Please provide any additional information below.

Original issue reported on code.google.com by nippo...@gmail.com on 24 Nov 2010 at 11:11

GoogleCodeExporter commented 8 years ago
What character encoding is used in your original file?  Can you provide a 
sample?

One thing you might try is to use the reinterpret function.  If, for example, 
your original file is UTF-8 encoded and Refine incorrectly guessed it was ISO 
8859-1, you could use Cells->Transform and the expression 

  value.reinterpret("utf-8")

to fix things up again.
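
For illustration, here is a rough Python analogue of the mix-up that expression is meant to undo. It is only a sketch, not Refine's implementation: it assumes the file really was UTF-8 and was decoded as ISO 8859-1, and the helper name `reinterpret` is borrowed from the GREL function purely for readability.

```python
def reinterpret(text: str, target: str, assumed: str = "iso-8859-1") -> str:
    """Re-encode text with the wrongly assumed encoding, then decode it as target."""
    return text.encode(assumed).decode(target)


original = "informático"

# Simulate the bad import: UTF-8 bytes decoded as ISO 8859-1.
garbled = original.encode("utf-8").decode("iso-8859-1")
print(garbled)  # informÃ¡tico

# Undo it by going back to the raw bytes and decoding them as UTF-8.
repaired = reinterpret(garbled, "utf-8")
print(repaired)  # informático
```

The repair only works when the wrong guess was a single-byte encoding that maps every byte to a character, as ISO 8859-1 does; otherwise some bytes may already have been lost at import time.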

Original comment by tfmorris on 26 Nov 2010 at 7:07

GoogleCodeExporter commented 8 years ago
Unless we can get additional information, we'll have to close this.  If you're 
still having the problem please help us fix it for you by providing some more 
info.

Original comment by tfmorris on 9 Jan 2011 at 6:19

GoogleCodeExporter commented 8 years ago
No response to request for information.

Original comment by tfmorris on 4 Feb 2011 at 2:20

GoogleCodeExporter commented 8 years ago
I will send you a file.

Original comment by nippo...@gmail.com on 4 Feb 2011 at 7:37

GoogleCodeExporter commented 8 years ago
Reopening pending receipt of additional information.

Original comment by tfmorris on 4 Feb 2011 at 2:29

GoogleCodeExporter commented 8 years ago
This may or may not be related, but when I try to use reinterpret I get the 
following error:

Error: reinterpret: encoding 'utf-8' is not available or recognized.

Original comment by francisc...@gmail.com on 20 May 2011 at 4:10

GoogleCodeExporter commented 8 years ago
Never received any data from the original reporter.  Closing again as 
unreproducible.

@francisc - Please post a query on the email list with what you're attempting 
to do (or open a new bug report if you're sure it's a bug).

Original comment by tfmorris on 20 May 2011 at 5:05

GoogleCodeExporter commented 8 years ago
I am also trying to work with á, é, ñ and other characters (Spanish 
alphabet). 

Unfortunately, Value.reinterpret("utf-8") does not work :(

For example: "informÃ¡tico" should be "informático"; that is, "Ã¡" should be 
"á". I also see this one: "Ã�". It should show "Á". "Ã³" should be "ó", 
"Ã­a" should be "ía", "Ã±" should be "ñ", and so on. 

The original file contains the correct characters (it is encoded as UTF-8). 

Any help will be welcome. Thanks in advance.

Original comment by txarlieg...@gmail.com on 2 Aug 2012 at 8:36

GoogleCodeExporter commented 8 years ago
I have just found a trick to avoid those bad characters: try copying the 
content to the clipboard, and then create the project in Google Refine from 
the clipboard (instead of uploading the file). It works.

Original comment by txarlieg...@gmail.com on 2 Aug 2012 at 8:38

GoogleCodeExporter commented 8 years ago
txarliegarcia,
You can change the encoding at the beginning of the import process, which 
overrides the guesser at that point. We provide a button for that, so simply 
choose the UTF-8 encoding from the list that displays. As Tom says 
above, you can also change the encoding after the import process, within 
the data grid, using the reinterpret() GREL function. Please use our mailing 
list for help and questions; this is issue tracking, not a general help forum.
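
To show why setting the encoding at import time matters, here is a small Python sketch (not Refine code) that reads the same UTF-8 file twice, once with the wrong encoding and once with the right one. The helper `read_text` and the sample file contents are made up for the example.

```python
import os
import tempfile


def read_text(path: str, encoding: str) -> str:
    """Read a whole text file using an explicitly chosen encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Write a tiny UTF-8 sample file to import.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".csv", delete=False
) as sample:
    sample.write("nombre\ninformático\n")
    sample_path = sample.name

as_latin1 = read_text(sample_path, "iso-8859-1")  # wrong guess: garbled text
as_utf8 = read_text(sample_path, "utf-8")         # explicit choice: correct text
os.remove(sample_path)

print(as_latin1)
print(as_utf8)
```

Choosing the encoding up front avoids having to repair each column afterwards.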

Original comment by thadguidry on 2 Aug 2012 at 1:11

GoogleCodeExporter commented 8 years ago
When you import a file such as a CSV or text file, the lower pane, under 
"Parse data as", has a "Character encoding" option where you can select 
UTF-8, among others.
See you soon!

#import CSV text file #utf-8 #encode #character encoding

Original comment by murciego.martin on 23 Jul 2015 at 3:20
