ddavisqa / google-refine

Automatically exported from code.google.com/p/google-refine

Reconciliation single-match unicode bug #357

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Clicking the "Match this topic to this cell" single-checkbox is causing 
accented characters to get corrupted in 2.0-r1836:

Paul Doré -> Paul Dor�

00000000  50 61 75 6c 20 44 6f 72  c3 a9 0a                 |Paul Dor...|

00000000  50 61 75 6c 20 44 6f 72  ef bf bd 0a              |Paul Dor....|
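The two hex dumps above can be decoded to see what actually changed. The short sketch below reads both byte sequences as UTF-8 and then tries one possible mechanism that would produce the second dump from the first. That mechanism (the text being written as ISO-8859-1 somewhere and then read back as UTF-8) is only an assumption for illustration; the report does not confirm the actual cause.

```python
# Bytes copied from the two hex dumps in the report.
good = bytes.fromhex("50 61 75 6c 20 44 6f 72 c3 a9 0a")
bad = bytes.fromhex("50 61 75 6c 20 44 6f 72 ef bf bd 0a")

# c3 a9 is the UTF-8 encoding of U+00E9 (e with acute accent);
# ef bf bd is the UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER).
print([hex(ord(ch)) for ch in good.decode("utf-8")[-2:]])
print([hex(ord(ch)) for ch in bad.decode("utf-8")[-2:]])

# Assumed mechanism (hypothetical, not confirmed by the report):
# the string is encoded as ISO-8859-1, giving the single byte e9,
# and that byte is later decoded as UTF-8. A lone e9 is not valid
# UTF-8, so a lenient decoder substitutes U+FFFD.
as_latin1 = good.decode("utf-8").encode("iso-8859-1")
roundtrip = as_latin1.decode("utf-8", errors="replace").encode("utf-8")

print(roundtrip == bad)  # True
```

If the bytes match, as they do here, a charset mismatch of this kind is at least consistent with the symptom, although other paths could yield the same result.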

Repro steps: reconciliation shows a few choices, all correctly accented. 
Immediately on clicking the "Match this topic to this cell" button, the accents 
become corrupted in both the reconciled view and the original target text. If I 
click 'edit', the original version shows up, also corrupted. Clicking Apply, 
funnily enough, restores the accents.

metadata.json has "encoding":"UTF-8" (value.reinterpret("utf-8") did nothing, of 
course).
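One possible reason re-interpreting the value has no effect: if the cell already holds U+FFFD, the original byte is gone, so no combination of charsets can bring the accent back. The sketch below illustrates this with a hypothetical `reinterpret` helper written for this example. It is not the GREL function and makes no claim about how Refine implements it.

```python
original = "Paul Dor\u00e9"    # intended text, ending in e-acute
corrupted = "Paul Dor\ufffd"   # text after the accent was replaced

def reinterpret(text, stored_as, read_as):
    """Hypothetical helper: encode with one charset, decode with another."""
    raw = text.encode(stored_as, errors="replace")
    return raw.decode(read_as, errors="replace")

charsets = ("utf-8", "iso-8859-1", "cp1252")

# Try every pairing of the three charsets on the corrupted string.
candidates = {
    reinterpret(corrupted, stored_as, read_as)
    for stored_as in charsets
    for read_as in charsets
}

print(original in candidates)  # False
```

None of the pairings recovers the accented character, which fits the observation that reinterpreting as UTF-8 changed nothing.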

This doesn't happen with the double-checkbox "...and all identical cells" option.

Tested on: Windows 7 with Chrome

Original issue reported on code.google.com by paulm%pa...@gtempaccount.com on 29 Mar 2011 at 5:23

GoogleCodeExporter commented 8 years ago
This sounds very similar to issue 107.

Original comment by tfmorris on 29 Mar 2011 at 6:55

GoogleCodeExporter commented 8 years ago
Closing as a duplicate.  It should be fixed in the upcoming release, but please 
feel free to reopen if you find it's not (or it's not a duplicate).

Original comment by tfmorris on 7 Jun 2011 at 6:37