ChengjunWu / google-refine

Automatically exported from code.google.com/p/google-refine

The common transform “Trim leading and trailing whitespace” doesn’t trim non-breaking spaces #604

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Have a cell with “ x” as its value (the space being a non-breaking 
space).
2. Apply the common transform “Trim leading and trailing whitespace” to its 
column.

What is the expected output? What do you see instead?
The cell’s value should become “x”. Instead, it does not change.

What version of Google Refine are you using?
2.5

What operating system and browser are you using?
Mac OS X 10.7.4
Chrome Version 21.0.1180.75

Is this problem specific to the type of browser you're using, or does it happen
in all the browsers you tried?
The same behavior appears in Safari 6.0 (7536.25).

Please provide any additional information below.
Perhaps this is the intended behavior. But it does not reflect the function’s 
name (“Trim leading and trailing whitespace”).

Original issue reported on code.google.com by palpalpa...@gmail.com on 14 Aug 2012 at 7:00

GoogleCodeExporter commented 8 years ago
This function implements an "old-school" definition of whitespace, in which only 
characters with code points up to U+0020 are stripped, as described here: 
http://docs.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#trim()

There are lots of Unicode whitespace characters (em-space, en-space, thin 
space, nbsp, etc.) that this definition doesn't include.

Original comment by tfmorris on 14 Aug 2012 at 8:03
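
To make the comment above concrete, here is a small sketch of how plain Java classifies the characters it mentions. The class name WhitespaceCheck and the helper strippedByTrim are made up for illustration and are not part of Refine; the helper only mirrors the rule documented for String.trim(), it is not the JDK's own code.

```java
public class WhitespaceCheck {
    // Hypothetical helper mirroring the documented String.trim() rule:
    // only characters with code points up to U+0020 count as strippable.
    static boolean strippedByTrim(char c) {
        return c <= ' ';
    }

    public static void main(String[] args) {
        // Plain space, NBSP, en space, em space, thin space
        char[] samples = {' ', '\u00A0', '\u2002', '\u2003', '\u2009'};
        for (char c : samples) {
            System.out.printf("U+%04X trim=%b isWhitespace=%b isSpaceChar=%b%n",
                    (int) c,
                    strippedByTrim(c),
                    Character.isWhitespace(c),
                    Character.isSpaceChar(c));
        }

        // The cell from the report: a non-breaking space followed by "x"
        String cell = "\u00A0x";
        System.out.println(cell.trim().equals(cell)); // true: trim() leaves the value unchanged
    }
}
```

Note that Character.isWhitespace also rejects the non-breaking space, while Character.isSpaceChar accepts it, so swapping trim() for a loop over isWhitespace alone would not fix the reported case.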

GoogleCodeExporter commented 8 years ago
Updated to use the Guava method CharMatcher.WHITESPACE.trimFrom(s) in r2528.  
Still needs tests.

Original comment by tfmorris on 14 Aug 2012 at 8:11

GoogleCodeExporter commented 8 years ago
I've added some basic tests.  The two characters that the Guava method doesn't 
appear to handle are zero-width no-break space "\uFEFF" and word joiner "\u2060".  
I'm going to rely on Guava for now rather than trying to work around this.

Original comment by tfmorris on 14 Aug 2012 at 11:03
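
For anyone who does need those two characters stripped, a workaround might look roughly like the sketch below. This is not what r2528 does (the comment above sticks with Guava as-is); the class ExtendedTrim and its EXTRA constant are hypothetical names, and the code uses only the JDK rather than Guava so it can be run standalone.

```java
public class ExtendedTrim {
    // Assumption for this sketch: the two characters reported as unhandled,
    // U+FEFF (zero-width no-break space) and U+2060 (word joiner).
    private static final String EXTRA = "\uFEFF\u2060";

    // True for JDK whitespace, Unicode space separators (including NBSP),
    // and the two extra characters listed above.
    static boolean isTrimmable(char c) {
        return Character.isWhitespace(c)
                || Character.isSpaceChar(c)
                || EXTRA.indexOf(c) >= 0;
    }

    // Strips trimmable characters from both ends; interior ones are kept.
    static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) {
            start++;
        }
        while (end > start && isTrimmable(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String cell = "\uFEFF\u00A0x\u2060";
        System.out.println(trim(cell).length()); // 1
    }
}
```

Whether stripping these two is desirable is a judgment call: both are invisible format characters rather than spaces, which may be why a whitespace matcher leaves them alone.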

GoogleCodeExporter commented 8 years ago

Original comment by tfmorris on 18 Sep 2012 at 3:05