ericmckean / google-refine

Automatically exported from code.google.com/p/google-refine

Detecting floats and ints when a figure has a comma in it #507

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
If a figure has a comma in it, e.g. "32,645" - this is parsed as a string 
instead of an int.

I think any string that represents an int or float with commas in it should be 
parsed as an int/float.

(And as a side note, I've deleted my attachments in "closed" and "fixed" 
issues, but still have the "Issue attachment storage quota exceeded" message, 
preventing me from attaching any media to future issues).

Dan

Original issue reported on code.google.com by danpaulsmith on 16 Dec 2011 at 12:44

GoogleCodeExporter commented 9 years ago
Are you talking about during project import or using toNumber() ?  

The problem with a string like 32,645 is that it's ambiguous.  Commas are used 
as both thousands separators and decimal separators, depending on the locale.  We 
could switch to using DecimalFormat.parse() 
http://docs.oracle.com/javase/6/docs/api/java/text/DecimalFormat.html#parse%28java.lang.String,%20java.text.ParsePosition%29 
to get locale-aware parsing, but then we'd need extra knobs and levers to 
control it, adding complexity.
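
To illustrate the ambiguity, here is a minimal sketch of locale-aware parsing with the standard java.text.NumberFormat API (which returns a DecimalFormat for most locales). The class name LocaleNumberParsing and the helper parseStrict are made up for this example and are not part of Refine; the sketch only shows that the same string yields different numbers under different locales.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of the Refine code base.
class LocaleNumberParsing {

    /**
     * Parses text using the number conventions of the given locale.
     * Returns null unless the entire string is consumed, so trailing
     * junk such as "32,645abc" is rejected instead of silently truncated.
     */
    static Number parseStrict(String text, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number result = format.parse(text, pos);
        if (result == null || pos.getIndex() != text.length()) {
            return null;
        }
        return result;
    }

    public static void main(String[] args) {
        // In Locale.US the comma is a grouping (thousands) separator.
        System.out.println(parseStrict("32,645", Locale.US));
        // In Locale.GERMANY the comma is the decimal separator.
        System.out.println(parseStrict("32,645", Locale.GERMANY));
    }
}
```

With Locale.US the string is read as the integer 32645, while with Locale.GERMANY it is read as 32.645. The locale argument is exactly the kind of extra knob the importer would have to expose.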

BTW, the transform grel:value.replace(',','').toNumber() will convert numbers 
in this format if you're sure the comma is a thousands separator.

Original comment by tfmorris on 27 Dec 2011 at 9:09

GoogleCodeExporter commented 9 years ago
It seems dangerous to parse anything on import anyway, as, for example, I might 
be in the U.S. processing data files generated by people in Europe, and our 
thousands separators are different. I think Thad's suggestion for a multi-column 
operation might solve this issue. After import, you can choose one or more 
columns to convert using the same transform expression. And you can leave out 
numeric-looking columns like zip code.
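
As a rough sketch of that idea (not how Refine implements column operations), the following Java applies one transform to an explicitly chosen set of columns and leaves every other column as a string. The class SelectiveColumnConversion, its methods, and the row representation as a map from column name to cell text are all assumptions made for this example. The transform assumes the comma is a thousands separator and that cell values are non-null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; names and data layout are not from Refine.
class SelectiveColumnConversion {

    /**
     * Removes commas (assumed to be thousands separators) and parses the
     * result as a long, then as a double. Returns the original string
     * unchanged if neither parse succeeds.
     */
    static Object toNumberOrKeep(String value) {
        String cleaned = value.replace(",", "");
        try {
            return Long.parseLong(cleaned);
        } catch (NumberFormatException notAnInteger) {
            try {
                return Double.parseDouble(cleaned);
            } catch (NumberFormatException notANumber) {
                return value;
            }
        }
    }

    /** Converts only the cells whose column name is in the selected set. */
    static List<Map<String, Object>> convert(List<Map<String, String>> rows,
                                             Set<String> selectedColumns) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, Object> converted = new LinkedHashMap<>();
            for (Map.Entry<String, String> cell : row.entrySet()) {
                boolean selected = selectedColumns.contains(cell.getKey());
                converted.put(cell.getKey(),
                        selected ? toNumberOrKeep(cell.getValue()) : cell.getValue());
            }
            out.add(converted);
        }
        return out;
    }
}
```

Selecting only a "population" column, for instance, would turn "32,645" into a number while a "zip" value such as "02134" stays a string and keeps its leading zero.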

Original comment by dfhu...@gmail.com on 27 Dec 2011 at 10:23