ddavisqa / google-refine

Automatically exported from code.google.com/p/google-refine

Creation of new project from text file does not support space delimited files #249

GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a space-delimited text file with an arbitrary number of spaces/tabs 
between fields.
2. Create a new project from this file with the default options.

What is the expected output? What do you see instead?
I would expect to see multiple columns split on spaces.  Instead, I see a 
single column containing the entire first row.  It is then possible to split 
the data rows on a regex (\s*), but not the header row.  The project creation 
page also does not appear to support entering a regex for splitting columns.

What version of the product are you using? On what operating system?
V. 2.0 on Mac OS X 10.6.4 with Chrome 7.0.517.44

Please provide any additional information below.
If you are familiar with Excel, you can do the following to see the expected 
output.  Open the file and use the Text to Columns feature:  select Delimited, 
choose Tab and Space as delimiters, and treat consecutive delimiters as one.

I was hoping to use Refine to convert the file to CSV and eliminate placeholder 
values of '.', '_' and sometimes '-'.

Original issue reported on code.google.com by conley...@gmail.com on 24 Nov 2010 at 8:05
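
For illustration only, the behaviour requested above can be sketched outside Refine in a few lines of Python. This is a minimal sketch under stated assumptions, not how Refine's importer works: the function name `whitespace_to_csv` and the sample data are hypothetical, and every cell that equals one of the placeholder values from the report is simply blanked.

```python
import csv
import io
import re

# Placeholder values named in the report; treated here as "no data".
PLACEHOLDERS = {".", "_", "-"}


def whitespace_to_csv(text, placeholders=PLACEHOLDERS):
    """Convert whitespace-delimited text to CSV, blanking placeholder cells."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on runs of spaces/tabs, treating consecutive delimiters as one.
        # The header row goes through the same split as the data rows.
        fields = re.split(r"[ \t]+", line.strip())
        writer.writerow(["" if f in placeholders else f for f in fields])
    return out.getvalue()


# Hypothetical sample: header plus two rows with mixed spaces and tabs.
sample = "id   name\tscore\n1  alice   .\n2\t\tbob  7\n"
print(whitespace_to_csv(sample))
```

Note that this approach cannot tell a delimiter from a space inside a value, so it only suits files whose fields contain no embedded whitespace.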

GoogleCodeExporter commented 8 years ago
I meant to classify this as an enhancement, not a defect, but don't see a way 
to edit it.

Original comment by conley...@gmail.com on 24 Nov 2010 at 8:12

GoogleCodeExporter commented 8 years ago

Original comment by tfmorris on 24 Nov 2010 at 10:19

GoogleCodeExporter commented 8 years ago
If there's no pattern to the spaces and tabs, it'd be hard to split the 
columns. I'd recommend importing the data file without splitting it into 
columns, then replacing each run of two or more whitespace characters with a 
single separator character, e.g.,

value.replace(/\s{2,}/, '|')

Then invoke the "Split into several columns" command, using '|' as the separator.

Original comment by dfhu...@gmail.com on 11 Sep 2011 at 5:31
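
The workaround above can be mimicked in Python to see what it does to a single row. This is a rough sketch that assumes Python's `re.sub` as a stand-in for the GREL `replace` call; the helper name `split_on_whitespace_runs` and the sample row are hypothetical, and the separator is assumed not to occur in the data.

```python
import re


def split_on_whitespace_runs(value, sep="|"):
    """Mark each run of two or more whitespace characters, then split on the mark."""
    # \s{2,} leaves single spaces alone, so "New York" stays in one cell.
    marked = re.sub(r"\s{2,}", sep, value.strip())
    return marked.split(sep)


# Hypothetical row: three spaces, then two tabs, between the fields.
row = "New York   2010\t\t8.4"
print(split_on_whitespace_runs(row))
```

Because the pattern requires at least two whitespace characters, fields separated by exactly one space or one tab stay merged; a pattern such as `\s+` would split those too, at the cost of also breaking values that contain a space.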