Original comment by dfhu...@gmail.com
on 11 May 2010 at 7:07
Fixed in r717. eferonline, would you be able to check out the code and verify
the fix?
Original comment by dfhu...@gmail.com
on 12 May 2010 at 6:10
Thanks for the quick reaction!
I loaded and built the new revision. It works better now, giving me the correct
number of records. The line break in the description field, however, seems to be
gone. (But I can add it manually in the Gridworks editor via the clipboard, so it
seems technically possible to have line breaks in fields.) This should be fixed,
too.
As for the source code change: I have to admit, shamefully, that I don't really
understand the importer code or which change exactly did the trick for this
issue. I would have expected some kind of finite state automaton to manage the
parser "modes", but could not find an equivalent in the sources. Unfortunately,
I'm a bit short on time to review the code in detail.
Original comment by eferonline
on 12 May 2010 at 8:27
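The finite-state idea mentioned above can be sketched briefly. The following is a minimal Python illustration of a state-machine reader for delimited text, not the Gridworks importer code; the names `State` and `parse_delimited` are hypothetical, and it assumes `"` as the quote character and LF line endings. A line break only ends a record when the parser is outside a quoted field, so quoted values keep their line breaks (including blank lines) and the separator is just a parameter.

```python
from enum import Enum, auto


class State(Enum):
    """Parser "modes" for a minimal delimited-text reader (hypothetical)."""
    FIELD_START = auto()      # at the beginning of a field
    UNQUOTED = auto()         # inside a field that did not start with a quote
    QUOTED = auto()           # inside a quoted field: separators and line breaks are literal
    QUOTE_IN_QUOTED = auto()  # just saw a quote inside a quoted field


def parse_delimited(text, sep=",", quote='"'):
    """Split text into records (lists of field strings) with a state machine.

    Line breaks inside a quoted field stay part of the value, so one record
    may span any number of physical lines, including blank ones.
    Assumes LF line endings.
    """
    records, record, field = [], [], []
    state = State.FIELD_START

    def end_field():
        record.append("".join(field))
        field.clear()

    def end_record():
        end_field()
        records.append(record.copy())
        record.clear()

    for ch in text:
        if state is State.QUOTED:
            if ch == quote:
                state = State.QUOTE_IN_QUOTED
            else:
                field.append(ch)  # separators and line breaks are kept verbatim
            continue
        if state is State.QUOTE_IN_QUOTED:
            if ch == quote:
                field.append(quote)  # "" inside quotes is an escaped quote
                state = State.QUOTED
                continue
            state = State.UNQUOTED  # the quoted section has ended; handle ch below
        if ch == sep:
            end_field()
            state = State.FIELD_START
        elif ch == "\n":
            end_record()
            state = State.FIELD_START
        elif ch == quote and state is State.FIELD_START:
            state = State.QUOTED
        else:
            field.append(ch)
            state = State.UNQUOTED
    if field or record or state is not State.FIELD_START:
        end_record()  # input did not end with a line break
    return records
```

For example, `parse_delimited('id,desc\n1,"line one\n\nline three"\n2,plain\n')` yields three records, the second of which has the two-line-break value `"line one\n\nline three"`; passing `sep="\t"` applies the same rules to TSV.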
PS: The fix only solves the problem when the separator is a comma. With tabs,
the old behavior still occurs.
Original comment by eferonline
on 12 May 2010 at 8:38
Fixed for TSV as well in r790. Please verify.
Original comment by dfhu...@gmail.com
on 17 May 2010 at 5:57
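The TSV report and the r790 fix both come down to the quoting rules being independent of the separator: a tab inside a quoted field is literal, just like a comma. As a reference point (Python's standard `csv` module, not the Gridworks code), this sample TSV with a quoted field containing a tab and a line break parses into three records when `delimiter="\t"` is given:

```python
import csv
import io

# Tab-separated sample: the second field of record "1" is quoted and
# contains both a tab and a line break.
tsv_text = 'id\tdescription\n1\t"first line\nsecond\tline"\n2\tplain\n'

# When reading a real file, open it with newline="" so the csv module sees
# line breaks inside quoted fields unchanged.
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quotechar='"')
rows = list(reader)

for row in rows:
    print(row)
```

Comparing an importer's output against a reference parser like this is one quick way to check a fix for both separators.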
It's nearly fixed now. But if one line break in a "-quoted area comes directly
after another (that is, there is a blank line before the text continues), the
record is still split. It should be possible to have any number of line breaks
in the field value before the quoted section ends and the next field is
processed.
Original comment by eferonline
on 17 May 2010 at 6:54
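The blank-line case can be stated as a rule: a physical line ends a record only when no quoted field is still open, however many empty lines came before it. One common way to express that rule (a hypothetical helper, `join_physical_lines`, not what r797 actually implements) is to join physical lines while tracking the parity of quote characters:

```python
def join_physical_lines(lines, quote='"'):
    """Group physical lines into logical records by tracking quote parity.

    A record is complete only when it contains an even number of quote
    characters. Escaped quotes ("") add two, so they never change the parity.
    Blank lines inside an open quoted field are kept, so any number of
    consecutive line breaks can appear in a field value.
    """
    records, pending = [], []
    open_quote = False
    for line in lines:
        pending.append(line)
        if line.count(quote) % 2 == 1:
            open_quote = not open_quote  # an odd count opens or closes a quoted field
        if not open_quote:
            records.append("\n".join(pending))
            pending = []
    if pending:  # unterminated quote at end of input
        records.append("\n".join(pending))
    return records
```

With the input `'id,desc\n1,"a\n\n\nb"\n2,c'.split("\n")` this returns three records, keeping the three line breaks inside the quoted value. The parity trick assumes the quote character only appears to open or close a quoted field or as a doubled `""` escape; a stray quote inside an unquoted field would throw the grouping off, which is one reason a character-level state machine is more robust.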
I've added a unit test for this multiple-blank-line case in r794. The test currently fails.
Original comment by iainsproat
on 17 May 2010 at 7:04
Should be fixed in r797. Please verify.
Original comment by iainsproat
on 17 May 2010 at 12:01
Verified. It works now as expected. Great!
Original comment by eferonline
on 17 May 2010 at 12:14
Is there any way I can work around this problem without downloading and
building Google Refine from source? Can I convert the input file to another
format or escape characters differently?
Original comment by andreas....@gmail.com
on 20 May 2011 at 8:23
Why not just use a text editor to find/replace the double-quote character " with
something like a triple caret ^^^? Import without the splitting option or the
quote-character option. Then, once it's in Google Refine, perform your splits
manually with GREL or "Add column" against the commas and ^^^. Would that work?
Original comment by thadguidry
on 20 May 2011 at 1:19
That workaround would help for some cases with embedded tabs and commas, but
not for line breaks, I suspect.
Original comment by tfmorris
on 20 May 2011 at 5:00
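For the line-break case, a variant of the placeholder idea may help, though it is untested against Google Refine: let a CSV-aware tool collapse line breaks inside field values into a placeholder before import, then turn the placeholder back into line breaks with a replace transform once the data is in Refine. The sketch below uses Python's standard `csv` module; the function name `flatten_line_breaks` and the `^^^` placeholder are hypothetical choices, and it assumes the placeholder never occurs in the real data.

```python
import csv
import io

PLACEHOLDER = "^^^"  # hypothetical marker, assumed not to occur in the real data


def flatten_line_breaks(csv_text, delimiter=",", placeholder=PLACEHOLDER):
    """Re-emit delimited text with line breaks inside fields replaced.

    The input is parsed with the csv module, which honours quoted fields,
    so only line breaks inside field values are replaced; record-ending
    line breaks are written by the csv writer as usual.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow(
            [field.replace("\r\n", placeholder).replace("\n", placeholder) for field in row]
        )
    return out.getvalue()
```

Because the csv module does the parsing, quoted commas and tabs stay intact as well; only the embedded line breaks change, and each record of the output fits on one physical line.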
Original comment by tfmorris
on 18 Sep 2012 at 2:21
Original comment by tfmorris
on 18 Sep 2012 at 2:52
Original issue reported on code.google.com by
eferonline
on 11 May 2010 at 1:09