ddavisqa / google-refine

Automatically exported from code.google.com/p/google-refine

TsvCsvImporter used for import when split into columns is not checked #346

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Uncheck 'split into columns' and 'auto detect data types'
2. Attempt to import an Apache log file

What is the expected output? What do you see instead?
Out of heap space error

What version of Google Refine are you using?
Version 2.0 [r1836]

What operating system and browser are you using?
OS X 10.6 with Chrome

Is this problem specific to the type of browser you're using, or does it happen
in all the browsers you tried?
All browsers

Please provide any additional information below.

I'm trying to import an Apache log file with ~3.4m rows and I get an out-of-heap-space 
error (I've increased the memory, but this is still somewhat expected).

The reason I'm posting here is that I unchecked 'split into columns', but 
TsvCsvImporter still appears in the stack trace. Since I specifically said not to 
split into columns, I think this class shouldn't be used to import 
this file (or maybe the name of this class is not indicative of its use). 
Issue 242 confirms a bug with large CSV datasets: 
http://code.google.com/p/google-refine/issues/detail?id=242#c14

Here's the top part of my stack trace:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:215)
    at java.io.BufferedReader.readLine(BufferedReader.java:331)
    at java.io.LineNumberReader.readLine(LineNumberReader.java:182)
    at com.google.refine.importers.TsvCsvImporter.read(TsvCsvImporter.java:119)
    at com.google.refine.importers.TsvCsvImporter.read(TsvCsvImporter.java:74)

Original issue reported on code.google.com by random35...@googlemail.com on 11 Mar 2011 at 7:29

GoogleCodeExporter commented 9 years ago
Thanks for reporting this, but the class TsvCsvImporter currently also handles 
the case where you don't have the "split into columns" option checked. As you 
phrased it, "the name of this class is not indicative of its use". I'm going to 
close this issue as it's the same as issue 242.

Original comment by dfhu...@gmail.com on 11 Mar 2011 at 7:48
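
For readers following the thread, here is a minimal sketch of how one line-based importer can serve both modes, which would explain why the same class shows up in the stack trace whether or not "split into columns" is checked. This is not the actual TsvCsvImporter source: the class name LineImporterSketch, the methods parseLine and read, and the splitIntoColumns flag are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch only; NOT the real com.google.refine.importers.TsvCsvImporter.
public class LineImporterSketch {

    // Turn one line into the cells of a row.
    // When splitIntoColumns is false, the whole line becomes a single cell.
    static List<String> parseLine(String line, String separator, boolean splitIntoColumns) {
        if (!splitIntoColumns) {
            return Collections.singletonList(line);
        }
        // Pattern.quote treats the separator literally; limit -1 keeps trailing empty cells.
        return Arrays.asList(line.split(Pattern.quote(separator), -1));
    }

    // Read every line through the same code path, regardless of the flag.
    static List<List<String>> read(LineNumberReader reader, String separator,
                                   boolean splitIntoColumns) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(parseLine(line, separator, splitIntoColumns));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String log = "127.0.0.1\tGET /index.html\t200\n127.0.0.1\tGET /missing\t404\n";
        // Same reader loop in both cases; only the per-line handling differs.
        System.out.println(read(new LineNumberReader(new StringReader(log)), "\t", false));
        System.out.println(read(new LineNumberReader(new StringReader(log)), "\t", true));
    }
}
```

In this sketch every row is accumulated in memory, so heap use grows with the number of lines whether or not the lines are split. That would be consistent with the out-of-memory error appearing inside readLine in either mode, although the real importer may behave differently.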