Closed: ghost closed this issue 10 years ago.
You are right. It hangs at about 500k records. I will take a look.
At line 477456 it gets stuck in this method (from the OpenCSV library): it keeps reading lines and never returns a record.
public String[] readNext() throws IOException {
    String[] result = null;
    do {
        String nextLine = getNextLine();
        if (!hasNext) {
            return result; // should throw if still pending?
        }
        // If a quote is unbalanced, the parser stays "pending" and keeps
        // asking for more lines, treating them as one multi-line field.
        String[] r = parser.parseLineMulti(nextLine);
        if (r.length > 0) {
            if (result == null) {
                result = r;
            } else {
                // Append the continuation's fields to the record so far.
                String[] t = new String[result.length + r.length];
                System.arraycopy(result, 0, t, 0, result.length);
                System.arraycopy(r, 0, t, result.length, r.length);
                result = t;
            }
        }
    } while (parser.isPending()); // stays true until the closing quote or EOF
    return result;
}
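To make the failure mode concrete, here is a minimal self-contained sketch (the package name assumes opencsv 2.x; adjust it if the bundled version differs):

import au.com.bytecode.opencsv.CSVReader; // opencsv 2.x package name (assumption)
import java.io.StringReader;
import java.util.Arrays;

public class PendingQuoteDemo {
    public static void main(String[] args) throws Exception {
        // Record 2 opens a quote that is never closed, so parseLineMulti()
        // stays pending and readNext() consumes every following line while
        // searching for the closing quote.
        String csv = "a,b,c\n"
                   + "1,\"oops,2\n"   // unbalanced quote here
                   + "3,4,5\n"
                   + "6,7,8\n";
        CSVReader reader = new CSVReader(new StringReader(csv));
        String[] row;
        while ((row = reader.readNext()) != null) {
            System.out.println(Arrays.toString(row));
        }
        reader.close();
        // Prints [a, b, c] and then [1]: rows 3 and 4 were swallowed into
        // the pending quoted field and never returned as records.
    }
}

On a 5M-row file the same thing happens at scale: once the stray quote is hit, every remaining line is consumed as one endless field, which looks like a hang while one CPU stays busy.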
The example file has each record spread over multiple lines. After 477456 records (note: records, not lines) are processed, there is a line with the wrong number of escape characters (").
Please check your file.
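If it helps, a rough way to locate the offending record is to scan the file and track quote parity: a record that never closes its quotes will span an absurd number of physical lines. A sketch, with the file path and the 100-line threshold as placeholder assumptions:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class QuoteBalanceCheck {
    public static void main(String[] args) throws IOException {
        // Placeholder path; point it at the unzipped CSV.
        try (BufferedReader br = new BufferedReader(
                new FileReader("/u01/app/div/temp/ALL.csv"))) {
            boolean inQuotes = false;  // inside an unterminated quoted field?
            long line = 0, recordStart = 1, records = 0;
            String s;
            while ((s = br.readLine()) != null) {
                line++;
                for (int i = 0; i < s.length(); i++) {
                    if (s.charAt(i) == '"') {
                        inQuotes = !inQuotes; // a doubled "" toggles twice, net zero
                    }
                }
                if (!inQuotes) {
                    records++;              // quotes balanced: record complete
                    recordStart = line + 1;
                } else if (line - recordStart > 100) {
                    // A "record" spanning 100+ lines almost certainly means
                    // a stray quote; report where it started and stop.
                    System.out.println("Suspect record #" + (records + 1)
                            + " starting at line " + recordStart);
                    return;
                }
            }
            System.out.println(records + " records in " + line
                    + " lines, quotes balanced.");
        }
    }
}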
Ok, thanks, I will have a look.
Hi,
I am using elasticsearch-1.3.2-1.noarch on a 2-node cluster and the ALL.zip from http://fec.gov/disclosurep/PDownload.do
And the following curl statement to upload:

curl -XPUT localhost:9200/_river/my_csv_river/_meta -d '
{
    "type" : "csv",
    "csv_file" : {
        "folder" : "/u01/app/div/temp",
        "first_line_is_header" : "true"
    },
    "index" : {
        "index" : "contributions",
        "bulk_size" : 100000,
        "bulk_threshold" : 10,
        "type" : "csv_type"
    }
}'

The unzipped file has about 5M rows. After about 470000 rows it stops and seems to hang, but the java process is using one CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20180 elastics 20 0 2044m 1.0g 22m S 99.6 34.4 15:40.89 java
Is this because of the analysis of the columns? How can I improve this?
Regards, Hans-Peter