AgileWorksOrg / elasticsearch-river-csv

CSV river for ElasticSearch
Apache License 2.0

Insertion Stops After 25000 #63

Open hari-kris opened 9 years ago

hari-kris commented 9 years ago

I am working with elasticsearch-river-csv. I was able to insert only up to 25k records into ES, although my data has 100k records. Operating environment: Windows 8.1, ES 1.7.

The log file shows "Going to execute new bulk composed of 1000 actions"; after this it just prints that it is waiting for the next poll time.

My query:

```json
PUT /_river/my_csv_river/_meta
{
  "type" : "csv",
  "csv_file" : {
    "folder" : "/home/hariganesh/Downloads/CSV",
    "filename_pattern" : ".*.csv$",
    "poll" : "5m",
    "fields" : [ "column1", "column2", "column3" ],
    "first_line_is_header" : "false",
    "field_separator" : ",",
    "escape_character" : ";",
    "quote_character" : "\"",
    "field_id" : "id",
    "field_id_include" : "false",
    "field_timestamp" : "imported_at",
    "concurrent_requests" : "1",
    "charset" : "UTF-8",
    "script_before_all" : "/path/to/before_all.sh",
    "script_after_all" : "/path/to/after_all.sh",
    "script_before_file" : "/path/to/before_file.sh",
    "script_after_file" : "/path/to/after_file.sh"
  },
  "index" : {
    "index" : "my_csv_data",
    "type" : "csv_type",
    "bulk_size" : 100,
    "bulk_threshold" : 10
  }
}
```

Input file in CSV:

```
8848488,Harinath,"A, B, C,D"
8848489,Hari,"E,F,G,H"
```

Can you say what the problem might be?
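
For reference, here is how those two sample rows should parse under the configured field separator and quote character. This is a minimal sketch using Python's csv module (the river itself uses a Java CSV parser, so this only approximates its behavior); each row should yield exactly three fields, since the commas inside the quoted third column are not separators:

```python
# Minimal sketch (Python csv module, not the river's own parser): parse the
# two sample rows with "," as the separator and '"' as the quote character.
import csv
import io

sample = '8848488,Harinath,"A, B, C,D"\n8848489,Hari,"E,F,G,H"\n'

for row in csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'):
    print(row)

# Expected output:
# ['8848488', 'Harinath', 'A, B, C,D']
# ['8848489', 'Hari', 'E,F,G,H']
```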

vtajzich commented 9 years ago

Could you provide me with your mapping and data file?

hari-kris commented 9 years ago

I ran the set of statements provided in the wiki. I can't provide the data file (its size is around 1 GB), but it will have fields like the ones below,

```
8848488,Harinath,"A, B, C,D"
8848489,Hari,"E,F,G,H"
```

without a header, and with comma as the separator. My apologies; I know it is difficult to simulate the same conditions, but is there anything I am missing in the query?

The number of lines in my file is 50k, but the insertion stops at 25k.

Mapping file:

```json
{
  "my_csv_data": {
    "mappings": {
      "csv_type": {
        "properties": {
          "Data": { "type": "string" },
          "Filename": { "type": "string" },
          "Title": { "type": "string" },
          "imported_at": { "type": "date", "format": "dateOptionalTime" }
        }
      }
    }
  }
}
```

Complete log from the run in question:

```
[2015-09-30 10:39:57,118][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:39:57,339][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:39:57,435][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:39:57,470][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:39:57,664][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:39:57,746][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:39:57,880][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:39:58,006][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:39:58,293][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:39:58,303][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:39:59,696][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:39:59,824][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:39:59,945][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:00,674][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:00,775][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:00,831][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:01,544][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:01,597][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:01,625][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:01,659][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:02,004][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:02,049][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:02,159][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:02,253][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:02,296][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:02,403][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:02,487][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:02,491][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:02,917][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:03,253][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:03,273][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:03,354][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:03,942][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:04,106][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:04,279][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:04,423][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:04,503][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:04,574][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:04,613][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:04,740][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:04,818][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:04,940][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:05,752][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:05,796][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:05,804][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:05,881][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:06,876][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:06,938][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:06,940][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:06,949][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:06,994][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:07,065][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:07,270][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:07,403][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:07,538][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:07,407][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:07,654][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:07,760][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:08,028][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:08,081][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:08,124][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:08,241][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:08,245][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:08,420][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:08,461][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:08,683][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 500 actions
[2015-09-30 10:40:08,692][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:08,723][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] File has been processed xac.csv.processing
[2015-09-30 10:40:08,806][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] File xac.csv.processing, processed lines 22100
[2015-09-30 10:40:08,847][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Going to execute new bulk composed of 100 actions
[2015-09-30 10:40:08,968][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:09,001][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] next run waiting for 1m
[2015-09-30 10:40:09,094][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:09,208][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:40:09,370][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 100 actions
[2015-09-30 10:40:09,378][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Executed bulk composed of 500 actions
[2015-09-30 10:41:09,009][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] All files in folder: [xac.csv.processing.imported]
[2015-09-30 10:41:09,009][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Accepted files: []
[2015-09-30 10:41:09,009][INFO ][org.agileworks.elasticsearch.river.csv.CSVRiver] [Bloodscream] [csv][my_csv_river] Using configuration: org.agileworks.elasticsearch.river.csv.Configuration(E:\DataSet\Elasticsearch\xac\Sample\, .*.csv$, false, [Filename, Title, Data], 1m, my_csv_data, csv_type, 500, ;, ", ,, 10, 4, id, false, imported_at, /path/to/before_all.sh, /path/to/after_all.sh, /path/to/before_file.sh, /path/to/after_file.sh, UTF-8)
```

vtajzich commented 9 years ago

I tried to reproduce the issue, so I created a data file containing 23.4M lines:

[screenshot: the generated data file]

Result after some time:

[screenshots: the import result]

So everything works correctly.

I did some research and have a suggestion: I would bet that the data file you have is not correct.
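
One way to check that, as a hedged sketch (the file name and column count here are illustrative assumptions, and this uses Python's csv module rather than the river's parser), is to compare the raw physical line count with the parsed record count. A large gap usually means unbalanced quotes are merging many physical lines into one logical record, which would match an import that stops partway through:

```python
# Hedged sanity check: compare physical lines with parsed CSV records.
# "data.csv" and expected_fields are placeholder assumptions.
import csv

path = "data.csv"
expected_fields = 3  # the river config above declares three columns

with open(path, encoding="utf-8", newline="") as f:
    physical_lines = sum(1 for _ in f)

with open(path, encoding="utf-8", newline="") as f:
    records = 0
    for number, row in enumerate(csv.reader(f, delimiter=",", quotechar='"'), start=1):
        records += 1
        if len(row) != expected_fields:
            print(f"record {number}: {len(row)} fields instead of {expected_fields}")

print(f"{physical_lines} physical lines -> {records} parsed records")
```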

vtajzich commented 9 years ago

@harinathcse do you need further help?

hari-kris commented 9 years ago

Hi @vtajzich, thanks for working on this issue. I tried to insert my data using a Python script with the csv module, and I was able to insert all of the data. I used " as the quote character, and all the files were indexed without any problem. As you said, my third column has data which spans multiple lines, but all of that data is within double quotes. In Python I read all the CSV data using UTF-8 encoding; this is the only difference. I will look into your suggestions once more and let you know.
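
The script itself is not shown here, but a minimal sketch of the approach described (csv module, UTF-8, " as the quote character, bulk indexing), assuming the elasticsearch-py client of that era and taking the index, type, and field names from the config and mapping above, might look like this:

```python
# Sketch of the described workaround, not the author's actual script.
import csv

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()  # assumes a node on localhost:9200

def actions(path):
    # newline="" lets the csv module handle quoted multi-line fields itself
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter=",", quotechar='"'):
            filename, title, data = row  # column names from the mapping above
            yield {
                "_index": "my_csv_data",
                "_type": "csv_type",  # mapping types still existed in ES 1.x
                "_source": {"Filename": filename, "Title": title, "Data": data},
            }

# "data.csv" is a placeholder path
success, errors = bulk(es, actions("data.csv"), chunk_size=500)
print(success, errors)
```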

vtajzich commented 9 years ago

Could you please try to import it with opencsv to see if it works as well? Or could you provide me with your data file, so we can identify the root cause?

hari-kris commented 9 years ago

Sure. I will try to find out whether there is an issue in the data file.

vtajzich commented 8 years ago

@harinathcse any update?

hari-kris commented 8 years ago

@vtajzich With our data it stops, so we moved to a Python script for indexing. I tried with other data which I created using a spawner, and that works, but with our original data it stops. I believe the problem might be in our data.

vtajzich commented 8 years ago

@harinathcse do you need further help?