AgileWorksOrg / elasticsearch-river-csv

CSV river for ElasticSearch
Apache License 2.0

CSV Not importing on server #12

Closed btray77 closed 10 years ago

btray77 commented 10 years ago

I'm on Ubuntu 12.04.4 LTS

Updated the system

sudo apt-get update && sudo apt-get upgrade

I installed Elasticsearch using the Debian package.

It seems to be installed correctly.

elasticsearch.yml file

Changed cluster.name and node.name.
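
For example, with illustrative values, the edits in elasticsearch.yml would look like:

cluster.name: my-cluster
node.name: my-node-1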

I built river-csv with:

Cloned the elasticsearch-river-csv source:

git clone https://github.com/xxBedy/elasticsearch-river-csv.git

Installed maven

apt-get install maven

Ran maven

mvn clean package

Installed plugin

bash /usr/share/elasticsearch/bin/plugin -url file:/path_to_csv_river_repository/target/release/elasticsearch-river-csv.zip -install elasticsearch-river-csv

Checked that the plugin installed correctly:

bash plugin -l

Installed plugins:

elasticsearch-river-csv-2.0.0.jar
groovy-all-2.2.1.jar
opencsv-2.3.jar

Location of above

/usr/share/elasticsearch/plugins/river-csv

The test CSV file is located at /root/csv/demofile.csv.

Restarted Elasticsearch

service elasticsearch reboot

Ran

curl -XPUT localhost:9200/_river/cj/_meta -d ' { "type" : "csv", "csv_file" : { "folder" : "/root/csv", "filename_pattern" : ".*.csv$", "poll":"5m", "first_line_is_header":"true" } }'

It works on my local machine (OS X) but not on the server.

Ran

curl -XGET "http://localhost:9200/_search" -d' { "query": { "match_all": {} } }'

Output

{"took":3,"timed_out":false,"_shards":{"total":1,"successful":1,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"_river","_type":"cj","_id":"_meta","_score":1.0, "_source" : { "type" : "csv", "csv_file" : { "folder" : "/root/csv", "filename_pattern" : ".*.csv$", "poll":"5m", "first_line_is_header":"true" }, "index" : { "index" : "my_index", "type" : "item", "bulk_size" : 100, "bulk_threshold" : 10 } }},{"_index":"_river","_type":"cj","_id":"_status","_score":1.0, "_source" : {"node":{"id":"6cl4wAj4TcWE0UdN2ayhRg","name":"Sunstreak","transport_address":"inet[/IPADDRESSISHERE:9300]"}}}]}}

My thoughts

I don't think it's seeing the CSV file. I thought it might be a permissions issue, so I ran chmod 777 on the CSV file.

It's on a DigitalOcean server with 1 GB of RAM. The CSV file is 10 MB. I've tried smaller CSV files with only a few lines, but I get the same result.
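
A quick way to check whether the Elasticsearch service user can actually read the file (assuming the Debian package's default elasticsearch user; adjust if yours differs):

# Show the permissions of every path component leading to the file
namei -l /root/csv/demofile.csv

# Try reading the file as the service user
sudo -u elasticsearch head -n 1 /root/csv/demofile.csv

If the second command fails even though the file itself is chmod 777, the blocker is usually a parent directory (such as /root) that the service user cannot traverse.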

Log file

Nothing interesting going on inside it.

vtajzich commented 10 years ago

Hi,

I've installed Ubuntu Server in Parallels to test your issue. I didn't change cluster.name and node.name. Also, the command to restart Elasticsearch is:

service elasticsearch restart

instead of

service elasticsearch reboot

For test purposes I've used test_1_id_column.csv, a CSV from the test sources. I copied it to /root/csv and ran the following PUT:

curl -XPUT localhost:9200/_river/cj/_meta -d '
{
"type" : "csv",
"csv_file" : {
"folder" : "/root/csv",
"first_line_is_header":"true"
} 
}'

Your command failed with error:

{"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Unrecognized character escape '.' (code 46)\n at [Source: [B@7af8c64f; line: 6, column: 27]]; ","status":400}

this line

"filename_pattern" : ".*\.csv$",

should be

"filename_pattern" : ".*\\.csv$",

Everything works correctly.

Regards,

Vitek

btray77 commented 10 years ago

Thank you for your reply. I don't know why I wrote reboot... :P I deleted _all and restarted the service.

I used your example without the file pattern. I think mine got stripped by GitHub, as the correct file pattern is on my laptop.

curl -XPUT localhost:9200/_river/cj/_meta -d '
{
"type" : "csv",
"csv_file" : {
"folder" : "/root/csv",
"first_line_is_header":"true"
} 
}'

Here's the log file. When I look in the CSV folder, it doesn't add the .processing suffix to the file.

[2014-03-24 05:00:32,400][INFO ][cluster.metadata         ] [Perseus] [_river] creating index, cause [auto(index api)], shards [1]/[1], mappings []
[2014-03-24 05:00:32,528][INFO ][cluster.metadata         ] [Perseus] [_river] update_mapping [cj] (dynamic)
[2014-03-24 05:00:32,538][INFO ][river.csv                ] [Perseus] [csv][cj] starting csv stream
[2014-03-24 05:00:32,544][INFO ][river.csv                ] [Perseus] [csv][cj] Using configuration: org.elasticsearch.river.csv.Configuration(/root/csv, .*\.csv$, true, [], 1h, cj, csv_type, 100, \, ", ,, 0, 10, id)
[2014-03-24 05:00:32,545][INFO ][river.csv                ] [Perseus] [csv][cj] Going to process files {}
[2014-03-24 05:00:32,545][INFO ][river.csv                ] [Perseus] [csv][cj] next run waiting for 1h
[2014-03-24 05:00:32,550][INFO ][cluster.metadata         ] [Perseus] [_river] update_mapping [cj] (dynamic)
vtajzich commented 10 years ago

I've checked @btray77's server and it's just a permissions-related issue, nothing related to our code. I will leave it open until @btray77 says that he has solved it.
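
For anyone hitting the same thing, a minimal fix might look like this (assuming the Debian package's default elasticsearch user and group; the target directory is just an example):

# Move the CSV out of /root, which the service user cannot traverse,
# into a directory it owns
sudo mkdir -p /var/lib/elasticsearch/csv-import
sudo mv /root/csv/*.csv /var/lib/elasticsearch/csv-import/
sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch/csv-import

Then re-register the river with "folder" : "/var/lib/elasticsearch/csv-import".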

btray77 commented 10 years ago

I reinstalled Elasticsearch from source, and there are no more permission-related issues. I'm sure I could have messed with the file permissions and ownership instead, but that was over my head.
Awesome plugin, even more awesome support by Vitek!