Closed btray77 closed 10 years ago
Hi,
I've installed Ubuntu Server in Parallels to test your issue. I didn't change cluster.name or node.name. Also, the command to restart Elasticsearch is:
service elasticsearch restart
instead of
service elasticsearch reboot
For test purposes I've used
test_1_id_column.csv
from the test sources. I copied it to /root/csv and ran the following PUT:
curl -XPUT localhost:9200/_river/cj/_meta -d '
{
"type" : "csv",
"csv_file" : {
"folder" : "/root/csv",
"first_line_is_header":"true"
}
}'
Your command failed with error:
{"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Unrecognized character escape '.' (code 46)\n at [Source: [B@7af8c64f; line: 6, column: 27]]; ","status":400}
This line
"filename_pattern" : ".*\.csv$",
should be
"filename_pattern" : ".*\\.csv$",
After that change, everything works correctly.
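The parse failure can be reproduced outside Elasticsearch, since it is purely a JSON string-escaping problem. A minimal sketch in Python (the key name is taken from the river config above):

```python
import json
import re

# JSON only allows a fixed set of escapes (\" \\ \/ \b \f \n \r \t \uXXXX),
# so "\." inside a JSON string is a parse error -- the same
# JsonParseException the river mapping reported.
try:
    json.loads(r'{"filename_pattern": ".*\.csv$"}')
except json.JSONDecodeError as e:
    print("single backslash fails:", e.msg)

# Doubling the backslash produces valid JSON; the decoded value is the
# regex .*\.csv$ with a literal dot before the extension.
cfg = json.loads(r'{"filename_pattern": ".*\\.csv$"}')
pattern = cfg["filename_pattern"]
print(pattern)
print(bool(re.match(pattern, "demofile.csv")))
```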
Regards,
Vitek
Thank you for your reply. I don't know why I wrote reboot… :P I deleted _all and restarted the service.
I used your example without the file pattern. I think mine got stripped by GitHub; the correct file pattern is on my laptop.
curl -XPUT localhost:9200/_river/cj/_meta -d '
{
"type" : "csv",
"csv_file" : {
"folder" : "/root/csv",
"first_line_is_header":"true"
}
}'
Here's the log file. When I look in the csv folder, it doesn't add the .processing suffix to the file.
[2014-03-24 05:00:32,400][INFO ][cluster.metadata ] [Perseus] [_river] creating index, cause [auto(index api)], shards [1]/[1], mappings []
[2014-03-24 05:00:32,528][INFO ][cluster.metadata ] [Perseus] [_river] update_mapping [cj] (dynamic)
[2014-03-24 05:00:32,538][INFO ][river.csv ] [Perseus] [csv][cj] starting csv stream
[2014-03-24 05:00:32,544][INFO ][river.csv ] [Perseus] [csv][cj] Using configuration: org.elasticsearch.river.csv.Configuration(/root/csv, .*\.csv$, true, [], 1h, cj, csv_type, 100, \, ", ,, 0, 10, id)
[2014-03-24 05:00:32,545][INFO ][river.csv ] [Perseus] [csv][cj] Going to process files {}
[2014-03-24 05:00:32,545][INFO ][river.csv ] [Perseus] [csv][cj] next run waiting for 1h
[2014-03-24 05:00:32,550][INFO ][cluster.metadata ] [Perseus] [_river] update_mapping [cj] (dynamic)
I've checked @btray77's server and it's just a permissions-related issue, nothing related to our code. I will leave this open until @btray77 says he has solved it.
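One quick way to diagnose this kind of problem is to check every path component, not just the file itself: a world-readable file is still unreachable if a parent directory (such as /root, usually mode 0700) cannot be traversed by the service account. A sketch, to be run as the user the Elasticsearch service runs under (the path below is the one from this thread):

```python
import os

def explain_access(path):
    """Walk from / down to `path` and report any component the current
    user cannot traverse (directories need the execute bit) or read."""
    parts = os.path.abspath(path).split(os.sep)
    current = os.sep
    problems = []
    for part in parts[1:]:
        current = os.path.join(current, part)
        if os.path.isdir(current):
            if not os.access(current, os.X_OK):
                problems.append("cannot traverse directory " + current)
        elif not os.access(current, os.R_OK):
            problems.append("cannot read file " + current)
    return problems

# chmod 777 on the csv file alone is not enough if a parent directory
# blocks traversal.
for problem in explain_access("/root/csv/demofile.csv"):
    print(problem)
```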
I reinstalled Elasticsearch from source, and there are no more permission-related issues. I'm sure I could have messed up file permissions and ownership somewhere, but that was over my head.
Awesome plugin, even more awesome support by Vitek!
I'm on Ubuntu 12.04.4 LTS
Updated the system
sudo apt-get update && sudo apt-get upgrade
I installed Elasticsearch using the Debian package.
It seems to be installed correctly.
In the elasticsearch.yml file I changed cluster.name and node.name.
I built river-csv with:
Cloned the elasticsearch-river-csv source: git clone https://github.com/xxBedy/elasticsearch-river-csv.git
Installed Maven
apt-get install maven
Ran Maven
mvn clean package
Installed plugin
bash /usr/share/elasticsearch/bin/plugin -url file:/path_to_csv_river_repository/target/release/elasticsearch-river-csv.zip -install elasticsearch-river-csv
Checked if the plugin installed correctly:
bash plugin -l
Installed plugins:
This is what it installed:
elasticsearch-river-csv-2.0.0.jar groovy-all-2.2.1.jar opencsv-2.3.jar
Location of above
/usr/share/elasticsearch/plugins/river-csv
the test csv file is located at /root/csv/demofile.csv
Restarted Elasticsearch
service elasticsearch reboot
Ran
curl -XPUT localhost:9200/_river/cj/_meta -d '
{
"type" : "csv",
"csv_file" : {
"folder" : "/root/csv",
"filename_pattern" : ".*.csv$",
"poll":"5m",
"first_line_is_header":"true"
}
}'
It works on my local machine (osx) but not on the server.
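For what it's worth, the unescaped pattern shown above, .*.csv$, is valid JSON and a valid regex; the bare dot just matches any character, so it still matches demofile.csv and the stripped backslash alone would not stop the file from being picked up. It is merely too permissive, as a quick check shows:

```python
import re

loose = ".*.csv$"     # unescaped dot: matches any character before "csv"
strict = r".*\.csv$"  # escaped dot: requires a literal ".csv" extension

# Both patterns accept a real CSV filename.
print(bool(re.match(loose, "demofile.csv")))   # True
print(bool(re.match(strict, "demofile.csv")))  # True

# Only the loose pattern also accepts names with no ".csv" extension.
print(bool(re.match(loose, "notacsv")))        # True ("a" matched by ".")
print(bool(re.match(strict, "notacsv")))       # False
```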
Ran
curl -XGET "http://localhost:9200/_search" -d' { "query": { "match_all": {} } }'
Output
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "_river",
        "_type": "cj",
        "_id": "_meta",
        "_score": 1.0,
        "_source": {
          "type": "csv",
          "csv_file": {
            "folder": "/root/csv",
            "filename_pattern": ".*.csv$",
            "poll": "5m",
            "first_line_is_header": "true"
          },
          "index": {
            "index": "my_index",
            "type": "item",
            "bulk_size": 100,
            "bulk_threshold": 10
          }
        }
      },
      {
        "_index": "_river",
        "_type": "cj",
        "_id": "_status",
        "_score": 1.0,
        "_source": {
          "node": {
            "id": "6cl4wAj4TcWE0UdN2ayhRg",
            "name": "Sunstreak",
            "transport_address": "inet[/IPADDRESSISHERE:9300]"
          }
        }
      }
    ]
  }
}
My thoughts
I don't think it's seeing the CSV file. I thought it might be a permissions issue, so I ran chmod 777 on the CSV file.
It's on a DigitalOcean server with 1 GB RAM. The CSV file is 10 MB. I've tried with smaller CSV files that have only a few lines, but I get the same issue.
Log file
Nothing interesting going on inside it.