AgileWorksOrg / elasticsearch-river-csv

CSV river for ElasticSearch
Apache License 2.0

Not importing data #61

Closed: wirecutter313 closed this issue 9 years ago

wirecutter313 commented 9 years ago

Running ES 1.7 and River-CSV 2.2.1

I'm trying to pull in a large number of CSV files but I'm having zero luck.

Using the command

curl -XPUT localhost:9200/_river/my_csv_river/_meta -d '
{ "type" : "csv", "csv_file" : { "folder" : "/tmp/csv", "filename_pattern" : ".*.csv$", "poll":"5m", "first_line_is_header":"true" } }'

which returns

{"_index":"_river","_type":"my_csv_river","_id":"_meta","_version":1,"created":true}

I see the index created in EL, but nothing happens after that. All I see in the ES log is this

[2015-08-31 08:59:20,753][INFO ][cluster.metadata ] [EL-PROD-01] [_river] update_mapping my_csv_river
[2015-08-31 08:59:20,993][INFO ][cluster.metadata ] [EL-PROD-01] [_river] update_mapping my_csv_river

Ideas?

wirecutter313 commented 9 years ago

Bump?

vtajzich commented 9 years ago

could you, please: ls -l /tmp/csv

wirecutter313 commented 9 years ago

So I've tried very complex and very simple config files. Both do the same. The file permissions are wide open, and the ES app should have full access to the dir. None of the files are renamed to processing or completed. There is nothing in the logs about files being found, unless there is another log I should be looking at.

admin@EL-PROD-01:/tmp/csv$ ls -la
total 2553724
drwxrwxrwx  2 admin admin 135168 Aug 30 01:53 .
drwxrwxrwt 14 root  root    4096 Sep  2 13:17 ..
-rwxrwxrwx  1 admin admin  20412 Aug 30 01:15 03-21_download.csv
-rwxrwxrwx  1 admin admin  24748 Aug 30 01:15 03-22_download.csv
-rwxrwxrwx  1 admin admin  19828 Aug 30 01:15 03-23_download.csv
-rwxrwxrwx  1 admin admin  30398 Aug 30 01:15 03-24_download.csv
-rwxrwxrwx  1 admin admin  28132 Aug 30 01:15 03-25_download.csv
...... ...... .....

/var/log/elasticsearch/el-prod.log

admin@EL-PROD-01:/tmp/cvs$ tail /var/log/elasticsearch/EL-PROD.log
[2015-09-02 13:30:28,293][INFO ][cluster.metadata ] [EL-PROD-01] [[_river]] remove_mapping [[my_csv_river]]
[2015-09-02 13:31:16,534][INFO ][cluster.metadata ] [EL-PROD-01] [_river] update_mapping my_csv_river
[2015-09-02 13:31:16,877][INFO ][cluster.metadata ] [EL-PROD-01] [_river] update_mapping my_csv_river

vtajzich commented 9 years ago

I would guess that you have the wrong file name pattern. It should be:

.*\\.csv$

you have

.*\.csv$

Try not specifying it at all.
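A note on why the doubled backslash matters: inside a JSON string, `\\` decodes to a single backslash, while `\.` is not a legal JSON escape sequence. The sketch below illustrates this with Python's own `json` module; it does not reproduce how Elasticsearch or the river parse the request, and the helper name `decode_pattern` is hypothetical. Using `re.fullmatch` to mimic whole-name matching is also an assumption about how the river applies the pattern.

```python
import json
import re

# Body with the doubled backslash: valid JSON, decodes to the regex .*\.csv$
good_body = r'{"filename_pattern": ".*\\.csv$"}'
# Body with a single backslash: "\." is not a legal JSON escape sequence
bad_body = r'{"filename_pattern": ".*\.csv$"}'


def decode_pattern(body):
    """Return the decoded filename_pattern, or None if the body is not valid JSON.

    Hypothetical helper for illustration; not part of the river plugin.
    """
    try:
        return json.loads(body)["filename_pattern"]
    except json.JSONDecodeError:
        return None


good_pattern = decode_pattern(good_body)
bad_pattern = decode_pattern(bad_body)

# Letting json.dumps do the escaping avoids writing the backslashes by hand
generated_body = json.dumps({"filename_pattern": r".*\.csv$"})

if __name__ == "__main__":
    print("decoded from good body:", good_pattern)
    print("decoded from bad body: ", bad_pattern)
    print("generated body:        ", generated_body)
    # Assumption: the pattern is applied to the whole file name
    print(bool(re.fullmatch(good_pattern, "03-21_download.csv")))
```

Generating the request body with a JSON library rather than typing it inside a shell-quoted string sidesteps this class of escaping mistake.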

wirecutter313 commented 9 years ago

Already pulled it out based on your previous post.

curl -XPUT localhost:9200/_river/my_csv_river/_meta -d ' { "type" : "csv", "csv_file" : { "folder" : "/tmp/csv", "poll":"5m", "first_line_is_header":"true" } }'

Still nothing in the logs.

vtajzich commented 9 years ago

Please attach the whole logs; otherwise I cannot help you.

wirecutter313 commented 9 years ago

Ya... I can email it to you, but I'm not posting it publicly.

vtajzich commented 9 years ago

OK, even though I think there is no sensitive information in it.

v.tajzich@gmail.com

wirecutter313 commented 9 years ago

You got mail

vtajzich commented 9 years ago

The log is useless. Please do the following:

wirecutter313 commented 9 years ago

Log file coming

admin@EL-PROD-01:~$ curl -XDELETE http://localhost:9200/_river/my_csv_river
{"acknowledged":true}
admin@EL-PROD-01:~$ sudo service elasticsearch stop

vtajzich commented 9 years ago

Note that all nodes must have access to the location where the CSV files are stored. The log still seems empty to me.
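One way to verify this is to run a small check on every node, ideally as the same user the Elasticsearch process runs as. The sketch below is only an illustration: the function name `matching_files`, the default pattern, and the full-name matching are assumptions, and it does not reproduce the river's own file discovery logic.

```python
import os
import re
import socket


def matching_files(folder, pattern=r".*\.csv$"):
    """List names in `folder` that are readable regular files matching `pattern`.

    Hypothetical helper; returns an empty list if the folder is missing.
    """
    if not os.path.isdir(folder):
        return []
    regex = re.compile(pattern)
    names = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        readable = os.path.isfile(path) and os.access(path, os.R_OK)
        if readable and regex.fullmatch(name):
            names.append(name)
    return names


if __name__ == "__main__":
    folder = "/tmp/csv"  # folder from the river configuration in this thread
    found = matching_files(folder)
    print(f"{socket.gethostname()}: {len(found)} matching file(s) in {folder}")
```

If any node reports zero files, a river allocated to that node would have nothing to import, which would be consistent with the empty log.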

wirecutter313 commented 9 years ago

This is a 6 node cluster with 2 masters. Should this be NFS or something?

vtajzich commented 9 years ago

NFS sounds good. It's up to you. All nodes just have to have access.

wirecutter313 commented 9 years ago

NFS up, and curl repointed.. No dice

wirecutter313 commented 9 years ago

So that did work. This ended up executing on the first data node in the cluster.