Closed KaranMuthusamy closed 9 years ago
Hi, I am new to ELK and I am using Elasticsearch 1.4.1 and Logstash 1.4.2. It looks like either no one uses the split filter option or it does not work. I have been searching the web for an example and could not find one; can someone point me to a link with an example? Thanks
Hello @KaraMuthu, can you please create a gist with an example input file?
I was trying different options with split and mutate as well; here is what I am trying in my config file:
```
input {
  s3 {
    bucket => "XXXXXX"
    credentials => [ "XXXXX", "XXXXX" ]
    region_endpoint => "us-east-1"
    prefix => "Sample-Work"
    sincedb_path => "./last-s3-file"
    codec => "json"
  }
}

filter {
  mutate {
    split => ["message", "userId"]
  }
}

output {
  elasticsearch_http {
    codec => "json"
    host => "localhost"
    port => "9200"
  }
}
```
I have difficulty understanding what you want to achieve with mutate split, since you are already using the json codec in the input. Can you please post an example of the JSON input and the structure you want to achieve in Elasticsearch?
I have 25 events together in one JSON file on an S3 bucket, each with a timestamp, and I want to use Logstash to pull the data from S3 into Elasticsearch and visualize it with Kibana. Logstash pulls the one file (25 events together) as one JSON doc instead of 25 documents, so I want to split the 25 events into individual JSON docs before pushing them into Elasticsearch.
Here are 3 sample events out of the 25 in one file on S3. If I do not split them into individual events, all 25 events are loaded into one doc in Elasticsearch, which does not help me analyse my events. Or am I missing something here? I hope this helps. Thanks
```json
{"track":{"trackId":"TR296"},"userId":"6326a7aa6267ac3e0ac691e25c82c7edb0998e90","category":"UI","label":"watch","demographic":{"gender":"female","subscriber":"0"},"action":"youTube","timestamp":"2014-12-08 19:11:21.202 GMT","environment":"prod","video":{"videoId":"mKcMASu0d3I"}}
{"userId":"guest","category":"UI","label":"login","demographic":{"subscriber":"0"},"action":"youTube","timestamp":"2014-12-08 19:11:10.672 GMT","environment":"prod"}
{"userId":"guest","category":"Stream","label":"Editor's Picks","demographic":{"subscriber":"0"},"action":"timeOnStream","timestamp":"2014-12-08 19:11:23.093 GMT","environment":"prod","value":169.1780970096588}
```
The first event has a different structure from the other two; is that a copy-paste error? What is the structure of the JSON in the S3 file?
Hi jsvd, thanks for your question. Yes, the structure is different, but the S3 docs do not necessarily all share the same set of fields; it is somewhat unstructured JSON data. Here are a few more examples:
```json
{"track":{"trackAccessMask":"1","trackId":"TR156","trackForm":"0"},"userId":"5c3771659c6cfd9bbfcfdf3ea7997c30de4c5f36","category":"Accounting","label":"","demographic":{"subscriber":"0"},"action":"preview","timestamp":"2014-12-07 02:36:15.322 GMT","environment":"prod","audioFX":{"fxAccessMask":"0"}}
{"track":{"trackAccessMask":"1","trackId":"TR326","trackForm":"0"},"userId":"6fdc891d0c9fa0a55ceca28a3c790aa4e6aac80c","category":"Accounting","label":"","demographic":{"subscriber":"0"},"action":"record","timestamp":"2014-12-07 02:36:15.709 GMT","environment":"prod","audioFX":{"fxAccessMask":"0"}}
{"track":{"trackId":"TR274"},"userId":"cc1365f99aa7f1709f159c2e5ca52ebca77d9ce6","category":"UI","label":"watch","demographic":{"gender":"female","subscriber":"0"},"action":"music","timestamp":"2014-12-07 02:36:23.419 GMT","environment":"prod","video":{"videoId":"DzBblj5yMJ4"}}
{"userId":"febc06dc7a7bc4bd83a4d969b5223b0d5853e5ac","category":"Stream","label":"Editor's Picks","demographic":{"gender":"male","subscriber":"0"},"action":"timeOnStream","timestamp":"2014-12-07 02:36:04.362 GMT","environment":"prod","value":11.6377249956131}
{"userId":"fec37b1b27fb299f5f8da8ab0c16d42d40eaa79e","category":"Networking","label":"amazonAudioUploadCompleted","demographic":{"subscriber":"0"},"action":"upload","timestamp":"2014-12-07 02:36:12.527 GMT","environment":"prod"}
```
Each line should generate a separate event, so 25 lines => 25 events. Are you getting one big event? In the file is each line separated by two "\n" newline characters?
Each S3 file has 25 such events. Yes, I am getting one big event in Elasticsearch; that is why I want to use the split filter in the Logstash config file to split it into 25 separate events.
For readability, I manually separated the events in my post above with 1 or 2 "\n" newline characters.
I tried the mutate/split filters to insert newline characters "\n" between the JSON docs but could not make it work, and I want to find out whether anyone has used those filters to separate events.
ahh ok so the file is just 1 big line with the 25 events:
```
{"track":{"trackAccessMask":"1","trackId":"TR156","trackForm":"0"},"userId":"5c3771659c6cfd9bbfcfdf3ea7997c30de4c5f36","category":"Accounting","label":"","demographic":{"subscriber":"0"},"action":"preview","timestamp":"2014-12-07 02:36:15.322 GMT","environment":"prod","audioFX":{"fxAccessMask":"0"}},{"track":{"trackAccessMask":"1","trackId":"TR326","trackForm":"0"},"userId":"6fdc891d0c9fa0a55ceca28a3c790aa4e6aac80c","category":"Accounting","label":"","demographic":{"subscriber":"0"},"action":"record","timestamp":"2014-12-07 02:36:15.709 GMT","environment":"prod","audioFX":{"fxAccessMask":"0"}},{"track":{"trackId":"TR274"},"userId":"cc1365f99aa7f1709f159c2e5ca52ebca77d9ce6","category":"UI","label":"watch","demographic":{"gender":"female","subscriber":"0"},"action":"music","timestamp":"2014-12-07 02:36:23.419 GMT","environment":"prod","video":{"videoId":"DzBblj5yMJ4"}}
```
I'm assuming you can't force the system that writes the file there to add a comma or "\n" at the end of each event ? :)
Yes, you are right: all 25 events are loaded as a single event in Elasticsearch, and the data does not even have a "," separating the events.
Assuming the events are "{event1}{event2}{event3}", you can do this:
```
input { stdin {} }

filter {
  mutate { gsub => [ "message", "}{", "}\n{" ] }
  split { terminator => "\n" }
}

output { stdout {} }
```
This way you insert a newline between each }{ and then split on the newline.
You will still read the whole file into memory in one pass, so be careful about the size of the file.
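The gsub-then-split recipe above can be illustrated outside Logstash. This is a minimal Python sketch of the same string manipulation (the sample events are shortened, hypothetical versions of the ones posted earlier; `message` mirrors Logstash's default field name). Note that the naive `}{` replacement would also match a literal `}{` occurring inside a string value, which is unlikely in this data but possible:

```python
import json

# One "big line" containing several back-to-back JSON objects, as in the S3 file.
message = (
    '{"userId":"guest","category":"UI","action":"youTube"}'
    '{"userId":"guest","category":"Stream","action":"timeOnStream"}'
)

# Step 1 (mutate/gsub): insert a newline between adjacent objects.
message = message.replace("}{", "}\n{")

# Step 2 (split filter): break the message into one event per line.
events = message.split("\n")

# Step 3 (json parsing): turn each event string into key/value pairs.
docs = [json.loads(e) for e in events]

print(len(docs))          # 2
print(docs[1]["action"])  # timeOnStream
```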
Hi jsvd, thanks very much, that worked very well for splitting the events. But there is another issue: each event ended up as raw text in a single `message` field, "{event1}{event2}{event3}" style, instead of each event having its own keys and values (maybe I am missing something here). If anyone has come across this issue, please advise how to fix the conf file. I appreciate the help. Thanks
For example, this is sample data:

```json
{"track":{"trackAccessMask":"1","trackId":"TR156","trackForm":"0"},"userId":"5c3771659c6cfd9bb"}
```

The expectation in elasticsearch-head is each field in its own column:

| trackAccessMask | trackId | trackForm | userId |
|---|---|---|---|
| 1 | TR156 | 0 | 5c3771659c6cfd9bb |

but I am getting the following (the whole document is in one column, `message`, as raw text):

```json
{"trackAccessMask":"1","trackId":"TR156","trackForm":"0","userId":"5c3771659c6cfd9bb"}
```
Here are the details of the conf file:

```
input {
  s3 {
    bucket => "XXXXXX"
    credentials => [ "XXXXX", "XXXXX" ]
    region_endpoint => "us-east-1"
    prefix => "Sample-Work"
    codec => "json"
  }
}

filter {
  mutate { gsub => [ "message", "}{", "}\n{" ] }
  split { terminator => "\n" }
}

output {
  elasticsearch_http {
    codec => "json"
    host => "localhost"
    port => "9200"
  }
}
```
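Since the split filter leaves each event as a raw JSON string in `message`, one possible fix (a sketch, assuming Logstash 1.4.x and the default `message` field) is to add a json filter after the split, so each event string is parsed into top-level key/value pairs before it reaches Elasticsearch:

```
filter {
  mutate { gsub => [ "message", "}{", "}\n{" ] }
  split { terminator => "\n" }
  # Parse each per-event JSON string into top-level fields.
  json { source => "message" }
}
```

You would likely also want `codec => "plain"` on the s3 input, since the file as a whole is not valid JSON and the json codec cannot decode it in one piece.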
Hello, both the input and output have codec => "json", but I am still getting a single string per event instead of key/value pairs in Elasticsearch. Can someone please help with the above issue?
Thanks
What are you getting from codec => json, and what do you get if you just use codec => plain?
Also, to help us read those blocks, I'd recommend wrapping them in 3 `s followed by the language, e.g.

```javascript
{"track":{"trackAccessMask":"1","trackId":"TR326","trackForm":"0"},"userId":"6fdc891d0c9fa0a55ceca28a3c790aa4e6aac80c","category":"Accounting","label":"","demographic":{"subscriber":"0"},"action":"record","timestamp":"2014-12-07 02:36:15.709 GMT","environment":"prod","audioFX":{"fxAccessMask":"0"}}
```
kkirsche, I tried with codec => plain and got the same results: no key/value pairs in ES. I also did not understand what you mentioned about the 3 `s; can you please elaborate on that?
Thanks, Karan
See syntax highlighting at:
https://help.github.com/articles/github-flavored-markdown/
It's just that posting code as plain text is harder to read.
For Logstash 1.5.0, we've moved all plugins to individual repositories, so I have moved this issue to https://github.com/logstash-plugins/logstash-filter-split/issues/3. Let's continue the discussion there! :)
Hello, I am trying to set up Logstash to pull JSON data from S3 into Elasticsearch with the split filter, and I have some issues. The S3 data has 25 events in a single line, separated only by their curly braces { }, and I want to use the split filter to separate them during the pull from S3 via the config file. I have found very few examples of the split filter; can you guys help me?
Thanks, Karan