christiangalsterer / httpbeat

Elastic Beat to call HTTP endpoints
Apache License 2.0

Making the template work for kibana #25

Closed carlosrodfern closed 6 years ago

carlosrodfern commented 6 years ago

I’m currently pulling metrics with httpbeat, sending them to Elasticsearch, and visualizing them with Kibana. I had trouble getting the data to display in Kibana charts because httpbeat uses the nested type for the actual data. Kibana doesn’t work well with nested objects (https://github.com/elastic/kibana/issues/1084). I solved the problem by changing the template to use the object type instead. Since most people will probably use httpbeat with Kibana, wouldn’t it give a better out-of-the-box experience if the template used object as the default type for the data? If you like the idea, I can send you a PR. Thank you for httpbeat! Great job!

christiangalsterer commented 6 years ago

I originally used object, but AFAIR there was an issue: the payload was then only a text chunk, which is a problem if the payload is JSON. I might be wrong here and will check again.

As httpbeat will be a core feature in Beats 6.0 and there was a similar discussion in https://github.com/elastic/beats/pull/5258 about also using object instead of nested for HTTP headers, it would be good if @ruflin could share his view here.

ruflin commented 6 years ago

@carofe82 @christiangalsterer The initial reason I changed from nested to object is that nested fields do not work with the new index sorting feature in Elasticsearch. It was also the only place in all our Beats where we used nested fields, and in general we stay away from them as it is hard to write reasonable queries against them.

@christiangalsterer Would be great to know what the issue was :-)

christiangalsterer commented 6 years ago

@ruflin Thanks for your feedback. I will try to recall/reproduce what the problem was. But overall, switching back to object seems the way to go.

christiangalsterer commented 6 years ago

I just re-read the documentation of the nested and object data types, and I think I recall now why I had opted for nested.

AFAIR the main reason was that with object the relationship between the fields of each array element is lost. In many cases this might not be a problem, but in some it may be.

@ruflin: You mentioned that nested is not used in any other Beat, which gives some indication that losing the relationship in arrays is not an issue. Do you know by chance whether there are any open/closed issues where support for nested was requested? If not, I would change back to object for the payload and headers.

christiangalsterer commented 6 years ago

I just tried to switch and got the following error after changing the type to object in the fields.yml file.

# Generate index templates
. /Users/cg/IdeaProjects/go/src/github.com/christiangalsterer/httpbeat/build/python-env/bin/activate && python ./vendor/github.com/elastic/beats/libbeat/scripts/generate_template.py --es2x /Users/cg/IdeaProjects/go/src/github.com/christiangalsterer/httpbeat httpbeat ./vendor/github.com/elastic/beats
Traceback (most recent call last):
  File "./vendor/github.com/elastic/beats/libbeat/scripts/generate_template.py", line 374, in <module>
    fields_to_es_template(args, fields, output, args.beatname + "-*", version_data['version'])
  File "./vendor/github.com/elastic/beats/libbeat/scripts/generate_template.py", line 103, in fields_to_es_template
    defaults, "")
  File "./vendor/github.com/elastic/beats/libbeat/scripts/generate_template.py", line 173, in fill_section_properties
    prop, dynamic = fill_field_properties(args, field, defaults, path)
  File "./vendor/github.com/elastic/beats/libbeat/scripts/generate_template.py", line 306, in fill_field_properties
    prop, dynamic = fill_section_properties(args, field, defaults, path)
  File "./vendor/github.com/elastic/beats/libbeat/scripts/generate_template.py", line 173, in fill_section_properties
    prop, dynamic = fill_field_properties(args, field, defaults, path)
  File "./vendor/github.com/elastic/beats/libbeat/scripts/generate_template.py", line 331, in fill_field_properties
    raise ValueError("Unknown type found: " + field.get("type"))
ValueError: Unknown type found: object
make: *** [update] Error 1

If I don't define any type it works, but then the index files, etc. are not generated correctly, as keyword is then used.

I'm currently using Beats 5.6.2. Does this ring a bell?

ruflin commented 6 years ago

I think so far we have never had a request to support nested fields (I could be wrong). For the headers and the relations, I'm not sure I can follow. I think headers are key/value pairs? Can you provide an example of what you mean by losing relations?

For the above failure: We did quite a few changes for 6.0 to enhance the fields.yml and port it to Golang. I think what you need to use with 5.6 is dict-type: https://github.com/elastic/beats/blob/5.6/libbeat/_meta/fields.common.yml#L44

carlosrodfern commented 6 years ago

@ruflin, here is an example of losing relations when using object instead of nested type:

{
  "myarray": [
    { "item1": "a", "item2": "b" },
    { "item1": "c", "item2": "d" }
  ]
}

If that's an object, in the index it will become:

myarray.item1: ["a", "c"]
myarray.item2: ["b", "d"]

So you won't be able to get accurate results if you query for documents that contain, for example, elements inside myarray with item1 = "a" and item2 = "d". That query will return the example document with object type, but it won't return the example document if myarray were an array of nested type.

If that's a nested type, the index would generate three documents:

myarray: [ref to nested doc 1, ref to nested doc 2]

nested doc 1 would have

item1: "a"
item2: "b"

nested doc 2 would have

item1: "c"
item2: "d"
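
The difference described above can be sketched in a few lines of Python. This is only a toy model of the two behaviours, not how Elasticsearch is implemented; the helper names `flatten_as_object`, `matches_object` and `matches_nested` are made up for illustration.

```python
from typing import Any, Dict, List

# The example document from this comment.
doc = {
    "myarray": [
        {"item1": "a", "item2": "b"},
        {"item1": "c", "item2": "d"},
    ]
}


def flatten_as_object(field: str, items: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Toy model of the object type: values are pooled per field path."""
    flat: Dict[str, List[Any]] = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(field + "." + key, []).append(value)
    return flat


def matches_object(flat: Dict[str, List[Any]], field: str, wanted: Dict[str, Any]) -> bool:
    """Each condition is checked against the pooled values independently."""
    return all(value in flat.get(field + "." + key, []) for key, value in wanted.items())


def matches_nested(items: List[Dict[str, Any]], wanted: Dict[str, Any]) -> bool:
    """Toy model of the nested type: all conditions must hold in one element."""
    return any(all(item.get(k) == v for k, v in wanted.items()) for item in items)


flat = flatten_as_object("myarray", doc["myarray"])
query = {"item1": "a", "item2": "d"}

print(flat)
print("object match:", matches_object(flat, "myarray", query))
print("nested match:", matches_nested(doc["myarray"], query))
```

With the query item1 = "a" and item2 = "d", the object model reports a match (a false positive) while the nested model does not.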

@christiangalsterer, I know you will lose the relationship, but I think that requirement may not be as common as wanting objects so that it works out of the box with Kibana. If the nested type is a requirement for a user, they have the option of editing httpbeat.template.json. In my first post I included a link to an issue requesting support for nested objects in Kibana. No progress so far from Elastic Co.

ruflin commented 6 years ago

@carofe82 Perhaps I was a bit too focused on the header case as this is what I changed in the http module in metricbeat. For http headers I think there should not be any nested object but I would have to check the RFC to verify that.

Back to your example above: in that case you are right. So far in Beats we have always found a way to "circumvent" the issue and model the data in a more query-friendly way.

christiangalsterer commented 6 years ago

Thanks for the feedback. I'm still struggling with changing back to object because of the "object is an unknown type" error I get when running make update. After re-reading https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html and https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html, I think it should work, but I get the error message stated in a previous comment. To me it looks like there is a disconnect between the documentation and generate_template.py.

If I leave out the type information in the fields.yml file, I get "type": "keyword" in the generated template files.

@ruflin / @carofe82: If you have an idea how to fix this, it would be highly appreciated.

carlosrodfern commented 6 years ago

@christiangalsterer, I noticed that in packetbeat for http based on 5.5.2 (https://github.com/elastic/beats/blob/v5.5.2/packetbeat/protos/http/_meta/fields.yml) they have for the headers:

- name: headers
  type: dict
  dict-type: keyword

... for the body in the request:

            - name: body
              type: text
              description: The body of the HTTP request.

... and for the body in the response (no type at all):

- name: body
  description: The body of the HTTP response.

but in 6.0.0-rc1 (https://github.com/elastic/beats/blob/v6.0.0-rc1/packetbeat/protos/http/_meta/fields.yml) they have in the headers for the requests:

            - name: headers
              type: object
              object_type: keyword

... and the same as in 5.5.2 for the body request and body response.

Not sure if this helps.

ruflin commented 6 years ago

We made quite a few changes in 6.0 to clean up the templates. So object is not available in 5.x and you need to use dict. fields.yml does not map everything to the template, as we only mapped the ones we actually needed.

christiangalsterer commented 6 years ago

Thanks for the update. Maybe the documentation in https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html and https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-field-mapping.html should be updated, as they both describe the usage of object.

As there is no support for object and the only other way to handle structured JSON is nested, I currently think we need to stick with nested for the time being.

christiangalsterer commented 6 years ago

I tried to use

- name: headers
  type: dict 
  dict-type: keyword

as in https://github.com/elastic/beats/blob/5.6/packetbeat/_meta/fields.yml and in the example above.

Now headers are no longer included in the generated template files, but there is also no error message. Looking at generate_template.py, it seems that only text and long are supported for dict-type.

ruflin commented 6 years ago

For your first comment: Historically, our fields.yml file started with very few fields and was never intended as a 1:1 mapping of the Elasticsearch field docs. We only had it for internal use. Since then it has grown quite a bit and now supports a large set of the mappings in ES, and because of some potential confusion we also started to clean it up for 6.0. So 5.x is still a bit messy. Also, there isn't any real documentation for our fields.yml yet :-(

I assume the reason we didn't put the dict-type: keyword into the template is because by default we set all fields to keyword. So in the above case they "should" be set to keyword automatically.

christiangalsterer commented 6 years ago

Thanks for the clarification. Unfortunately there is no "else" which would add keyword automatically as the default. Will check how to semi-manually create the respective templates.
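
To summarise the three behaviours reported in this thread, here is a small Python model. It is NOT the code of generate_template.py; `model_field_mapping` is a hypothetical function, the set of accepted simple types is an assumption, and the return value for a supported dict-type is only a placeholder.

```python
from typing import Any, Dict, Optional


def model_field_mapping(field: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical model of the 5.6 behaviour described in this thread."""
    ftype = field.get("type")
    if ftype is None:
        # Reported above: leaving the type out yields "type": "keyword".
        return {"type": "keyword"}
    if ftype in ("keyword", "text", "long", "nested"):
        # Assumed set of simple types; the real script may accept more.
        return {"type": ftype}
    if ftype == "dict":
        dict_type = field.get("dict-type")
        if dict_type in ("text", "long"):
            # Placeholder result; the real output format is not modelled here.
            return {"dict-type": dict_type}
        # No "else" branch: nothing is emitted and no error is raised.
        return None
    # Anything else, including "object", is rejected.
    raise ValueError("Unknown type found: " + str(ftype))


# Field names below are illustrative only.
fields = [
    {"name": "body"},
    {"name": "headers", "type": "dict", "dict-type": "keyword"},
    {"name": "jsonBody", "type": "object"},
]
for field in fields:
    try:
        print(field["name"], "->", model_field_mapping(field))
    except ValueError as err:
        print(field["name"], "->", "ValueError:", err)
```

In this model, the missing default for dict-type: keyword is what makes the headers field silently disappear from the template.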

CorrectHorseBatteryStapple commented 6 years ago

Thanks for this awesome plugin. I use it for monitoring Spring /health and /metrics data in Kibana (v5.3.2), but it is a challenge to set up initially. I could see all the data in the "Discover" view, but after switching to the "Visualize" view and filtering, it would not display anything from variables more than 2 levels deep (i.e. 'response.statusCode' is OK, but response.jsonbody.heap would display "No results found", although I had seen the data in Discover).

Big thanks to @carofe82: after replacing 'nested' with 'object' in the index template, I can see the filtered data correctly.

RedCloudDC commented 6 years ago

We have a REST endpoint which has hundreds of fields as nested JSON. I need to use this data in Kibana. We met with engineers at elasticonf and they recommended httpbeat, but it looks like it's not ready for our use case above. We also have the same live data in an S3 bucket, so we'll just point ELK to our S3 bucket.

carlosrodfern commented 6 years ago

@RedCloudDC, here is an example of what I did: https://github.com/carofe82/httpbeat-example. Pretty much: grab the httpbeat.template.json file from this repository, replace all nested with object, and then tell httpbeat to use it.
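
The replace step can also be scripted. Below is a minimal sketch in Python, assuming the template is plain JSON in which nested fields are declared as "type": "nested". The sample template and its field names are made up for illustration and are not the real contents of httpbeat.template.json; `nested_to_object` and `convert_file` are hypothetical helpers.

```python
import json
from typing import Any


def nested_to_object(node: Any) -> Any:
    """Return a copy of a mapping tree with "type": "nested" changed to "object"."""
    if isinstance(node, dict):
        return {
            key: "object" if key == "type" and value == "nested" else nested_to_object(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [nested_to_object(item) for item in node]
    return node


def convert_file(src: str, dst: str) -> None:
    """Read a template file, convert it, and write the result to dst."""
    with open(src, encoding="utf-8") as handle:
        template = json.load(handle)
    with open(dst, "w", encoding="utf-8") as handle:
        json.dump(nested_to_object(template), handle, indent=2)


# Illustrative template fragment (field names are assumptions).
sample_template = {
    "mappings": {
        "_default_": {
            "properties": {
                "response": {
                    "properties": {
                        "jsonBody": {"type": "nested"},
                        "headers": {"type": "nested"},
                        "statusCode": {"type": "long"},
                    }
                }
            }
        }
    }
}

converted = nested_to_object(sample_template)
print(json.dumps(converted, indent=2))
```

For a real file, something like `convert_file("httpbeat.template.json", "httpbeat.template.object.json")` would write a converted copy (the output file name is arbitrary) and leave the original untouched.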

christiangalsterer commented 6 years ago

Just released 4.1.0 which contains the necessary fixes.

carlosrodfern commented 6 years ago

Thank you! @christiangalsterer I'll be trying the new version shortly :)