christiangalsterer / httpbeat

Elastic Beat to call HTTP endpoints
Apache License 2.0

JSON #8

Closed swizzley closed 7 years ago

swizzley commented 8 years ago

I suppose this is more of a feature request, or possibly a documentation request if this already exists, but I'd like to be able to use your beat to ingest JSON arrays. Right now my response body comes in as a JSON string instead of being unmarshalled as JSON. This appears to be a limitation of Elasticsearch rather than of your beat, since I have to wrap my JSON array in an object before I can even curl it into ES manually. Adding a "field" under JSONBODY would be the ideal way to do this. Or perhaps that already exists and I'm just not doing it right; either way, any help would be appreciated. I'm more than happy to send you a pull request if I can understand where and how this is being done.

christiangalsterer commented 8 years ago

Hi @swizzley,

I think what you are looking for is available with version 1.1.0.

If the HTTP endpoint returns a proper JSON structure, it is also added in the field jsonBody (see https://github.com/christiangalsterer/httpbeat/blob/master/docs/fields.asciidoc). You can also configure whether dots in the structure are replaced and whether the structure is flattened; see https://github.com/christiangalsterer/httpbeat/blob/master/docs/configuration.asciidoc

swizzley commented 8 years ago

Yeah, my problem is that the endpoint returns a JSON array, so the beat emits the JSON as a string and the response field ends up with a massive JSON blob as its body. Additionally, I don't want to create unique IDs in Elasticsearch for each poll interval. I just want the documents inside the polled JSON to derive their ID from a given field and then simply update that document on subsequent polls. I've figured out how to do this with a for loop that derives the _id, then selects the document from the response with a little jq and adds the metadata before sending everything to _bulk. Ideally I'd like to integrate these methods into your beat. What do you think?
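The loop described above could be sketched in Go roughly as follows. This is only an illustration under assumptions, not code from httpbeat: `buildBulkBody`, the index name `polls`, and the `id` field are hypothetical. The output follows the newline-delimited action/document layout of the Elasticsearch `_bulk` API; older Elasticsearch versions also expect a document type, which this sketch leaves out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// buildBulkBody (hypothetical helper) turns a JSON array of objects into an
// Elasticsearch _bulk request body. Each element becomes an "index" action
// whose _id is read from idField, so a later poll overwrites the same
// document instead of creating a new one.
func buildBulkBody(response []byte, index, idField string) ([]byte, error) {
	var docs []map[string]interface{}
	if err := json.Unmarshal(response, &docs); err != nil {
		return nil, fmt.Errorf("response is not a JSON array of objects: %w", err)
	}

	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes one JSON value followed by a newline
	for i, doc := range docs {
		id, ok := doc[idField]
		if !ok {
			return nil, fmt.Errorf("element %d has no %q field", i, idField)
		}
		action := map[string]interface{}{
			"index": map[string]interface{}{"_index": index, "_id": fmt.Sprint(id)},
		}
		// Action line first, then the document itself.
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(doc); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	// Made-up sample response standing in for a polled endpoint.
	response := []byte(`[{"id":"a","v":1},{"id":"b","v":2}]`)
	body, err := buildBulkBody(response, "polls", "id")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(string(body))
}
```

Because the action is `index` with an explicit `_id`, sending the same body again replaces the existing documents rather than adding new ones.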

rompic commented 7 years ago

Maybe an option to deactivate the body field in the response would help. I just ran into this limitation with a very large response (using Graylog as a target for the Logstash output): message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="beat_response_body" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '...', original message: bytes can be at most 32766 in length; got 167036]

It worked after deactivating the body manually and recompiling.

http://stackoverflow.com/questions/24019868/utf8-encoding-is-longer-than-the-max-length-32766
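
For illustration, a guard against this error could look like the sketch below. It is an assumption-laden example, not httpbeat code: `dropOversizedBody` and the `body` key are hypothetical, and the only value taken from the error message is the 32766-byte limit. Go's `len` on a string counts bytes, which matches the UTF-8 length the error refers to.

```go
package main

import "fmt"

// maxTermBytes is the limit quoted in the error message above.
const maxTermBytes = 32766

// dropOversizedBody (hypothetical helper) deletes the "body" field when its
// UTF-8 encoding is longer than maxTermBytes and reports whether it did so.
func dropOversizedBody(fields map[string]interface{}) bool {
	body, ok := fields["body"].(string)
	if !ok || len(body) <= maxTermBytes {
		return false
	}
	delete(fields, "body")
	return true
}

func main() {
	// 167036 bytes, the size reported in the error message.
	fields := map[string]interface{}{"body": string(make([]byte, 167036))}
	fmt.Println("dropped:", dropOversizedBody(fields))
}
```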

rompic commented 7 years ago

We could actually set the body field to nil if unmarshalling succeeds.
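
A minimal sketch of that idea, assuming a hypothetical `responseFields` helper and using the `body` and `jsonBody` field names from this thread (this is not how httpbeat is actually implemented):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// responseFields (hypothetical helper) keeps the raw string only when the
// response cannot be unmarshalled as JSON. On success the parsed value goes
// into "jsonBody" and "body" is set to nil.
func responseFields(raw []byte) map[string]interface{} {
	var parsed interface{}
	if err := json.Unmarshal(raw, &parsed); err != nil {
		// Not JSON: fall back to the plain string body.
		return map[string]interface{}{"body": string(raw)}
	}
	return map[string]interface{}{"body": nil, "jsonBody": parsed}
}

func main() {
	fmt.Printf("%v\n", responseFields([]byte(`[{"a":1}]`)))
	fmt.Printf("%v\n", responseFields([]byte("not json")))
}
```

Note that `json.Unmarshal` also accepts bare scalars such as `42`, so a real implementation might want to restrict this to objects and arrays.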

christiangalsterer commented 7 years ago

Hi everybody, thank you for all your feedback. I was also wondering whether I should change the behaviour slightly so that either the jsonBody or the body field is returned, rather than both.

I see these options:

  1. Always return both fields
  2. Set body to nil, if unmarshalling to json is successful
  3. Users specify the return format. The body field is then either a string or JSON.

Would be great to get some feedback on your preferences.

rompic commented 7 years ago

Thanks for the fast response. I personally don't see any value in 1, as the same information is sent twice. I'm actually not sure why anyone would want to send JSON data as a string, but an option (3) could also be used to skip unmarshalling non-JSON output in the first place, which could be a minor performance improvement. Since a JSON-related setting already exists (dot mode), I would vote for 3.

christiangalsterer commented 7 years ago

Released with version 3.0.0.

There is now a new "output_format" parameter which lets you specify the format of the response body. If it is not set, or set to "string", the body is returned as a string in the "body" field. If it is set to "json", the response body is returned as JSON in the "jsonBody" field.
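
The described behaviour can be sketched in Go as below. Only the parameter values ("string", "json", unset) and the field names come from the comment above. `formatBody` is a hypothetical name, and returning an error for invalid JSON or an unknown value is an assumption, since the thread does not say what httpbeat does in those cases.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formatBody (hypothetical helper) mirrors the described output_format rule:
// unset or "string" yields a "body" string, "json" yields a parsed "jsonBody".
func formatBody(raw []byte, outputFormat string) (map[string]interface{}, error) {
	switch outputFormat {
	case "", "string":
		return map[string]interface{}{"body": string(raw)}, nil
	case "json":
		var parsed interface{}
		if err := json.Unmarshal(raw, &parsed); err != nil {
			// Assumption: report the failure instead of silently falling back.
			return nil, fmt.Errorf("response body is not valid JSON: %w", err)
		}
		return map[string]interface{}{"jsonBody": parsed}, nil
	default:
		return nil, fmt.Errorf("unsupported output_format %q", outputFormat)
	}
}

func main() {
	raw := []byte(`[{"name":"a"},{"name":"b"}]`)
	for _, format := range []string{"", "string", "json"} {
		fields, err := formatBody(raw, format)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("output_format=%q -> %v\n", format, fields)
	}
}
```

With this split, a JSON array is never sent twice, and non-JSON endpoints skip the unmarshalling step entirely.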