Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org
Other
7.4k stars 1.06k forks source link

Input type JSON path from HTTP API doesn't store valid json when specifying JSON path of data to extract #13344

Open drewmiranda-gl opened 2 years ago

drewmiranda-gl commented 2 years ago

When using the input type JSON path from HTTP API, you are required to provide a value for JSON path of data to extract.

However, any value placed in this field other than * returns invalid json that cannot be parsed.

Expected Behavior

Json selected using JSON path returns valid json that can be parsed using json parsing functions via processing pipelines.

Current Behavior

Attempting to parse json returned produces an error:

Unable to parse JSON
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('r' (code 114)): was expecting double-quote to start field name
 at [Source: (String)"{results=[{"gender":"female","name":{"title":"Mrs","first":"Margrit","last":"Stapf"},"location":{"street":{"number":726,"name":"Drosselweg"},"city":"Brandis","state":"Mecklenburg-Vorpommern","country":"Germany","postcode":22688,"coordinates":{"latitude":"-49.5036","longitude":"92.7654"},"timezone":{"offset":"+5:00","description":"Ekaterinburg, Islamabad, Karachi, Tashkent"}},"email":"margrit.stapf@example.com","login":{"uuid":"ad9b1327-232d-430b-9468-7d8cb653db69","username":"lazykoala841","pass"[truncated 698 chars]; line: 1, column: 3]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804) ~[graylog.jar:?]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:693) ~[graylog.jar:?]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:591) ~[graylog.jar

Possible Solution

Steps to Reproduce (for bugs)

  1. Create a JSON path from HTTP API input
  2. Set URL to https://randomuser.me/api/
  3. Set json path to $.results.0 (this can be validated as valid json path via https://jsonpath.com/ )
  4. Observe that data in the result field is not valid json
{gender=male, name={title=Mr, first=Eeli, last=Wiita}, location={street={number=1553, name=Reijolankatu}, city=Oulu, state=Satakunta, country=Finland, postcode=10824, coordinates={latitude=-25.5239, longitude=-99.0959}, timezone={offset=+6:00, description=Almaty, Dhaka, Colombo}}, email=eeli.wiita@example.com, login={uuid=2fc2a9a0-10cc-4164-bcd6-36240d477367, username=ticklishgorilla304, password=scarlett, salt=gcWXeBtT, md5=a2b96a058776711a7e7a4c8567cee0f8, sha1=10afffcd61b2650762a265dd36ac460b53057aab, sha256=c745e81c83d6032d3cc11f2c8021b59741d27cd89ac1407f66ab9df87ce08d0b}, dob={date=1966-09-05T18:33:04.100Z, age=55}, registered={date=2005-06-15T20:27:05.149Z, age=17}, phone=02-824-424, cell=044-949-75-71, id={name=HETU, value=NaNNA677undefined}, picture={large=https://randomuser.me/api/portraits/men/31.jpg, medium=https://randomuser.me/api/portraits/med/men/31.jpg, thumbnail=https://randomuser.me/api/portraits/thumb/men/31.jpg}, nat=FI}

Your Environment

Please let me know if there are any questions or additional information needed.

bernd commented 2 years ago

The "JSON path from HTTP API" input was designed to store a single primitive value, not a complete JSON payload.

If the value returned by the JSONPath setting is a nested JSON structure, the codec stores a string representation of the Java Map structure in the result field of the message. The string representation of a Java map is not valid JSON.

We cannot change the current behavior because a change can break existing setups. We could introduce an input configuration setting to store map payloads as JSON, but I think it would be better to create a new input type that is designed to fetch and process JSON payloads.

The current input was basically designed to fetch download counts from GitHub. (as shown in the docs)

danotorrey commented 2 years ago

@bernd Thanks for the background info. Are you thinking that the separate input would also support doing more extended JSON processing (above what the current one does)?

If it's mostly the same (with added JSON element targeting), then the configuration option approach for this input seems like an OK backwards-compatible solution to me, since the Java map representation seems like a bug/unintended side-effect, and it would provide a workaround to get a predicable format with this input. Seems easy enough to do from a code-perspective. I know the original intention of the input was different, but this seems to still fit with JSON path functionality.

One of the main benefits is that the json_flatten pipeline function could then be used to add fields for a specific JSON element (instead of the whole document).

But, I also think that if we have planned more comprehensive JSON processing support, then a separate input would make sense.

bernd commented 2 years ago

But, I also think that if we have planned more comprehensive JSON processing support, then a separate input would make sense.

@danotorrey I don't think we have anything explicitly planned yet. I was thinking about having more flexibility regarding the configuration.

A few examples I had in mind:

But you are right; we could add an option to enable the "JSON value mode" and adjust the map and list handling depending on the value of that option.

danotorrey commented 2 years ago

@bernd Good point, there are quite a few limitations with this input. A new input addressing these would be a lot more usable.

Do you have any gut feeling for what to do about this issue? My vote is to consider adjusting the display name for the input to "JSON path value from HTTP API" and add a small clarifying note to the documentation about only supporting primitive values. Then, save the effort of more extensive JSON handling improvements for the new input.

bernd commented 2 years ago

@danotorrey I think you should talk to product and check the priority and next steps.

boosty commented 2 years ago

My vote is to consider adjusting the display name for the input to "JSON path value from HTTP API" and add a small clarifying note to the documentation about only supporting primitive values.

Sounds like a good first step 👍 Could the SecDev team drive this?

Then, save the effort of more extensive JSON handling improvements for the new input.

I cannot say much about the priority for that right now. It sounds like a useful addition though.

danotorrey commented 2 years ago

@boosty

Sounds like a good first step 👍 Could the SecDev team drive this?

Definitely. PR has been opened and will be backported too: https://github.com/Graylog2/graylog2-server/pull/13439 I will coordinate the docs changes too.

I cannot say much about the priority for that right now. It sounds like a useful addition though.

If it makes sense, we can keep this issue open for future consideration.