elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations
Other
200 stars 431 forks source link

[Azure] Update sanitization logic #10089

Closed lucian-ioan closed 4 weeks ago

lucian-ioan commented 4 months ago

Recently while processing data for Azure OpenAI we have observed a new type of malformed log that is currently not being sanitized by the current sanitization implementation and is throwing the following error:

Unexpected character (''' (code 39)): was expecting double-quote to start field name\\n at [Source: (StringReader); line: 1, column: 356]

The pipeline fails entirely so this does not affect any existing log data.

lucian-ioan commented 4 weeks ago

This was already updated and extended by @zmoog in https://github.com/elastic/beats/pull/40742.

zmoog commented 4 weeks ago

@lucian-ioan, can you share a sample document for this new invalid JSON case?

lucian-ioan commented 3 weeks ago

@zmoog I wasn't able to get any malformed JSON using OpenAI alone.

Since our initial tests involved an eventhub which also had App Services, I suspect the JSON was just the usual garden-variety malformed JSON coming from there.

Because the pipeline was failing, the document with the error message had the dataset field set to OpenAI, hence the confusion on our end.