confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

Better handling of serialisation errors #1705

Closed rmoff closed 5 years ago

rmoff commented 6 years ago

Currently KSQL silently ignores serialisation errors (from the user's point of view; they are logged server side).

I was getting no results from a SELECT, and it turned out the server log was full of org.apache.kafka.common.errors.SerializationException. IMO these errors should be surfaced to the user somehow. Otherwise users will be puzzled about why no data is being returned. The message could be something as simple as:

 WARN: some messages were skipped as they could not be deserialised. Please see <logfile> for details

(we'd need to think about how that works w.r.t. a clustered deployment and pointing them at the relevant server).
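A minimal sketch of the single-warning idea, written in Python for illustration only (it is not KSQL code). It assumes, hypothetically, that each deserialisation failure is reported along with the id of the server it happened on, and that a mapping from server id to log file is available; `summarise_skipped`, the server ids and the log path are all invented for the example. Grouping by server is one possible answer to the clustered-deployment question: each warning names the server whose log holds the details.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarise_skipped(failures: Iterable[str], log_paths: Dict[str, str]) -> List[str]:
    """Hypothetical helper: one warning per server that skipped messages.

    `failures` yields the server id for each message that failed to
    deserialise; `log_paths` maps a server id to that server's log file.
    """
    per_server = Counter(failures)
    warnings = []
    # Sort by server id so the output order is stable.
    for server, skipped in sorted(per_server.items()):
        # Fall back to the placeholder when no log path is known for a server.
        log_file = log_paths.get(server, "<logfile>")
        warnings.append(
            f"WARN: {skipped} message(s) were skipped on {server} as they could not "
            f"be deserialised. Please see {log_file} for details"
        )
    return warnings


if __name__ == "__main__":
    # Hypothetical server ids and log path, for illustration only.
    paths = {"ksql-1": "/var/log/ksql/ksql-1.log"}
    for line in summarise_skipped(["ksql-1", "ksql-1", "ksql-2"], paths):
        print(line)
```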

Another option would be to have some variable threshold, below which specific serialisation errors are passed back to the console, and above which a catch-all is shown, for example:

WARN: Failed to deserialize message offset 1 partition 1 topic flood-monitoring-059793
        KsqlJsonDeserializer : java.lang.ClassCastException: java.util.HashMap cannot be cast to java.util.List
WARN: Failed to deserialize message offset 2 partition 1 topic flood-monitoring-059793
        KsqlJsonDeserializer : java.lang.ClassCastException: java.util.HashMap cannot be cast to java.util.List
WARN: Failed to deserialize message offset 3 partition 1 topic flood-monitoring-059793
        KsqlJsonDeserializer : java.lang.ClassCastException: java.util.HashMap cannot be cast to java.util.List
WARN: More than 3 deserialisation errors occurred. Please check the server log file for full details.
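The threshold variant could be sketched as follows, again as a hypothetical Python helper rather than anything that exists in KSQL. `DeserializationErrorReporter` and its `threshold` parameter are assumptions; the message formats are copied from the sample output above. The first `threshold` failures are echoed individually, the catch-all is emitted once when the threshold is first exceeded, and any later failures go only to the server log.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeserializationErrorReporter:
    """Hypothetical helper: echo the first `threshold` errors, then one catch-all."""

    threshold: int = 3
    count: int = 0

    def record(self, topic: str, partition: int, offset: int, error: str) -> List[str]:
        """Return the console lines (possibly none) to show for one failed message."""
        self.count += 1
        if self.count <= self.threshold:
            # Below the threshold: pass the specific error back to the console.
            return [
                f"WARN: Failed to deserialize message offset {offset} "
                f"partition {partition} topic {topic}",
                f"        {error}",
            ]
        if self.count == self.threshold + 1:
            # Shown exactly once, when the threshold is first exceeded.
            return [
                f"WARN: More than {self.threshold} deserialisation errors occurred. "
                "Please check the server log file for full details."
            ]
        # Further errors are only logged server side.
        return []


if __name__ == "__main__":
    reporter = DeserializationErrorReporter(threshold=3)
    err = ("KsqlJsonDeserializer : java.lang.ClassCastException: "
           "java.util.HashMap cannot be cast to java.util.List")
    for offset in range(1, 6):
        for line in reporter.record("flood-monitoring-059793", 1, offset, err):
            print(line)
```

Running the example with five failing offsets reproduces the sample output above: three detailed warnings, the catch-all at the fourth failure, and nothing for the fifth.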

I would suggest passing a message back to the user, rather than just making the log more accessible. A query can return no results for several reasons, and users shouldn't have to check the log "just in case" there were serialisation errors. Reasons for no data include:

  1. no data in the topic
  2. offset is set to latest
  3. serialisation error
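To show why the error count matters when diagnosing an empty result, here is a hypothetical Python check that lists which of the three reasons above could apply. The inputs (number of messages in the topic, whether the query reads from the latest offset, and a deserialisation error count) are assumptions for illustration; the error count is exactly the piece the client cannot see when errors are only logged server side, so without it the third reason can never be reported.

```python
from typing import List


def possible_empty_result_reasons(
    messages_in_topic: int,
    reads_from_latest: bool,
    deserialisation_errors: int,
) -> List[str]:
    """Hypothetical diagnostic: which of the three listed reasons could apply."""
    reasons = []
    if messages_in_topic == 0:
        reasons.append("no data in the topic")
    if reads_from_latest:
        # Only messages produced after the query starts will be read.
        reasons.append("offset is set to latest")
    if deserialisation_errors > 0:
        # Only detectable if the error count is passed back to the client.
        reasons.append("serialisation error")
    return reasons


if __name__ == "__main__":
    # Data exists and is read from the earliest offset, but every message fails.
    print(possible_empty_result_reasons(10, False, 10))
```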
spena commented 5 years ago

@rmoff I think this issue was opened before the processing log feature was added, right? An error during query execution is tricky to catch and send to the user, because it all happens in the background, in Kafka Streams.

Or which serialization errors were you talking about?

rmoff commented 5 years ago

Yes, we can close this in favour of the processing log.