doriordan / skuber

A Scala Kubernetes client library
Apache License 2.0
326 stars 97 forks source link

Better parse error handling #270

Open jroper opened 5 years ago

jroper commented 5 years ago

Working with custom resources in Skuber is very difficult due to parse errors being fatal. Skuber assumes that parsing will always be successful, and doesn't offer any opportunity to handle parse errors gracefully. So, if you're not using validation in your CRDs (or if you don't have the validation quite right), then an end user updating a malformed resource will result in the entire operator, for both good and the bad resource, from ceasing to work - any events that are received for that resource will cause the watch source to fail, and any list operations for that type of resource will fail.

Here's what I would suggest should be modified to make writing robust operators much simpler:

With the following modifications (or equivalent), I think it will be much easier to write operators using skuber.

doriordan commented 5 years ago

Re. WatchEvent - given that WatchEvent has a non-optional field for the successfully parsed resource, to keep backwards compatibility something else would need to be used to convey the parse error information to the operator, for example we could use the error sink suggested for #269. In the case of your suggestion for adding error details to ResourceList it seems feasible and useful, but I would need to look into the details of what would be involved given current list parsing in Skuber, which leans on the implicit Play readers for lists.

jroper commented 5 years ago

Yeah, Plays traversableReads definitely couldn't be used. As a work around now I'm using CustomResource[JsValue, JsValue] in the API, then I convert the resource back to a JsValue, and then decode using my defined CustomResource[MySpec, MyStatus]. Generalising that might look like this:

def listReadsCollectingErrors[Sp: Reads, St: Reads]: Reads[(List[(CustomResource[JsValue, JsValue], JsError)], List[CustomResource[Sp, St])] = {
  implicitly[Reads[List[CustomResource[JsValue, JsValue]]]].map { resources =>
    val results = resources.map { jsResource =>
      Json.toJson(jsResource).validate[CustomResource[Sp, St]] match {
        case JsSuccess(resource) => Right(resource)
        case err: JsError => Left((jsResource, err))
      }
    }
    (jsResults.collect { case Left(err) => err }, jsResults.collect { case Right(resource) => resource })
  }
}

The reason that the it's a list of (CustomResource[JsValue, JsValue], JsError), and not just the error, is so that you can know which resource has the error, and update the status accordingly.

As for WatchEvent, one problem with using a separate Sink is that if you do then go and modify it to set the error in the status, that will generate another watch event. Generally when implementing operators, you need to implement something stateful that caches both values you've seen, and values you've updated, so that you can distinguish watch events that you generate because you updated the status (and thus avoid infinite update loops) from watch events that something else generated and you need to handle. This is relatively straight forward to do in a single stream, using Flow.scan you can fold over the cache, but if you introduce a separate stream for error handling, then your cache needs to coordinate between the streams, which brings up concurrency and thread safety issues.

doriordan commented 5 years ago

I am coming around to a preference for adding an enriched event type that can represent either errors or normal events. That would require new API watch methods that return sources of these new event types and more than likely deprecating the old watch methods, I'll need to flesh that out.