ImFlog / schema-registry-plugin

Gradle plugin to interact with Confluent Schema-Registry.
Apache License 2.0

Discussion: Fail-fast behaviour when registering/configuring a schema fails #186

Open filpano opened 3 months ago

filpano commented 3 months ago

I have a use case where I am programmatically providing a list of schemas to register using the registerSchemasTask and the register extension.

Essentially, I have a list of subjects as follows:

- Subject(A, path/to/A.avsc, AVRO)
- Subject(B, path/to/B.avsc, AVRO)
- Subject(C, path/to/C.avsc, AVRO)

This list of inputs (i.e. Tuple[A, path/to/A.avsc]) is produced from a YAML file that is the output of another script.

I had an error in the script that produces the list of subjects: path/to/A.avsc was incorrect and hence could not be read. I expected this to fail the entire task (which it did) without registering any schemas past the one that failed (which is not what happened), and was surprised to see that subjects B and C had been registered even though A had not.

To be clear, I don't expect the plugin to soft-/hard-delete/rollback any schemas that were registered as part of a "failing" task execution (i.e. A failed, B & C successful => Delete B and C) as this would probably do more harm than good in the long run.

I suppose I expected something like "A failed, skipping rest". This may or may not be enough (it definitely is for my use case), though it raises the question of what behaviour would be desired if a subject somewhere in the "middle" failed to be registered.

Would this be useful to anyone else? Would it be a bad idea?

filpano commented 3 months ago

To give a bit more background information on my use case:

I'm using some of these schemas in combination with e.g. ksqlDB. I provide schema IDs for all of my queries so that I can very finely control the evolution of the queries in my clusters.

In the above example, a successful schema registration would (given the above order) end up with the following schema IDs: {A: 1, B: 2, C: 3}. In actuality, since only B and C were successful, I end up with the following schema IDs: {B: 1, C: 2}, with A receiving ID 3 once the error is fixed.
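To make the ID drift concrete, here is a toy model in Python. It is only a sketch under a simplifying assumption: `ToyRegistry` is a hypothetical stand-in that hands out the next free integer to each new subject, which matches the example above but is not how a real Schema Registry is guaranteed to behave.

```python
class ToyRegistry:
    """Hypothetical stand-in for a registry that assigns sequential IDs."""

    def __init__(self):
        self.ids = {}  # subject name -> assigned schema ID

    def register(self, subject):
        # A subject that is already registered keeps its existing ID.
        if subject not in self.ids:
            self.ids[subject] = len(self.ids) + 1
        return self.ids[subject]


# Intended outcome: all three subjects register in order.
intended = ToyRegistry()
for name in ["A", "B", "C"]:
    intended.register(name)
print(intended.ids)  # {'A': 1, 'B': 2, 'C': 3}

# What happened: A could not be read, but B and C were still registered.
actual = ToyRegistry()
for name in ["B", "C"]:
    actual.register(name)

# After fixing the path, the same ordered list is submitted again.
for name in ["A", "B", "C"]:
    actual.register(name)
print(actual.ids)  # {'B': 1, 'C': 2, 'A': 3}
```

Once B and C have taken IDs 1 and 2, re-running with the corrected list cannot restore the intended mapping in that environment.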

I provide the subjects in a deterministic order (notwithstanding the bug I had during initial development :) ). Ideally, I would like this to have the effect that all my schemas have the same schema ID across all environments that I use them in. Perhaps this is too strict of a requirement?

ImFlog commented 2 months ago

Hello @filpano, sorry for the delay in answering, I was on holiday for the whole month of August ☀️ This seems like a valid use case. We could handle it with a flag: if the failFast flag is activated, the current task would stop as soon as something fails. Would you be interested in opening a PR for this?
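As a rough illustration of the flag idea, here is a language-agnostic sketch written in Python. It is not the plugin's code: `Subject`, `RegistrationError`, `register_all`, the `register` callback, and the `fail_fast` parameter are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    name: str
    path: str
    schema_type: str = "AVRO"


class RegistrationError(Exception):
    """Raised when at least one subject could not be registered."""


def register_all(subjects, register, fail_fast=False):
    """Register subjects in order using the supplied `register` callable.

    With fail_fast=True, the first failure aborts the run and later
    subjects are never attempted. With fail_fast=False, every subject is
    attempted and the failures are reported together at the end.
    """
    registered, failed = [], []
    for subject in subjects:
        try:
            register(subject)
        except Exception as exc:
            failed.append(subject.name)
            if fail_fast:
                raise RegistrationError(
                    f"{subject.name} failed, skipping rest"
                ) from exc
        else:
            registered.append(subject.name)
    if failed:
        raise RegistrationError(f"failed to register: {', '.join(failed)}")
    return registered
```

Note that fail-fast only protects subjects that come after the failure: if B failed in the list A, B, C, then A would already be registered and only C would be skipped.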

My schedule is pretty tight currently so I can't give you any date for when I will get a bit of time to work on this 😞