Provide Schema Naming Strategies out of the box

awslabs / aws-glue-schema-registry

AWS Glue Schema Registry Client library provides serializers / de-serializers for applications to integrate with AWS Glue Schema Registry Service. The library currently supports Avro, JSON and Protobuf data formats. See https://docs.aws.amazon.com/glue/latest/dg/schema-registry.html to get started.

Apache License 2.0

Provide Schema Naming Strategies out of the box #199

Open miguellgramacho96 opened 2 years ago

miguellgramacho96 commented 2 years ago

Hi,

Does the library offer built-in Schema Naming Strategies? If not, is the team aware of this gap, and are there plans to add them in the future?

blacktooth commented 2 years ago

We provide a default naming strategy.

Is there a specific strategy you are looking for?

miguellgramacho96 commented 2 years ago

If I am not mistaken, the default naming strategy is topic name based.

I ended up implementing a custom, record-name-based strategy. I would argue that, alongside the default naming strategy, it would be helpful to provide record-name-based and topic-plus-record-name-based strategies out of the box.
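For readers comparing the three strategies mentioned here, a minimal sketch follows. The `NamingStrategy` interface is a hypothetical stand-in defined only for this example, not the library's actual extension point, and the record name is approximated by the Java class name of the payload (for Avro it would presumably come from the record's schema instead).

```java
// Sketch only: NamingStrategy is a hypothetical interface for illustration,
// not the interface shipped by aws-glue-schema-registry.
final class NamingStrategies {

    /** Maps a topic and a record to the schema name to register or look up. */
    interface NamingStrategy {
        String schemaName(String topic, Object record);
    }

    /** Topic-name based: every record on a topic shares one schema name. */
    static final NamingStrategy TOPIC = (topic, record) -> topic;

    /** Record-name based: the name follows the record type, regardless of topic.
     *  Assumption: the class name stands in for the Avro record name. */
    static final NamingStrategy RECORD =
            (topic, record) -> record.getClass().getName();

    /** Topic plus record name: allows several record types per topic. */
    static final NamingStrategy TOPIC_RECORD =
            (topic, record) -> topic + "-" + RECORD.schemaName(topic, record);

    public static void main(String[] args) {
        Object record = "example payload";
        System.out.println(TOPIC.schemaName("orders", record));
        System.out.println(RECORD.schemaName("orders", record));
        System.out.println(TOPIC_RECORD.schemaName("orders", record));
    }
}
```

The topic-plus-record variant is the one that matters when a single topic carries more than one record type, since the other two would either collide on the topic or ignore it entirely.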

blacktooth commented 2 years ago

By record, do you mean Avro record name based? Please feel free to send a PR. If it's generic and usable, we can include it in the code.

OneCricketeer commented 2 years ago

You can explicitly set the schemaName property (assuming you don't have multiple schemas per serializer).

Related #93

davido912 commented 1 year ago

Hey @OneCricketeer, just wondering: if one consumes from multiple topics, how would one go about assigning a schema to each of those topics? Is that possible?

OneCricketeer commented 1 year ago

@davido912 I don't have experience with this specific converter, but my recommendation would be to duplicate and deploy a new connector for each unique topic name.

There's no significant overhead in doing this, as it just creates another JVM task, much like increasing tasks.max on a single connector that reads multiple topics.
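A rough sketch of the one-connector-per-topic layout described above. It only builds the configuration maps; submitting them to Kafka Connect is left out. The connector naming scheme, the `-value` schema name suffix, and the assumption that the converter reads `schemaName` from the `value.converter.` prefix are illustrative choices, not confirmed behaviour of this converter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: generates one connector config per topic so that each
// connector can carry its own explicit schema name.
final class PerTopicConnectors {

    static List<Map<String, String>> configsFor(List<String> topics, String connectorClass) {
        List<Map<String, String>> configs = new ArrayList<>();
        for (String topic : topics) {
            Map<String, String> config = new LinkedHashMap<>();
            config.put("name", "sink-" + topic);            // hypothetical naming scheme
            config.put("connector.class", connectorClass);
            config.put("topics", topic);                     // exactly one topic per connector
            config.put("tasks.max", "1");
            // Assumption: the converter picks up schemaName via the converter prefix.
            config.put("value.converter.schemaName", topic + "-value");
            configs.add(config);
        }
        return configs;
    }

    public static void main(String[] args) {
        List<String> topics = List.of("orders", "payments");
        // com.example.MySinkConnector is a placeholder class name.
        for (Map<String, String> config : configsFor(topics, "com.example.MySinkConnector")) {
            System.out.println(config);
        }
    }
}
```

Each generated map corresponds to one connector, so the total task count matches what a single multi-topic connector with a higher tasks.max would run.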

davido912 commented 1 year ago

@OneCricketeer thanks for the answer! We're on the fence about whether, in terms of resources, it's more beneficial to have one connector assigned to multiple topics or several connectors in a 1:1 relationship with topics. If I'm understanding you correctly, the overhead is pretty much the same either way?

OneCricketeer commented 1 year ago

I've not done extensive tests, but in theory it should be similar. A 1:1 setup also has better fault tolerance, since a failure on one topic out of many can't crash a connector that handles all of them.