Vincent-Zeng closed 4 years ago
Maybe we can pass schemaManager as a parameter.
I see. I think the best solution would be to add an optional schemaId
to SchemaProvider and populate that field in createSchemaProvider(...).
Then, inside CatalystDataToAvro,
we can get that id from the provider instead of the manager.
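A minimal sketch of that proposal, in plain Scala without Spark. The types below (`Registry`, `SchemaProviders`, `CatalystDataToAvroSketch`) and the simplified `SchemaProvider` are hypothetical stand-ins, not the actual ABRiS classes; the point is only that the schema and its id are resolved together, once, and travel together.

```scala
// Hypothetical, simplified stand-in for the provider: the schema plus an optional id.
final case class SchemaProvider(schemaJson: String, schemaId: Option[Int] = None)

// Hypothetical stand-in for a Schema Registry client.
trait Registry {
  // Returns the (id, schema) pair currently registered as "latest" for a subject.
  def latest(subject: String): (Int, String)
}

object SchemaProviders {
  // Resolve the schema and its id in one call so the two cannot drift apart.
  def createSchemaProvider(registry: Registry, subject: String): SchemaProvider = {
    val (id, schema) = registry.latest(subject)
    SchemaProvider(schema, Some(id))
  }
}

// Simplified stand-in for the expression: it reads the id from the provider
// instead of asking a schema manager again.
final class CatalystDataToAvroSketch(provider: SchemaProvider) {
  def schemaId: Int = provider.schemaId.getOrElse(
    throw new IllegalStateException("schema id was not resolved by the provider"))
}
```

Because the provider is a plain case class holding a String and an Option[Int], it stays serializable and can be handed to the expression as a parameter.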
schemaManager is not serializable. That is intentional, so that it is initialized again in CatalystDataToAvro when it's needed. I would not send it as a parameter.
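For readers unfamiliar with this pattern, here is a rough sketch of how a non-serializable helper can be rebuilt on demand after the enclosing object has been serialized. `FakeSchemaManager`, `ExpressionSketch`, and `SerializationDemo` are hypothetical names and the fixed id is made up; this only illustrates the `@transient lazy val` idiom, not the actual ABRiS code.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical non-serializable helper standing in for schemaManager.
final class FakeSchemaManager(val url: String) {
  def idFor(subject: String): Int = 10 // made-up fixed id, for illustration only
}

// Only the serializable configuration (a String) travels with the object.
// The manager is skipped during serialization and rebuilt on first use.
final class ExpressionSketch(registryUrl: String) extends Serializable {
  @transient private lazy val manager = new FakeSchemaManager(registryUrl)
  def schemaId(subject: String): Int = manager.idFor(subject)
}

object SerializationDemo {
  // Java-serialization round trip, similar in spirit to shipping an object to an executor.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

The round trip succeeds even though `FakeSchemaManager` is not serializable, because the transient field is never written; the copy creates its own manager the first time `schemaId` is called.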
@cerveada Yeah. Your solution is great.
Hi all! Is this fixed? Can we get a new release containing the fix?
Right now I'm having a minor issue, as the schema id is being resolved multiple times (not sure why). Sometimes the request to schema registry fails, leading to the following error:
Maybe a solution that mitigates this is resolving the schema only once per executor.
If this is not related to this issue, please tell me so I can provide a more detailed report. Thanks!
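One way the "resolve only once per executor" idea could look is a JVM-wide cache keyed by subject, since each executor is a single JVM. This is a sketch under that assumption; `SchemaIdCache` and its `resolve` parameter are hypothetical and not part of ABRiS.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical per-JVM cache: a Scala object is a singleton, so there is one per executor.
object SchemaIdCache {
  private val ids = new ConcurrentHashMap[String, Integer]()

  // Calls `resolve` at most once per subject. If `resolve` throws, nothing is
  // cached, so a failed registry request can be retried by a later call.
  def getOrResolve(subject: String)(resolve: String => Int): Int =
    ids.computeIfAbsent(subject, new java.util.function.Function[String, Integer] {
      override def apply(s: String): Integer = Int.box(resolve(s))
    }).intValue()
}
```

Note that caching alone pins the id only within one executor; it does not guarantee that the id matches the schema the provider resolved on the driver.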
Hi @racevedoo , version 3.2.2 has just been published, and it contains the fixes. Cheers.
Great! Thanks for the quick release. Can we close this issue? :smile:
EDIT: I can't find the release on GitHub or Maven Central (https://search.maven.org/artifact/za.co.absa/abris_2.11)
It takes some time to appear there, but it may already be available for download with command-line Maven.
Hi, team.
See
The `schemaProvider` is provided as a parameter, while `schemaId` is provided as a private field. This means that `schemaProvider` is initialized once for the whole execution, while `schemaId` is initialized for each structured streaming batch.
Let's see a case:
Step 1. Specify the config `value.schema.id` as `latest`; at this point the `latest` schema id is 10.
Step 2. `schemaProvider` fetches the schema whose schema id is 10 from Schema Registry.
Step 3. batch 1: `private lazy val schemaId` is initialized with schema id 10, so the magic number is 10 when serializing.
Step 4. Evolve the schema in Schema Registry; now the `latest` schema id is 11.
Step 5. batch 2: `private lazy val schemaId` is initialized with schema id 11, so the magic number is 11 when serializing, while `schemaProvider` still provides the schema whose schema id is 10 to encode the data.
So, maybe `schemaId` should be initialized in the driver and passed as a parameter instead of being initialized inside the expression. Is that right?
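The five steps can be reproduced with a small in-memory simulation. Everything here is hypothetical (`FakeRegistry`, `PerBatchIdWriter`, `PinnedIdWriter`, and the placeholder schema strings); it only mirrors the reported behaviour, where the schema is fixed once but the id is looked up again per batch, and contrasts it with resolving both once.

```scala
// Hypothetical in-memory registry; "latest" is the most recently registered entry.
final class FakeRegistry {
  private var schemas = Vector(10 -> "schema-v10")
  def register(id: Int, schema: String): Unit = schemas = schemas :+ (id -> schema)
  def latest: (Int, String) = schemas.last
}

// Mirrors the reported behaviour: schema resolved once, id re-resolved for every batch.
final class PerBatchIdWriter(registry: FakeRegistry) {
  private val schema = registry.latest._2
  // Returns (id written into the message header, schema used to encode the data).
  def writeBatch(): (Int, String) = (registry.latest._1, schema)
}

// Mirrors the proposal: id and schema resolved together, once, up front.
final class PinnedIdWriter(registry: FakeRegistry) {
  private val (id, schema) = registry.latest
  def writeBatch(): (Int, String) = (id, schema)
}
```

After registering a schema with id 11, `PerBatchIdWriter.writeBatch()` pairs id 11 with the schema that was registered under id 10, which is the mismatch described above, while `PinnedIdWriter` keeps returning the consistent pair for id 10.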