AbsaOSS / ABRiS

Avro SerDe for Apache Spark structured APIs.
Apache License 2.0

usage of toAvro method #5

Closed ghaithSN closed 6 years ago

ghaithSN commented 6 years ago

I'm not sure I understood this correctly. For the signature `toAvro(schemaName: String, schemaNamespace: String): Dataset[Array[Byte]]`, I assumed that the namespace should already exist in Schema Registry, while `schemaName` is a name we give to our new schema (in the Spark application). Unfortunately, when I did that, I realized that the schema was not added to the registry.

Also, for the signature `toAvro(rows: Dataset[Row], schemas: SchemasProcessor)`, could you give me some insight into how to prepare the `schemas` parameter? `SchemasProcessor` only has two getters.

felipemmelo commented 6 years ago

Hi there. Regarding Schema Registry: the current API only allows you to retrieve your schema while reading your data, as shown here. Schema registration, however, is done "offline", as explained here.

The next version, which will be available by the end of the month, will allow you to perform schema management (registering and updating your schemas) from the API at write time.

About `toAvro(rows: Dataset[Row], schemas: SchemasProcessor)`: this is a private method, not part of the API. Its job is to carry the creation of the corresponding Spark and Avro schemas over to the partitions, through the `.mapPartitions` method.
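For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of building a helper once per partition inside `mapPartitions`. It is not ABRiS code: `SeparatedRowEncoder` and `encodeRows` are hypothetical names, and the "encoding" is a plain delimited string rather than Avro. It only assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{Dataset, Row}

object MapPartitionsSketch {

  // Hypothetical stand-in for a helper that would be costly to build per row
  // (in the thread's context: the objects derived from Spark and Avro schemas).
  final class SeparatedRowEncoder(separator: String) {
    def encode(row: Row): Array[Byte] =
      row.mkString(separator).getBytes("UTF-8")
  }

  def encodeRows(rows: Dataset[Row], separator: String = ","): Dataset[Array[Byte]] = {
    val spark = rows.sparkSession
    import spark.implicits._ // provides the Encoder[Array[Byte]] needed below

    rows.mapPartitions { partition =>
      // Built on the executor, once per partition, then reused for every row.
      val encoder = new SeparatedRowEncoder(separator)
      partition.map(encoder.encode)
    }
  }
}
```

Only the small `separator` string is captured by the closure and shipped to the executors; the helper itself is created where it is used, so it never needs to be serializable.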

felipemmelo commented 6 years ago

Hi there, ABRiS now has an API that allows you to append the id of your schema to the Avro payload, so that it can be consumed by Confluent tools. Also, the schema is automatically registered if not yet in Schema Registry or updated otherwise.

You will find an example here.
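Separately from the linked example, the sketch below illustrates what attaching a schema id to an Avro payload means in the Confluent wire format: a 5-byte header (one magic byte `0`, then the schema id as a 4-byte big-endian integer) placed in front of the Avro-encoded bytes. This is not the ABRiS implementation; `ConfluentFraming`, `frame`, and `unframe` are hypothetical names used only for illustration.

```scala
import java.nio.ByteBuffer

object ConfluentFraming {

  // The Confluent wire format starts every message with this byte.
  private val MagicByte: Byte = 0x0
  private val HeaderSize: Int = 1 + 4 // magic byte + 4-byte schema id

  /** Puts the magic byte and the schema id in front of an Avro payload. */
  def frame(schemaId: Int, avroPayload: Array[Byte]): Array[Byte] = {
    val buffer = ByteBuffer.allocate(HeaderSize + avroPayload.length)
    buffer.put(MagicByte)
    buffer.putInt(schemaId) // ByteBuffer writes big-endian by default
    buffer.put(avroPayload)
    buffer.array()
  }

  /** Splits a framed message back into (schema id, Avro payload). */
  def unframe(message: Array[Byte]): (Int, Array[Byte]) = {
    require(message.length >= HeaderSize, "message shorter than the 5-byte header")
    val buffer = ByteBuffer.wrap(message)
    require(buffer.get() == MagicByte, "unexpected magic byte")
    val schemaId = buffer.getInt()
    val payload = new Array[Byte](buffer.remaining())
    buffer.get(payload)
    (schemaId, payload)
  }
}
```

A consumer reads the id from the header, looks up the matching schema in Schema Registry, and decodes the remaining bytes with it, which is why messages written this way can be read by Confluent tools.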

ghaithSN commented 6 years ago

thank you @felipemmelo

felipemmelo commented 6 years ago

This seems to be solved, so I'm closing the issue.