AdrianStrugala / AvroConvert

Rapid Avro serializer for C# .NET

Deserializing Avro from Google Big Query ( and Logical Type: Datetime) #69

Open jpsgamboa opened 2 years ago

jpsgamboa commented 2 years ago

I'm having trouble deserializing data stored in BigQuery.

For context, BigQuery returns two elements:

I'm trying the following method:

var result = AvroConvert.DeserializeHeadless<MyEvent>(item.AvroRows.SerializedBinaryRows.ToByteArray(), readSession.AvroSchema.Schema);

But this only returns one instance of MyEvent. How should I parse a payload that should contain many rows?

Additionally, BigQuery returns datetime fields with the logical type datetime, which AvroConvert rejects with: System.Runtime.Serialization.SerializationException: 'Unknown LogicalType schema :'datetime'.'

Is there a way to extend AvroConvert to accept this logical type? I'm not too familiar with Avro and unsure if this is too specific to Google's specification, or something that could be included in AvroConvert.

I'm not sure whether this is an actual issue or just my misunderstanding of Avro and its ecosystem. I've opened an issue on Google's repo as well, but I thought it could be relevant here too!

AdrianStrugala commented 2 years ago

Hello,

1) A collection of MyEvent objects can be deserialized using:

var result = AvroConvert.DeserializeHeadless<List<MyEvent>>(item.AvroRows.SerializedBinaryRows.ToByteArray(), readSession.AvroSchema.Schema);

2) I will take a look at the datetime logical type. Could you attach a sample file and schema? It will help me a lot with debugging.

Thanks, Adrian

jpsgamboa commented 2 years ago

Hey Adrian,

Please find below a zip with two files:

AvroSample.zip

I also exported an Avro file directly from the Google Cloud console, which may also help: bq-sample.zip

Thank you!

AdrianStrugala commented 2 years ago

Hello,

I've spotted several issues with your files. First of all, the Avro documentation defines no "datetime" logical type; date-time values are usually represented with the timestamp-millis logical type or simply as a string. Even so, I wasn't able to deserialize your AvroRows; there are further issues in the AvroModel.
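
To illustrate the two representations mentioned above, here is a minimal sketch that uses only standard .NET APIs and is not part of AvroConvert. It assumes a timestamp-millis value arrives as a long counting milliseconds since the Unix epoch (UTC), and that the string form is ISO 8601-like without an offset. The class name, method names, and sample string are hypothetical; check the actual format in your data before relying on this.

```csharp
using System;
using System.Globalization;

public static class AvroDateTimeSketch
{
    // timestamp-millis: an Avro long holding milliseconds since 1970-01-01T00:00:00Z.
    public static DateTime FromTimestampMillis(long millis) =>
        DateTimeOffset.FromUnixTimeMilliseconds(millis).UtcDateTime;

    // Assumption: the string form looks like "2021-03-04T05:06:07.123" (no UTC offset).
    // RoundtripKind keeps the parsed value as written instead of converting to local time.
    public static DateTime FromIsoString(string text) =>
        DateTime.Parse(text, CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind);

    public static void Main()
    {
        // Print both values in round-trip ("o") format for inspection.
        Console.WriteLine(FromTimestampMillis(0).ToString("o"));
        Console.WriteLine(FromIsoString("2021-03-04T05:06:07.123").ToString("o"));
    }
}
```

With helpers like these, a model could expose the raw long or string field as deserialized and convert it to DateTime afterwards.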

But don't worry - bq-sample works perfectly fine. Here is a short tutorial on how to deal with Avro data:

1) Get the schema of the data

var avroBytes = File.ReadAllBytes("bq-sample");
var schema = AvroConvert.GetSchema(avroBytes);

That gives you the schema in JSON format.

2) Generate the C# model: you can use https://avroconvertonline.azurewebsites.net/ for convenience.

3) Deserialize the data

var avroBytes = File.ReadAllBytes("bq-sample");
var result = AvroConvert.Deserialize<List<Root>>(avroBytes);

This produces a list of 1000 Root items.

Hope it helps, Adrian