AdrianStrugala / AvroConvert

Rapid Avro serializer for C# .NET

Issues with Azure ADX #118

Open wdcossey opened 9 months ago

wdcossey commented 9 months ago

I'm having issues with Azure ADX.

I would like to write data (thousands to hundreds of thousands of records) to a file that will be pushed to Azure ADX for processing.

Whilst the library is simple enough to get working, ADX does not accept the schema it generates.

I tried serializing a list of objects (say 2-10), writing the result to a file, and uploading it; the ingestion failed.

Writing a single object does work, however.

I also tried Merge, which failed as well.

My understanding is that ADX doesn't understand the array data type.

Is what I'm attempting to do even possible? I wouldn't want to send 10,000 individual files.

AdrianStrugala commented 9 months ago

Could you share the error returned from ADX?

wdcossey commented 9 months ago

Hi @AdrianStrugala

Here's part of the error:

Couldn't infer file schema. Error: Input (format: 'Avro') source cannot be read due to: 'Unrecognized Avro schema: '{"type":"array","items"

Give me a bit and I will send a code sample (with a full error).

wdcossey commented 9 months ago

Full error:

Couldn't infer file schema. Error: Input (format: 'Avro') source cannot be read due to: 'Unrecognized Avro schema: '.
{"type":"array","items":{"name":"SomeClass","namespace":"BogusData","type":"record","fields":[{"name":"Int32","type":"int"},{"name":"String","type":"string"},{"name":"Datetime","type":{"type":"long","logicalType":"timestamp-micros"}},{"name":"Decimal","type":{"type":"bytes","logicalType":"decimal","precision":28,"scale":18}}]}}
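For reference, the rejected schema has `array` as its top-level type, with the `SomeClass` record nested under `items`. The sketch below only parses the schema text with `System.Text.Json` to make that visible. It assumes C# 11 for the raw string literal, the `SchemaInspector` helper is hypothetical (not part of AvroConvert), and the field list is trimmed to one entry for brevity.

```csharp
using System;
using System.Text.Json;

// Schema text taken from the ADX error above; the fields list is trimmed for brevity.
const string schemaJson = """
{"type":"array","items":{"name":"SomeClass","namespace":"BogusData","type":"record","fields":[{"name":"Int32","type":"int"}]}}
""";

Console.WriteLine(SchemaInspector.TopLevelType(schemaJson));      // prints "array"
Console.WriteLine(SchemaInspector.TopLevelItemsType(schemaJson)); // prints "record"

// Hypothetical helper for illustration only; not an AvroConvert or ADX API.
public static class SchemaInspector
{
    // Returns the "type" of the root schema object, or the bare type name
    // if the root is a JSON string such as "int".
    public static string TopLevelType(string schemaJson)
    {
        using var doc = JsonDocument.Parse(schemaJson);
        var root = doc.RootElement;
        return root.ValueKind switch
        {
            JsonValueKind.String => root.GetString() ?? "",
            JsonValueKind.Array => "union", // a JSON array at the root is an Avro union
            JsonValueKind.Object => ReadType(root),
            _ => "unknown"
        };
    }

    // Returns the "type" of the "items" schema when the root is an array schema,
    // or an empty string when there is no "items" object.
    public static string TopLevelItemsType(string schemaJson)
    {
        using var doc = JsonDocument.Parse(schemaJson);
        var root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty("items", out var items) &&
            items.ValueKind == JsonValueKind.Object)
        {
            return ReadType(items);
        }
        return "";
    }

    // Reads a string-valued "type" property; returns "" if it is absent or not a string.
    private static string ReadType(JsonElement schema) =>
        schema.TryGetProperty("type", out var type) && type.ValueKind == JsonValueKind.String
            ? type.GetString() ?? ""
            : "";
}
```

This matches the observation earlier in the thread that a single serialized object is accepted while a list is not: only the list case produces an `array` schema at the root.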

Test code:

var seqNum = 123;

var fakeItems = new Faker<SomeClass>()
    .CustomInstantiator(f => new SomeClass(seqNum++))
    .RuleFor(o => o.Decimal, f => f.Random.Decimal(1.1m, 999m))
    .RuleFor(o => o.Datetime, f => f.Date.Recent())
    .RuleFor(o => o.String, f => f.Random.String(4, 4)).Generate(10);

var result = AvroConvert.Serialize(fakeItems, CodecType.Null);
await File.WriteAllBytesAsync("somefilename.avro", result);

Class:

public class SomeClass
{
    public int Int32 { get; }

    public string String { get; set; }

    public DateTime Datetime { get; set; }

    [AvroDecimal(Precision = 28, Scale = 18)]
    public decimal Decimal { get; set; }

    public SomeClass(int seqNum)
    {
        Int32 = seqNum;
    }
}

AdrianStrugala commented 9 months ago

The schema is valid according to the Avro specification. It looks like the problem is on the ADX side. Could you report that to MS? I will take another look anyway, but I can't promise anything.

wdcossey commented 9 months ago

@AdrianStrugala I have resolved the issue. I will open a PR tomorrow (after I have done some additional testing).

You can have a look at the PR and approve it or use what is there to make a solution more to your liking.

wdcossey commented 9 months ago

#119

AdrianStrugala commented 7 months ago

The proposed solution will be the default behavior in V4.