SneaksAndData / arcane-framework

Akka.NET-based framework for data streaming services using the Arcane Kubernetes Operator
Apache License 2.0

[BUG] CdmChangeFeedSource attempts to stream data from multiple locations #98

Open george-zubrienko opened 3 months ago

george-zubrienko commented 3 months ago

Description

It seems that the export location of a Dynamics data lake (Synapse) export can sometimes change. However, due to the fact that we search for the table using this code:

var tableBlobs = this.source.blobStorage.ListBlobsAsEnumerable(this.tablesPath).Where(blob =>
    blob.Name.Split("/")[^1].StartsWith($"{this.source.entityName.ToUpper()}_") &&
    blob.Name.EndsWith(".csv")).ToList();

and that file names do not change, we can hit two different paths in this case: the old one with schemaA and the new one with schemaB. Thus, the stream will run normally and then abort with a data type mismatch error once all rows from schemaA are exhausted.

Steps to reproduce the issue

  1. Create two different paths leading to the table with the same name, but different schemas
  2. Run the source
  3. Observe the failure after rows from the first path are exhausted
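
The matching behaviour behind these steps can be checked in isolation. The sketch below applies the same predicate as the snippet in the description to plain blob names, so it needs no storage account. The class name, the method name and the example paths are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TableBlobFilterDemo
{
    // Same predicate as in the issue description, applied to plain blob names:
    // the last path segment must start with "<ENTITY>_" and the name must end with ".csv".
    // The directory part of the name is never inspected.
    public static List<string> MatchTableBlobs(IEnumerable<string> blobNames, string entityName) =>
        blobNames.Where(name =>
            name.Split("/")[^1].StartsWith($"{entityName.ToUpper()}_") &&
            name.EndsWith(".csv")).ToList();
}
```

Called with hypothetical names such as `export-old/Tables/ACCOUNT_00001.csv` and `export-new/Tables/ACCOUNT_00001.csv`, the filter returns both, because only the file name is compared.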

Describe the results you expected

Either a hard failure with an ERROR level message, or a warning and automatic selection of files from a newer path (preferred)
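
A minimal sketch of the preferred behaviour, under stated assumptions: `BlobEntry` is a hypothetical stand-in for whatever metadata type `ListBlobsAsEnumerable` returns, "newer path" is taken to mean the directory containing the most recently modified blob, and `warn` is a placeholder for the framework's logger. None of these names come from the framework itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the blob metadata type; the real type may differ.
public sealed record BlobEntry(string Name, DateTimeOffset LastModified);

public static class NewestPathSelector
{
    // Directory part of a blob name: everything before the last '/'.
    private static string DirectoryOf(string blobName)
    {
        var idx = blobName.LastIndexOf('/');
        return idx < 0 ? string.Empty : blobName.Substring(0, idx);
    }

    // Groups matched blobs by directory. If more than one directory is found,
    // emits a warning and keeps only the blobs from the directory that holds
    // the most recently modified blob (assumption: that is the newer export).
    public static List<BlobEntry> SelectNewestPath(IEnumerable<BlobEntry> tableBlobs, Action<string> warn)
    {
        var byPath = tableBlobs.GroupBy(b => DirectoryOf(b.Name)).ToList();
        if (byPath.Count <= 1)
        {
            return byPath.SelectMany(g => g).ToList();
        }

        var newest = byPath.OrderByDescending(g => g.Max(b => b.LastModified)).First();
        warn($"Found {byPath.Count} export paths for the same table; using '{newest.Key}'");
        return newest.ToList();
    }
}
```

For the hard-failure alternative, the same grouping could throw instead of calling `warn` when more than one directory is found.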

System information

No response