Azure / azure-sdk-for-net

[BUG] Calling datafactoryResource.GetDataFactoryDatasetAsync(datasetName) on a dataset triggers deserialization exception #45005

Open ClementVaillantCodit opened 2 months ago

ClementVaillantCodit commented 2 months ago

Library name and version

Azure.ResourceManager.DataFactory 1.1.0

Describe the bug

I am using the SDK to test data flows in Azure Data Factory programmatically. To do that, we create a data flow debug session, create a data flow and all associated resources (linked services, datasets), and call _datafactoryResource.AddDataFlowToDebugSessionAsync(dataFactoryDataFlowDebugPackageContent); to add the data flow to the debug session created for this purpose.

However, when looping through the existing datasets (source and sink datasets) to add them to a list of DataFactoryDatasetDebugInfo, I get an exception when calling var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data; on every dataset that does not have a schema defined in Azure Data Factory.

As far as I know, defining a schema for datasets is not mandatory in Data Factory. At the moment there are no workarounds using the SDK.

I suspect the missing schema definition is the cause: I created a simple test data flow with just a source and a sink, and as soon as I added a schema to the sink, which previously had none, all calls to _datafactoryResource.GetDataFactoryDatasetAsync() succeeded.

Expected behavior

var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data; should not fail with the exception "Cannot deserialize an Object as a list." when the dataset does not have a schema defined.

Actual behavior

var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data; fails with the exception "Cannot deserialize an Object as a list." when the dataset does not have a schema defined.

Message: System.InvalidOperationException: Cannot deserialize an Object as a list.

Stack Trace:
  DataFactoryElementJsonConverter.DeserializeGenericList[T](JsonElement json)
  InvokeStub_DataFactoryElementJsonConverter.DeserializeGenericList(Object, Span`1)
  MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
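
The message and the DeserializeGenericList frame suggest that the converter met a JSON object where it expected a JSON array. The report does not confirm which property or payload triggers this. The following is only a minimal sketch of that general failure mode using System.Text.Json. ReadNames and both JSON payloads are hypothetical and are not the SDK's implementation or the actual service response.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Case 1: the value is a JSON array, so the list is read normally.
using JsonDocument asArray = JsonDocument.Parse("[{\"name\":\"id\"},{\"name\":\"title\"}]");
Console.WriteLine(ReadNames(asArray.RootElement).Count);

// Case 2: the value is a JSON object, so the list reader rejects it.
using JsonDocument asObject = JsonDocument.Parse("{\"type\":\"Expression\",\"value\":\"x\"}");
try
{
    ReadNames(asObject.RootElement);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// Hypothetical helper (not SDK code): accepts only a JSON array and
// throws for any other value kind, mirroring the wording of the reported error.
static List<string> ReadNames(JsonElement json)
{
    if (json.ValueKind != JsonValueKind.Array)
    {
        throw new InvalidOperationException($"Cannot deserialize an {json.ValueKind} as a list.");
    }

    var names = new List<string>();
    foreach (JsonElement item in json.EnumerateArray())
    {
        names.Add(item.GetProperty("name").GetString() ?? string.Empty);
    }
    return names;
}
```

If this is indeed the failure mode, capturing the raw JSON that the service returns for a dataset without a schema would show which property arrives as an object rather than an array.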

Reproduction Steps

  1. Create a data flow in Data Factory, with a source and a sink.
  2. Define a schema on the source dataset, and do not define a schema in the sink dataset (note that the other way around works as well).
  3. Call var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data; programmatically. The call fails with the exception "Cannot deserialize an Object as a list.".

Note that when a dataset is created and then retrieved programmatically, the issue does not occur. The following executes successfully, where _client is a DataFactoryResource:

var linkedServiceName = "TestLinkedService_1";

var azureStorageLinkedService = new AzureStorageLinkedService
{
    ConnectionString = StorageConnectionString
};
var linkedServiceData = new DataFactoryLinkedServiceData(azureStorageLinkedService);

await _client.GetDataFactoryLinkedServices().CreateOrUpdateAsync(Azure.WaitUntil.Completed, linkedServiceName, linkedServiceData);

var jsonDataset = new JsonDataset(
    new DataFactoryLinkedServiceReference(DataFactoryLinkedServiceReferenceKind.LinkedServiceReference, linkedServiceName)
);
var jsonDatasetData = new DataFactoryDatasetData(jsonDataset);
var jsonDataFactoryDatasetResource = (await _client.GetDataFactoryDatasets().CreateOrUpdateAsync(Azure.WaitUntil.Completed, "TestJsonDataset", jsonDatasetData)).Value;

var jsonDatasetResponse = (await _client.GetDataFactoryDatasetAsync("TestJsonDataset")).Value.Data;

Environment

.NET SDK: Version: 8.0.303

Runtime Environment:
  OS Name: Windows
  OS Version: 10.0.22631
  OS Platform: Windows

.NET 6 and .NET 8

IDE and version: Visual Studio 17.10.4

github-actions[bot] commented 2 months ago

Thank you for your feedback. Tagging and routing to the team member best able to assist.

stijnmoreels commented 1 month ago

+1