This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
[BUG] Calling datafactoryResource.GetDataFactoryDatasetAsync(datasetName) on a dataset triggers deserialization exception #45005
Describe the bug
I am using the SDK to test data flows in Azure Data Factory programmatically.
To do that, we create a data flow debug session, create a data flow and all associated resources (linked services, datasets), and call _datafactoryResource.AddDataFlowToDebugSessionAsync(dataFactoryDataFlowDebugPackageContent); to add the data flow to that debug session.
However, when looping through the existing datasets (source and sink datasets) to add them to a list of DataFactoryDatasetDebugInfo, I get an exception when calling
var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data;
on any dataset that does not have a schema defined in Azure Data Factory.
As far as I know, defining a schema for a dataset is not mandatory in Data Factory. I am not aware of any workaround using the SDK.
I suspect the schema definition is the cause: I created a simple test data flow with just a source and a sink, and as soon as I added a schema to the sink dataset, which previously had none, all calls to _datafactoryResource.GetDataFactoryDatasetAsync() succeeded.
Expected behavior
var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data;
must not fail with the exception "Cannot deserialize an Object as a list." when the dataset does not have a schema defined.
Actual behavior
var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data;
fails with the exception "Cannot deserialize an Object as a list." when the dataset does not have a schema defined.
Message:
System.InvalidOperationException: Cannot deserialize an Object as a list.
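The message suggests that a JSON object arrived where the deserializer expected a JSON array. The sketch below illustrates that failure mode using only System.Text.Json, not the SDK. The helper names ReadSchemaStrict and ReadSchemaTolerant and both payloads are hypothetical, and the idea that the service returns "schema": {} for a dataset without a schema is an unverified assumption, not something confirmed in this issue.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical payloads: one dataset with a column list, one where
// "schema" is an empty JSON object (ASSUMED shape for "no schema").
string withSchema = "{\"schema\":[{\"name\":\"id\",\"type\":\"String\"}]}";
string withoutSchema = "{\"schema\":{}}";

Console.WriteLine(ReadSchemaStrict(withSchema).Count); // prints 1

try
{
    ReadSchemaStrict(withoutSchema);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message); // prints "Cannot deserialize an Object as a list."
}

Console.WriteLine(ReadSchemaTolerant(withoutSchema).Count); // prints 0

// Hypothetical strict reader: accepts "schema" only when it is a JSON array,
// mimicking a deserializer that can only produce a list.
static List<string> ReadSchemaStrict(string datasetJson)
{
    using JsonDocument doc = JsonDocument.Parse(datasetJson);
    JsonElement schema = doc.RootElement.GetProperty("schema");
    if (schema.ValueKind != JsonValueKind.Array)
    {
        throw new InvalidOperationException(
            $"Cannot deserialize an {schema.ValueKind} as a list.");
    }

    var names = new List<string>();
    foreach (JsonElement column in schema.EnumerateArray())
    {
        names.Add(column.GetProperty("name").GetString() ?? string.Empty);
    }
    return names;
}

// Hypothetical tolerant reader: treats a missing or non-array "schema"
// as "no columns" instead of throwing.
static List<string> ReadSchemaTolerant(string datasetJson)
{
    using JsonDocument doc = JsonDocument.Parse(datasetJson);
    var names = new List<string>();
    if (doc.RootElement.TryGetProperty("schema", out JsonElement schema)
        && schema.ValueKind == JsonValueKind.Array)
    {
        foreach (JsonElement column in schema.EnumerateArray())
        {
            names.Add(column.GetProperty("name").GetString() ?? string.Empty);
        }
    }
    return names;
}
```

If the assumption holds, it would also explain why the dataset created through the SDK reads back fine: that payload may simply omit the schema property rather than send an empty object.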
Reproduction Steps
Create a data flow in Data Factory with a source and a sink.
Define a schema on the source dataset, and do not define a schema on the sink dataset (note that the other way around works as well).
Call var datasetData = (await _datafactoryResource.GetDataFactoryDatasetAsync(source.Dataset.ReferenceName)).Value.Data; programmatically; it fails with the exception "Cannot deserialize an Object as a list.".
Note that the issue does not occur when a dataset is created and then retrieved programmatically. The following executes successfully, where _client is a DataFactoryResource:
// Create a linked service for the datasets to reference.
// StorageConnectionString is defined elsewhere (not shown in this issue).
var linkedServiceName = "TestLinkedService_1";
var azureStorageLinkedService = new AzureStorageLinkedService
{
    ConnectionString = StorageConnectionString
};
var linkedServiceData = new DataFactoryLinkedServiceData(azureStorageLinkedService);
await _client.GetDataFactoryLinkedServices().CreateOrUpdateAsync(Azure.WaitUntil.Completed, linkedServiceName, linkedServiceData);

// Create a JSON dataset without a schema through the SDK.
var jsonDataset = new JsonDataset(
    new DataFactoryLinkedServiceReference(DataFactoryLinkedServiceReferenceKind.LinkedServiceReference, linkedServiceName)
);
var jsonDatasetData = new DataFactoryDatasetData(jsonDataset);
var jsonDataFactoryDatasetResource = (await _client.GetDataFactoryDatasets().CreateOrUpdateAsync(Azure.WaitUntil.Completed, "TestJsonDataset", jsonDatasetData)).Value;

// Reading the dataset back succeeds in this case.
var jsonDatasetResponse = (await _client.GetDataFactoryDatasetAsync("TestJsonDataset")).Value.Data;
Environment
.NET SDK:
Version: 8.0.303
Runtime Environment:
OS Name: Windows
OS Version: 10.0.22631
OS Platform: Windows
.NET 6 and .NET 8
IDE and version: Visual Studio 17.10.4
Library name and version
Azure.ResourceManager.DataFactory 1.1.0
Stack trace of the System.InvalidOperationException:
DataFactoryElementJsonConverter.DeserializeGenericList[T](JsonElement json)
InvokeStub_DataFactoryElementJsonConverter.DeserializeGenericList(Object, Span`1)
MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)