Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.

DataFactory: Problem using DatabricksSparkPython activity in pipeline #8596

Closed maximauro closed 4 years ago

maximauro commented 4 years ago

Hello,

When I try to create a new pipeline in DataFactory with a DatabricksSparkPython activity, I get the following error:

    Subtype value DatabricksSparkPython has no mapping, use base class Activity.
    Traceback (most recent call last):
      File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/deployer/main.py", line 5, in <module>
        DataDeployer().deploy_all()
      File "/usr/local/deployer/data_deployer.py", line 37, in deploy_all
        adf.deploy_all_to_adf()
      File "/usr/local/deployer/services/adf_service.py", line 65, in deploy_all_to_adf
        self.deploy_all_pipelines(adf_client)
      File "/usr/local/deployer/services/adf_service.py", line 120, in deploy_all_pipelines
        self.deploy_pipeline(adf_client, entry.path)
      File "/usr/local/deployer/services/adf_service.py", line 134, in deploy_pipeline
        p = adf_client.pipelines.create_or_update(rg_name, df_name, p_name, pipeline_definition)
      File "/usr/lib/python3.7/site-packages/azure/mgmt/datafactory/operations/pipelines_operations.py", line 163, in create_or_update
        body_content = self._serialize.body(pipeline, 'PipelineResource')
      File "/usr/lib/python3.7/site-packages/msrest/serialization.py", line 578, in body
        raise errors[0]
      File "/usr/lib/python3.7/site-packages/msrest/serialization.py", line 220, in validate
        Serializer.validate(value, debug_name, **self._validation.get(attr_name, {}))
      File "/usr/lib/python3.7/site-packages/msrest/serialization.py", line 661, in validate
        raise ValidationError("required", name, True)
    msrest.exceptions.ValidationError: Parameter 'Activity.type' can not be None.
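The first line of the error hints at the root cause: the installed SDK's `Activity` model has no entry for `DatabricksSparkPython` in its polymorphic subtype map, so serialization falls back to the base `Activity` class, whose required `type` field is still unset. A minimal stdlib-only sketch of that fallback behavior (illustrative names only, not msrest's actual internals):

```python
# Illustrative sketch of polymorphic subtype dispatch, loosely modeled on the
# behavior described in the traceback -- NOT msrest's real code.

class Activity:
    """Base model: 'type' is required, but the base class leaves it unset."""
    # An older SDK's subtype map that predates DatabricksSparkPython.
    subtype_map = {"Copy": "CopyActivity"}

    def __init__(self, name):
        self.name = name
        self.type = None  # required field, still None on the base class


def resolve_activity(payload):
    """Pick a model class for the payload; fall back to the base class."""
    kind = payload["type"]
    if kind not in Activity.subtype_map:
        # Mirrors: "Subtype value ... has no mapping, use base class Activity."
        print(f"Subtype value {kind} has no mapping, use base class Activity.")
        return Activity(payload["name"])
    raise NotImplementedError("only the fallback path is sketched here")


act = resolve_activity({"name": "Python Hello Test", "type": "DatabricksSparkPython"})

# Validation then rejects the instance because the required field is None --
# exactly the ValidationError at the bottom of the traceback.
assert act.type is None
```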

Here is my pipeline configuration:

    {
      "name": "Pipeline Test",
      "type": "Microsoft.DataFactory/factories/pipelines",
      "properties": {
        "activities": [
          {
            "name": "Python Hello Test",
            "description": "test",
            "type": "DatabricksSparkPython",
            "typeProperties": {
              "pythonFile": "dbfs:/jobs/test/hello.py"
            },
            "linkedServiceName": {
              "referenceName": "lsName",
              "type": "LinkedServiceReference"
            }
          }
        ]
      }
    }

Could someone help me with a solution?

Thank you!

kaerm commented 4 years ago

@maximauro thanks for reporting this, tagging the right team to have a look at this

maximauro commented 4 years ago

Here is my Pipfile:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
auto-changelog = "*"
coverage = "*"
flake8 = "*"
pylint = "*"
pytest = "*"
pytest-cov = "*"
python-githooks = "*"
python-semantic-release = "*"
flask = "*"

[packages]
azure-common = "*"
azure-mgmt = "*"
msrest = "*"
requests = "*"
cffi = "*"
envsubst = "*"

[requires]
python_version = "3.7"

The latest azure-mgmt release (v4.0.0) pins azure-mgmt-datafactory at v0.6.0, which is not the latest. The latest azure-mgmt-datafactory is v0.8.0, but the release notes do not mention that the DatabricksSparkPython activity has been implemented...

maximauro commented 4 years ago

Upgrading azure-mgmt-datafactory to v0.8.0 solved the problem, but I had to get rid of the azure-mgmt meta-package. Closing the issue ;)
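For reference, applying that fix to the Pipfile above would mean replacing the azure-mgmt meta-package with a direct pin on the Data Factory SDK; a sketch of the resulting `[packages]` section (the `>=0.8.0` pin is an assumption based on the version that resolved the issue):

```toml
[packages]
azure-common = "*"
azure-mgmt-datafactory = ">=0.8.0"  # replaces the azure-mgmt meta-package
msrest = "*"
requests = "*"
cffi = "*"
envsubst = "*"
```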

ghost commented 4 years ago

Thanks for working with Microsoft on GitHub! Tell us how you feel about your experience using the reactions on this comment.