datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

Custom Transformer not loaded #9749

Open DasMagischeToastbrot opened 5 months ago

DasMagischeToastbrot commented 5 months ago

Describe the bug

We want to use a custom transformer, but it is not loaded. We only receive the following error:

datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure transformers: 'Did not find a registered class for custom_transform_example_alias'

Steps to reproduce the behavior:

Custom transformer code: the same as in https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer#writing-a-custom-transformer-from-scratch, saved under the file name used there, custom_transform_example.py. The ingestion is run via:

          command:
            [
              "/bin/sh",
              "-c",
              "datahub ingest -c /etc/recipe/recipe.yml"
            ]

We followed the instructions in https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer#writing-a-custom-transformer-from-scratch.

We added the transformer to the recipe as follows:


    transformers:
      - type: custom_transform_example_alias
        config:
          owners_json: /etc/recipe/owner-musement.json

Expected behavior

The guide at https://datahubproject.io/docs/metadata-ingestion/docs/transformer/dataset_transformer#writing-a-custom-transformer-from-scratch says: "Now that we've defined the transformer, we need to make it visible to DataHub. The easiest way to do this is to just place it in the same directory as your recipe, in which case the module name is the same as the file – in this case, custom_transform_example." Based on that, I would expect DataHub to find the class.
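One possible reading of the error, sketched under assumptions rather than taken from DataHub's source: a plugin registry typically resolves a `type` either from names registered up front or by importing a dotted `module.ClassName` path. The `SimpleRegistry` class below is a hypothetical, simplified stand-in for that behaviour; only the error message text is copied from this report.

```python
import importlib
from typing import Any, Dict


class SimpleRegistry:
    """Hypothetical, simplified plugin registry; not DataHub's actual code."""

    def __init__(self) -> None:
        self._mapping: Dict[str, Any] = {}

    def register(self, key: str, cls: Any) -> None:
        self._mapping[key] = cls

    def get(self, key: str) -> Any:
        # 1. Names registered up front (for example by an installed package).
        if key in self._mapping:
            return self._mapping[key]
        # 2. Dotted "module.ClassName" paths are imported on demand.
        if "." in key:
            module_name, _, class_name = key.rpartition(".")
            module = importlib.import_module(module_name)
            return getattr(module, class_name)
        # 3. A bare, unregistered name cannot be resolved.
        raise KeyError(f"Did not find a registered class for {key}")


registry = SimpleRegistry()

# A dotted path resolves through a plain import, with no registration step.
ordered_dict_cls = registry.get("collections.OrderedDict")

# A bare alias fails unless something registered it beforehand.
try:
    registry.get("custom_transform_example_alias")
    alias_error = None
except KeyError as exc:
    alias_error = exc.args[0]
```

Under this model, a recipe `type` written as a module-qualified path (the file name custom_transform_example plus the class name defined in that file) would take the import branch, provided the file's directory is importable, whereas a bare alias would only resolve after a package that registers it has been installed.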

DasMagischeToastbrot commented 4 months ago

Is there any progress on this?

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

DasMagischeToastbrot commented 3 months ago

Are there any updates?

hsheth2 commented 3 months ago

@DasMagischeToastbrot what was the full stack trace that you saw?

DasMagischeToastbrot commented 3 months ago

    powerbi-test [2024-04-23 05:43:35,827] INFO {datahub.cli.ingest_cli:147} - DataHub CLI version: 1!0.13.0+docker
    powerbi-test [2024-04-23 05:43:36,008] INFO {datahub.ingestion.run.pipeline:238} - Sink configured successfully. DataHubRestEmitter: configured to talk to https://example.sit.blob/api/gms with token: eyJh**JijU
    powerbi-test /datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/source/tableau.py:2515: ConfigurationWarning: env is deprecated and will be removed in a future release. Please use platform_instance instead.
    powerbi-test   config = TableauConfig.parse_obj(config_dict)
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables-save -t nat"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="# Generated by iptables-save v1.8.9 on Tue Apr 23 05:43:25 2024\n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\nCOMMIT\n# Completed on Tue Apr 23 05:43:25 2024\n"
    powerbi-test [2024-04-23 05:43:36,896] INFO {tableauserverclient.server.server:179} - versions: 3.19, 2.4
    powerbi-test [2024-04-23 05:43:37,086] INFO {tableauserverclient.server.endpoint.auth_endpoint:50} - Signed into https://analytics.blob as user with id e54ed283-b741-41a7-b50f-29b33c476609
    powerbi-test [2024-04-23 05:43:37,087] INFO {datahub.ingestion.source.tableau:729} - Authenticated to Tableau server
    powerbi-test [2024-04-23 05:43:37,087] INFO {datahub.ingestion.run.pipeline:255} - Source configured successfully.
    powerbi-test [2024-04-23 05:43:37,369] ERROR {datahub.entrypoints:201} - Command failed: Failed to configure transformers: 'Did not find a registered class for custom_transform_example_alias'
    powerbi-test Traceback (most recent call last):
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 121, in _add_init_error_context
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -N PROXY_INIT_REDIRECT"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A PROXY_INIT_REDIRECT -p tcp --match multiport --dports 4190,4191,4567,4568 -j RETURN -m comment --comment proxy-init/ignore-port-4190,4191,4567,4568/1713851005"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A PROXY_INIT_REDIRECT -p tcp -j REDIRECT --to-port 4143 -m comment --comment proxy-init/redirect-all-incoming-to-proxy-port/1713851005"
    powerbi-test     yield
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 265, in __init__
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A PREROUTING -j PROXY_INIT_REDIRECT -m comment --comment proxy-init/install-proxy-init-prerouting/1713851005"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -N PROXY_INIT_OUTPUT"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -j RETURN -m comment --comment proxy-init/ignore-proxy-user-id/1713851005"
    powerbi-test     self._configure_transforms()
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 272, in _configure_transforms
    powerbi-test     transformer_class = transform_registry.get(transformer_type)
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 172, in get
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A PROXY_INIT_OUTPUT -o lo -j RETURN -m comment --comment proxy-init/ignore-loopback/1713851005"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A PROXY_INIT_OUTPUT -p tcp --match multiport --dports 4567,4568 -j RETURN -m comment --comment proxy-init/ignore-port-4567,4568/1713851005"
    powerbi-test     raise KeyError(f"Did not find a registered class for {key}")
    powerbi-test KeyError: 'Did not find a registered class for custom_transform_example_alias'
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140 -m comment --comment proxy-init/redirect-all-outgoing-to-proxy-port/1713851005"
    powerbi-test
    powerbi-test The above exception was the direct cause of the following exception:
    powerbi-test
    powerbi-test Traceback (most recent call last):
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/entrypoints.py", line 188, in main
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables -t nat -A OUTPUT -j PROXY_INIT_OUTPUT -m comment --comment proxy-init/install-proxy-init-output/1713851005"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="/sbin/iptables-save -t nat"
    linkerd-init time="2024-04-23T05:43:25Z" level=info msg="# Generated by iptables-save v1.8.9 on Tue Apr 23 05:43:25 2024\n*nat\n:PREROUTING ACCEPT [0:0]\n:INPUT ACCEPT [0:0]\n:OUTPUT ACCEPT [0:0]\n:POSTROUTING ACCEPT [0:0]\n:PROXY_INIT_OUTPUT - [0:0]\n:PROXY_INIT_REDIRECT - [0:0]\n-A PREROUTING -m comment --comment \"proxy-init/install-proxy-init-prerouting/1713851005\" -j PROXY_INIT_REDIRECT\n-A OUTPUT -m comment --comment \"proxy-init/install-proxy-init-output/1713851005\" -j PROXY_INIT_OUTPUT\n-A PROXY_INIT_OUTPUT -m owner --uid-owner 2102 -m comment --comment \"proxy-init/ignore-proxy-user-id/1713851005\" -j RETURN\n-A PROXY_INIT_OUTPUT -o lo -m comment --comment \"proxy-init/ignore-loopback/1713851005\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m multiport --dports 4567,4568 -m comment --comment \"proxy-init/ignore-port-4567,4568/1713851005\" -j RETURN\n-A PROXY_INIT_OUTPUT -p tcp -m comment --comment \"proxy-init/redirect-all-outgoing-to-proxy-port/1713851005\" -j REDIRECT --to-ports 4140\n-A PROXY_INIT_REDIRECT -p tcp -m multiport --dports 4190,4191,4567,4568 -m comment --comment \"proxy-init/ignore-port-4190,4191,4567,4568/1713851005\" -j RETURN\n-A PROXY_INIT_REDIRECT -p tcp -m comment --comment \"proxy-init/redirect-all-incoming-to-proxy-port/1713851005\" -j REDIRECT --to-ports 4143\nCOMMIT\n# Completed on Tue Apr 23 05:43:25 2024\n"
    powerbi-test     sys.exit(datahub(standalone_mode=False, **kwargs))
    linkerd-proxy [ 0.002005s] INFO ThreadId(01) linkerd2_proxy: release 2.210.4 (5a910be) by linkerd on 2023-11-22T17:01:46Z
    linkerd-proxy [ 0.002809s] INFO ThreadId(01) linkerd2_proxy::rt: Using single-threaded proxy runtime
    linkerd-proxy [ 0.003585s] INFO ThreadId(01) linkerd2_proxy: Admin interface on 0.0.0.0:4191
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    linkerd-proxy [ 0.003598s] INFO ThreadId(01) linkerd2_proxy: Inbound interface on 0.0.0.0:4143
    linkerd-proxy [ 0.003602s] INFO ThreadId(01) linkerd2_proxy: Outbound interface on 127.0.0.1:4140
    linkerd-proxy [ 0.003607s] INFO ThreadId(01) linkerd2_proxy: Tap interface on 0.0.0.0:4190
    linkerd-proxy [ 0.003610s] INFO ThreadId(01) linkerd2_proxy: Local identity is datahub-ingestion-metrics-exporter-datahub.search-and-discovery.serviceaccount.identity.linkerd.cluster.local
    powerbi-test     return self.main(*args, **kwargs)
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    linkerd-proxy [ 0.003614s] INFO ThreadId(01) linkerd2_proxy: Identity verified via linkerd-identity-headless.linkerd.svc.cluster.local:8080 (linkerd-identity.linkerd.serviceaccount.identity.linkerd.cluster.local)
    linkerd-proxy [ 0.003618s] INFO ThreadId(01) linkerd2_proxy: Destinations resolved via linkerd-dst-headless.linkerd.svc.cluster.local:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.cluster.local)
    powerbi-test     rv = self.invoke(ctx)
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    powerbi-test     return _process_result(sub_ctx.command.invoke(sub_ctx))
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    powerbi-test     return _process_result(sub_ctx.command.invoke(sub_ctx))
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    linkerd-proxy [ 0.020707s] INFO ThreadId(02) daemon:identity: linkerd_app: Certified identity id=datahub-ingestion-metrics-exporter-datahub.search-and-discovery.serviceaccount.identity.linkerd.cluster.local
    linkerd-proxy [ 5.790341s] INFO ThreadId(01) inbound:server{port=8000}:rescue{client.addr=100.64.42.124:37896}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=error trying to connect: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
    powerbi-test     return ctx.invoke(self.callback, **ctx.params)
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    powerbi-test     return __callback(*args, **kwargs)
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 454, in wrapper
    powerbi-test     raise e
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 403, in wrapper
    powerbi-test     res = func(*args, **kwargs)
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 201, in run
    powerbi-test     ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
    Stream closed EOF for search-and-discovery/datahub-powerbi-test-76474b5f49-9n2wg (linkerd-init)
    powerbi-test   File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    powerbi-test     return future.result()
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 170, in run_ingestion_and_check_upgrade
    powerbi-test     pipeline = Pipeline.create(
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 362, in create
    powerbi-test     return cls(
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 264, in __init__
    powerbi-test     with _add_init_error_context("configure transformers"):
    powerbi-test   File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    powerbi-test     self.gen.throw(typ, value, traceback)
    powerbi-test   File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 123, in _add_init_error_context
    linkerd-proxy [ 8.282254s] INFO ThreadId(01) inbound:server{port=8000}:rescue{client.addr=100.64.99.86:50472}: linkerd_app_core::errors::respond: HTTP/1.1 request failed error=error trying to connect: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
    powerbi-test     raise PipelineInitError(f"Failed to {step}: {e}") from e
    powerbi-test datahub.ingestion.run.pipeline.PipelineInitError: Failed to configure transformers: 'Did not find a registered class for custom_transform_example_alias'
    Stream closed EOF for search-and-discovery/datahub-powerbi-test-76474b5f49-9n2wg (powerbi-test)
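Since the trace ends in a registry lookup by name, one way to narrow this down might be to check whether any installed package advertises the alias as a Python entry point inside the ingestion container. The sketch below uses only the standard library; the group name `datahub.ingestion.transformer.plugins` is an assumption that should be checked against the DataHub docs for the version in use, and `find_entry_point` is a hypothetical helper.

```python
from importlib import metadata
from typing import Optional

# Assumed entry-point group name; verify it against the DataHub documentation.
TRANSFORMER_GROUP = "datahub.ingestion.transformer.plugins"


def find_entry_point(alias: str, group: str = TRANSFORMER_GROUP) -> Optional[str]:
    """Return the target registered for `alias` in `group`, or None if absent."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        candidates = eps.select(group=group)  # Python 3.10 and newer
    else:
        candidates = eps.get(group, [])  # Python 3.8/3.9 return a plain dict
    for ep in candidates:
        if ep.name == alias:
            return ep.value
    return None


target = find_entry_point("custom_transform_example_alias")
print(target or "alias not registered in this environment")
```

If this prints that the alias is not registered, no installed package in that interpreter's environment exposes the name under the assumed group, which would be consistent with the `KeyError` above.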

DasMagischeToastbrot commented 2 months ago

Is there any progress? @hsheth2

DasMagischeToastbrot commented 2 months ago

@hsheth2 do you have time to look into this?

DasMagischeToastbrot commented 1 month ago

Could you please give some feedback, @hsheth2?