datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

Unable to ingest Json schema #11509

Open lekhamaru opened 1 month ago

lekhamaru commented 1 month ago

While evaluating DataHub, I'm trying to ingest JSON schemas using the CLI. My recipe file is as follows:

    pipeline_name: json_schema_ingestion
    source:
      type: json-schema
      config:
        path: https://json.schemastore.org/petstore-v1.0.json # e.g. https://json.schemastore.org/petstore-v1.0.json
        platform: SchemaRegistry # e.g. schemaregistry
        platform_instance:
        stateful_ingestion:
          enabled: true # recommended to have this turned on

Error log: when I run the command datahub ingest -c, I get the error recorded in the attached log file, log.txt.

jjoyce0510 commented 1 month ago

Hi there! Do you mind trying the same with Python version 3.10? I can tell from the logs that your version of Python is 3.12.4, which is not yet officially supported by the ingestion framework.

Let us know how it goes!

Cheers,
John
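To confirm which interpreter the CLI actually runs under, a quick check along these lines may help. It is only a minimal sketch: the 3.8 to 3.10 range is an assumption drawn from the versions mentioned in this thread, not an official support matrix, and `is_tested_python` is a hypothetical helper name.

```python
import sys

# Assumed range, taken from the versions discussed in this thread
# (3.8.10, 3.9.10 and 3.10); not an official support matrix.
TESTED_MIN = (3, 8)
TESTED_MAX = (3, 10)


def is_tested_python(version_info=None):
    """Return True if the (major, minor) version falls inside the assumed range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return TESTED_MIN <= major_minor <= TESTED_MAX


if __name__ == "__main__":
    # Show which interpreter is really in use before comparing versions.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    if not is_tested_python():
        print("warning: outside the range assumed to be tested")
```

Running this with the same `python` that `pip` installed the CLI into should show whether a version switch really took effect.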

lekhamaru commented 1 month ago

Sure. I will check and report back.

lekhamaru commented 1 month ago

I checked with Python version 3.8.10 but am still facing the same issue. Log attached: logwithpythonversion3.8.10.txt

treff7es commented 1 month ago

@lekhamaru I'm unable to reproduce this issue locally. Can you please run a pip freeze and paste the package versions here?

Here is how I tested using Python 3.9.10:

  1. pip install "acryl-datahub[json-schema]"

  2. Creating this recipe:

    pipeline_name: json_schema_ingestion
    source:
      type: json-schema
      config:
        path: "http://json.schemastore.org/petstore-v1.0.json" # e.g. json.schemastore.org/petstore-v1.0.json
        platform: SchemaRegistry # e.g. schemaregistry
        # platform_instance:
        stateful_ingestion:
          enabled: true # recommended to have this turned on

  3. Running: datahub ingest -c json_schema_ingest.dhub.yaml
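
The steps above can also be run inside a fresh virtual environment, so that packages already installed on the machine cannot interfere. The following is a minimal sketch using only the Python standard library; `venv_bin`, `repro_commands`, and `run_repro` are hypothetical helper names, and the environment directory is whatever you choose.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_bin(env_dir, name):
    """Path to an executable inside the virtual environment."""
    sub = "Scripts" if os.name == "nt" else "bin"
    return str(Path(env_dir) / sub / name)


def repro_commands(env_dir, recipe_path):
    """Commands mirroring steps 1 and 3, run with the venv's own executables."""
    return [
        [venv_bin(env_dir, "pip"), "install", "acryl-datahub[json-schema]"],
        [venv_bin(env_dir, "datahub"), "ingest", "-c", str(recipe_path)],
    ]


def run_repro(env_dir, recipe_path):
    """Create a fresh venv, install the plugin, and run the recipe."""
    # with_pip=True bootstraps pip; system site-packages stay excluded by default.
    venv.create(env_dir, with_pip=True)
    for cmd in repro_commands(env_dir, recipe_path):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Calling `run_repro(".venv-datahub-repro", "json_schema_ingest.dhub.yaml")` would create the environment (the directory name is an arbitrary example), install the plugin, and run the recipe from step 2. It needs network access and a reachable DataHub instance, just like the manual steps.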

treff7es commented 1 month ago

I also tested with Python 3.8.10 and it worked for me.

lekhamaru commented 1 month ago

    acryl-datahub==0.14.1
    aiohttp==3.9.5
    aiosignal==1.3.1
    annotated-types==0.7.0
    async-timeout==4.0.3
    attrs==19.3.0
    autocommand==2.2.2
    Automat==0.8.0
    avro==1.11.3
    avro-gen3==0.7.16
    backports.tarfile==1.2.0
    blinker==1.4
    Brlapi==0.7.0
    cached-property==1.5.2
    certifi==2019.11.28
    chardet==3.0.4
    charset-normalizer==3.3.2
    chrome-gnome-shell==0.0.0
    click==8.1.7
    click-default-group==1.2.4
    click-spinner==0.1.10
    cloud-init==24.3.1
    colorama==0.4.3
    command-not-found==0.3
    configobj==5.0.6
    constantly==15.1.0
    cryptography==2.8
    cupshelpers==1.0
    dbus-python==1.2.16
    defer==1.0.6
    Deprecated==1.2.14
    distro==1.4.0
    distro-info==0.23+ubuntu1.1
    docker==7.1.0
    entrypoints==0.3
    expandvars==0.12.0
    frozenlist==1.4.1
    httplib2==0.14.0
    humanfriendly==10.0
    hyperlink==19.0.0
    idna==2.8
    ijson==3.3.0
    importlib-metadata==8.2.0
    importlib-resources==6.4.0
    incremental==16.10.1
    inflect==7.3.1
    jaraco.collections==5.1.0
    jaraco.context==5.3.0
    jaraco.functools==4.0.1
    jaraco.text==3.12.1
    Jinja2==2.10.1
    jsonpatch==1.22
    jsonpointer==2.0
    jsonref==1.1.0
    jsonschema==3.2.0
    keyring==18.0.1
    language-selector==0.1
    launchpadlib==1.10.13
    lazr.restfulclient==0.14.2
    lazr.uri==1.0.3
    louis==3.12.0
    macaroonbakery==1.3.1
    Mako==1.1.0
    MarkupSafe==1.1.0
    mixpanel==4.10.1
    more-itertools==10.3.0
    multidict==6.0.5
    mypy-extensions==1.0.0
    netifaces==0.10.4
    oauthlib==3.1.0
    packaging==24.1
    pexpect==4.6.0
    platformdirs==4.2.2
    progressbar2==4.4.2
    protobuf==3.6.1
    psutil==6.0.0
    pyasn1==0.4.2
    pyasn1-modules==0.2.1
    pycairo==1.16.2
    pycups==1.9.73
    pydantic==2.8.2
    pydantic-core==2.20.1
    PyGObject==3.36.0
    PyHamcrest==1.9.0
    PyJWT==1.7.1
    pymacaroons==0.13.0
    PyNaCl==1.3.0
    pyOpenSSL==19.0.0
    pyRFC3339==1.1
    pyrsistent==0.15.5
    pyserial==3.4
    python-apt==2.0.1+ubuntu0.20.4.1
    python-dateutil==2.9.0.post0
    python-debian==0.1.36+ubuntu1.1
    python-utils==3.8.2
    pytz==2019.3
    pyxdg==0.26
    PyYAML==5.3.1
    requests==2.32.3
    requests-file==2.1.0
    requests-unixsocket==0.2.0
    ruamel.yaml==0.18.6
    ruamel.yaml.clib==0.2.8
    SecretStorage==2.3.1
    sentry-sdk==2.11.0
    service-identity==18.1.0
    simplejson==3.16.0
    six==1.14.0
    sos==4.5.6
    ssh-import-id==5.10
    systemd-python==234
    tabulate==0.9.0
    termcolor==2.4.0
    toml==0.10.2
    tomli==2.0.1
    Twisted==18.9.0
    typeguard==4.3.0
    typing-extensions==4.12.2
    typing-inspect==0.9.0
    ubuntu-drivers-common==0.0.0
    ubuntu-pro-client==8001
    ufw==0.36
    unattended-upgrades==0.1
    urllib3==2.2.2
    wadllib==1.3.3
    wrapt==1.16.0
    xkit==0.0.0
    yarl==1.9.4
    zipp==3.19.2
    zope.interface==4.7.1
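
This list mixes OS-managed packages (python-apt, cloud-init, ubuntu-drivers-common) with the ingestion dependencies, which may mean the CLI was installed into the system interpreter rather than an isolated environment; that is an inference, not something the thread confirms. To make a long freeze easier to compare against a working setup, a small parser like the sketch below could pull out a handful of pins. The helper names and the choice of packages in `OF_INTEREST` are assumptions made for illustration.

```python
def parse_freeze(text):
    """Parse `pip freeze` output (newline- or space-separated) into {name: version}."""
    pins = {}
    for token in text.split():
        name, sep, version = token.partition("==")
        if sep:  # skip anything that is not a plain name==version pin
            pins[name.lower()] = version
    return pins


# Packages picked for illustration; which ones matter here is an assumption.
OF_INTEREST = ("acryl-datahub", "jsonschema", "jsonref", "pydantic", "requests")


def summarize(pins, names=OF_INTEREST):
    """Return the selected pins, with None for packages that are not listed."""
    return {name: pins.get(name) for name in names}


if __name__ == "__main__":
    # A few pins copied from the list above.
    sample = "acryl-datahub==0.14.1 jsonref==1.1.0 jsonschema==3.2.0 pydantic==2.8.2"
    for name, version in summarize(parse_freeze(sample)).items():
        print(name, version)
```

Running the same summary on a machine where ingestion works and comparing the two outputs side by side is one way to narrow down a version mismatch.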

lekhamaru commented 1 month ago

I'm following the same steps as you have mentioned above.

lekhamaru commented 1 month ago

I also tried with Python version 3.9.10 but am still getting the same error. Can you please help?