GoogleCloudPlatform / datacatalog-connectors-hive

Sample code with integration between Data Catalog and Hive data source.
Apache License 2.0

[BUG] Error occurred when running the Python entry point #18

Closed. Wonong closed this issue 3 years ago

Wonong commented 3 years ago

What happened: I got the error messages below when following the guide document (https://github.com/GoogleCloudPlatform/datacatalog-connectors-hive/tree/master/google-datacatalog-hive-connector#31-run-python-entry-point).

google-datacatalog-hive-connector \
--datacatalog-project-id=$HIVE2DC_DATACATALOG_PROJECT_ID \
--datacatalog-location-id=$HIVE2DC_DATACATALOG_LOCATION_ID \
--hive-metastore-db-host=$HIVE2DC_HIVE_METASTORE_DB_HOST \
--hive-metastore-db-user=$HIVE2DC_HIVE_METASTORE_DB_USER \
--hive-metastore-db-pass=$HIVE2DC_HIVE_METASTORE_DB_PASS \
--hive-metastore-db-name=$HIVE2DC_HIVE_METASTORE_DB_NAME \
--hive-metastore-db-type=$HIVE2DC_HIVE_METASTORE_DB_TYPE

INFO:root:
==============Start hive-to-datacatalog============
INFO:root:

==============Scrape metadata===============
INFO:root:
--> SyncEvent.MANUAL_DATABASE_SYNC
INFO:root:
1 databases ready to be ingested...
INFO:root:

==============Prepare metadata===============
INFO:root:
Preparing the metadata...
INFO:root:
--> Database: default
INFO:root:
2 tables ready to be ingested...
INFO:root:
==============Ingest metadata===============
DEBUG:google.auth._default:Checking /home/wonyeong/workspaces/data-catalog-credentials.json for explicit credentials as part of auth process...
INFO:root:
INFO:root:Starting to clean up the catalog...
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
INFO:root:0 entries that match the search query exist in Data Catalog!
INFO:root:Looking for entries to be deleted...
INFO:root:0 entries will be deleted.
INFO:root:
Starting to ingest custom metadata...
DEBUG:google.auth._default:Checking /home/wonyeong/workspaces/data-catalog-credentials.json for explicit credentials as part of auth process...
INFO:root:
INFO:root:Starting the ingestion flow...
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
Traceback (most recent call last):
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNIMPLEMENTED
    details = "Received http2 header with status: 404"
    debug_error_string = "{"created":"@1605598060.099067455","description":"Received http2 :status header with non-200 OK status","file":"src/core/ext/filters/http/client/http_client_filter.cc","file_line":129,"grpc_message":"Received http2 header with status: 404","grpc_status":12,"value":"404"}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/bin/google-datacatalog-hive-connector", line 8, in <module>
    sys.exit(main())
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/datacatalog_connectors/hive/datacatalog_cli.py", line 87, in main
    Hive2DatacatalogCli().run(argv[1:] if len(argv) > 0 else argv)
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/datacatalog_connectors/hive/datacatalog_cli.py", line 46, in run
    enable_monitoring=args.enable_monitoring).run()
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/datacatalog_connectors/hive/sync/datacatalog_synchronizer.py", line 126, in run
    self.__ingest_created_or_updated(prepared_entries)
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/datacatalog_connectors/hive/sync/datacatalog_synchronizer.py", line 141, in __ingest_created_or_updated
    ingestor.ingest_metadata([database_entry, *table_related_entries])
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/datacatalog_connectors/commons/ingest/datacatalog_metadata_ingestor.py", line 56, in ingest_metadata
    entry_group_id=self.__entry_group_id)
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/datacatalog_connectors/commons/datacatalog_facade.py", line 181, in create_entry_group
    entry_group=datacatalog.EntryGroup())
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/cloud/datacatalog_v1beta1/services/data_catalog/client.py", line 539, in create_entry_group
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/wonyeong/workspaces/datacatalog-connectors-hive/venv/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.MethodNotImplemented: 501 Received http2 header with status: 404
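
A note on reading the traceback above: the log shows an HTTP 404 being reported as `StatusCode.UNIMPLEMENTED` (`grpc_status` 12), which `google.api_core` then raises as `MethodNotImplemented` with code 501. The sketch below reproduces the HTTP-to-gRPC status mapping described in the gRPC documentation, to show why a "not found" response can look like a "not implemented" error. The names `HTTP_TO_GRPC_STATUS` and `grpc_status_for_http` are made up for this illustration and are not part of the connector or of any library.

```python
# HTTP-to-gRPC status mapping as described in the gRPC documentation
# ("HTTP to gRPC Status Code Mapping"). Statuses not listed map to UNKNOWN.
# Hypothetical helper for illustration only; not part of grpc or the connector.
HTTP_TO_GRPC_STATUS = {
    400: "INTERNAL",
    401: "UNAUTHENTICATED",
    403: "PERMISSION_DENIED",
    404: "UNIMPLEMENTED",
    429: "UNAVAILABLE",
    502: "UNAVAILABLE",
    503: "UNAVAILABLE",
    504: "UNAVAILABLE",
}


def grpc_status_for_http(http_status):
    """Return the gRPC status name a client synthesizes for a bare HTTP status."""
    return HTTP_TO_GRPC_STATUS.get(http_status, "UNKNOWN")


if __name__ == "__main__":
    # The 404 from the log is surfaced as UNIMPLEMENTED.
    print(grpc_status_for_http(404))
```

Read this way, the 501 in the last line of the traceback does not necessarily mean the API lacks the method. The underlying signal is the 404, so the values that end up in the request path (project ID, location ID) are worth checking first.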

What you expected to happen: I expected my local Hive metastore data to be uploaded to Data Catalog.

How to reproduce it (as minimally and precisely as possible):

  1. Create a GCP project and a service account with the needed permissions.
  2. Deploy Hive (I used https://github.com/big-data-europe/docker-hive).
  3. Run the connector as described in the guide document.

Anything else we need to know?:

mesmacosta commented 3 years ago

Hi Wonong,

Thanks for opening this.

What values are you passing for:

$HIVE2DC_DATACATALOG_LOCATION_ID
$HIVE2DC_HIVE_METASTORE_DB_TYPE

I have seen this error message when an invalid GCP location ID is passed.
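
Since a mistyped location ID is enough to trigger this error, a small pre-flight check before running the connector may help. The following is a minimal sketch, not part of the connector: `KNOWN_LOCATIONS` is an illustrative, incomplete list (an assumption; the authoritative list of supported regions is in the Data Catalog documentation), and `check_location_id` is a hypothetical helper that flags values missing from the list and suggests a near match.

```python
import difflib
import re

# Illustrative subset only (assumption). Verify the supported regions
# against the Data Catalog documentation before relying on this list.
KNOWN_LOCATIONS = [
    "us-central1",
    "us-east1",
    "us-west1",
    "europe-west1",
    "asia-east1",
    "asia-northeast3",
]

# Loose shape check for regional IDs such as "us-central1".
_LOCATION_PATTERN = re.compile(r"^[a-z]+(-[a-z]+)+[0-9]+$")


def check_location_id(location_id, known=KNOWN_LOCATIONS):
    """Return (ok, hint) for a location ID; hypothetical helper for illustration."""
    if location_id in known:
        return True, ""
    # Suggest the closest known ID, which catches most single-character typos.
    suggestions = difflib.get_close_matches(location_id, known, n=1)
    if suggestions:
        return False, f"did you mean '{suggestions[0]}'?"
    if not _LOCATION_PATTERN.match(location_id):
        return False, "value does not look like a region ID such as 'us-central1'"
    return False, "not in the local list; verify it against the documentation"
```

The value to check would be whatever `$HIVE2DC_DATACATALOG_LOCATION_ID` expands to, for example `check_location_id(os.environ.get("HIVE2DC_DATACATALOG_LOCATION_ID", ""))` after importing `os`.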

mesmacosta commented 3 years ago

@Wonong

Wonong commented 3 years ago

I found a typo in my location ID. Thank you for the fast answer. :)