GoogleCloudPlatform / datacatalog-connectors-hive

Sample code with integration between Data Catalog and Hive data source.
Apache License 2.0

[BUG] Timeout when tables > 15k in metastore #21

Closed vishinde closed 3 years ago

vishinde commented 3 years ago

What happened: Running the Hive connector fails with a timeout error when the metastore contains more than 15k tables. Error message below:

  File "/Users/r0s03tk/Documents/walmart/keys/threeseven/lib/python3.7/site-packages/sqlalchemy/orm/strategies.py", line 720, in _load_for_state
    % (orm_util.state_str(state), self.key)
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Database at 0x1038f7250> is not bound to a Session; lazy load operation of attribute 'tables' cannot proceed (Background on this error at: http://sqlalche.me/e/13/bhk3)

What you expected to happen: The connector syncs all tables, regardless of how many tables are in the metastore.

How to reproduce it (as minimally and precisely as possible): Run the connector against a metastore with more than 15k tables.
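
For context, the DetachedInstanceError in the report above is what SQLAlchemy raises when a lazily loaded relationship (here `tables`) is first accessed after the owning Session has been closed. The sketch below reproduces that failure mode on a tiny in-memory SQLite database and shows one common way around it: eager-loading the relationship with `selectinload` while the session is still open. It is not the connector's actual code or its eventual fix, and it does not explain why the failure only shows up above 15k tables. The `Database` and `Table` models, the `dbs`/`tbls` table names, and the `load_lazy`/`load_eager` helpers are hypothetical stand-ins, and the imports assume SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()


class Database(Base):  # hypothetical stand-in for the metastore database entity
    __tablename__ = "dbs"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    tables = relationship("Table")  # lazy loading is the default strategy


class Table(Base):  # hypothetical stand-in for the metastore table entity
    __tablename__ = "tbls"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    db_id = Column(Integer, ForeignKey("dbs.id"))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

# Seed one database with three tables.
seed = Session(engine)
seed.add(Database(name="default", tables=[Table(name="t%d" % i) for i in range(3)]))
seed.commit()
seed.close()


def load_lazy():
    """Return databases without loading their tables; the session is closed on return."""
    session = Session(engine)
    try:
        return session.query(Database).all()
    finally:
        session.close()


def load_eager():
    """Return databases with their tables loaded before the session is closed."""
    session = Session(engine)
    try:
        return session.query(Database).options(selectinload(Database.tables)).all()
    finally:
        session.close()


# Accessing the relationship on a detached object triggers the reported error.
try:
    load_lazy()[0].tables
    lazy_failed = False
except DetachedInstanceError:
    lazy_failed = True

# With eager loading, the tables are already in memory and remain readable.
eager_names = sorted(t.name for t in load_eager()[0].tables)
```

Another option under the same assumptions is to keep the session open until every database's `tables` attribute has been read. That avoids the extra loader option, but it holds the connection for the whole scrape.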

mesmacosta commented 3 years ago

Thanks for opening this. We will work on a fix.