Issue Description
The current implementation of the database scanner in Dataherald does not support foreign tables in PostgreSQL databases. This limitation restricts the tool's utility in environments where databases extensively use foreign tables for cross-database queries and data integration.
Proposed Solution
I propose enhancing the get_all_tables_and_views method to include foreign tables when scanning PostgreSQL databases. The change checks whether the engine's driver is psycopg2 (that is, the database is PostgreSQL) and, if so, appends the foreign table names to the list of tables and views. Table example generation and column processing also need adjusting: example generation should return an empty list for a foreign table, and foreign table columns should not be processed. Here is a sketch of the proposed changes:
Adjusting table example generation for foreign tables:
# If the engine is PostgreSQL and the table is a foreign table, return an empty list.
if db_engine.engine.driver == "psycopg2" and <foreign_table_condition>:
    return []
Ensuring foreign table columns are not processed:
if db_engine.engine.driver == "psycopg2" and <foreign_table_condition>:
    # Process for skipping or handling foreign table columns
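The <foreign_table_condition> placeholder in the two snippets above could be filled in several ways. One possibility, as a rough sketch only: check the table name against the inspector's list of foreign tables. This assumes the inspector exposes get_foreign_table_names(), as the SQLAlchemy PostgreSQL inspector does. The helper names is_foreign_table and get_table_examples, the fetch_rows callback, and the FakeInspector stand-in are all hypothetical and are not taken from the Dataherald code base; the stand-in is only there so the example runs without a database.

```python
class FakeInspector:
    """Stand-in for a SQLAlchemy PostgreSQL inspector (illustration only)."""

    def __init__(self, foreign_tables):
        self._foreign_tables = list(foreign_tables)

    def get_foreign_table_names(self, schema=None):
        return list(self._foreign_tables)


def is_foreign_table(inspector, driver, table_name, schema=None):
    """Return True only when the driver is psycopg2 and the table is foreign.

    Hypothetical helper, not part of Dataherald.
    """
    if driver != "psycopg2":
        # Other dialects' inspectors may not have get_foreign_table_names().
        return False
    return table_name in set(inspector.get_foreign_table_names(schema=schema))


def get_table_examples(inspector, driver, table_name, fetch_rows):
    """Return sample rows, or an empty list for a foreign table."""
    if is_foreign_table(inspector, driver, table_name):
        return []
    return fetch_rows(table_name)


inspector = FakeInspector(["remote_orders"])
rows = lambda name: [{"table": name, "id": 1}]  # hypothetical row fetcher
print(get_table_examples(inspector, "psycopg2", "remote_orders", rows))
print(get_table_examples(inspector, "psycopg2", "customers", rows))
```

Calling get_foreign_table_names() once per table costs a catalog query each time, so a real implementation would probably fetch the list once per scan and reuse it.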
Including foreign tables in the scan method:
if db_engine.engine.driver == "psycopg2":
    tables += inspector.get_foreign_table_names()
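To show how the scan step might fit together end to end, here is a small self-contained sketch. It assumes an inspector with get_table_names(), get_view_names(), and get_foreign_table_names(), which the SQLAlchemy PostgreSQL inspector provides. The function name list_scannable_relations and the StubInspector class are hypothetical; the actual signature of get_all_tables_and_views in Dataherald may differ.

```python
class StubInspector:
    """Stand-in for a SQLAlchemy PostgreSQL inspector (illustration only)."""

    def get_table_names(self):
        return ["customers", "orders"]

    def get_view_names(self):
        return ["active_customers"]

    def get_foreign_table_names(self):
        return ["remote_orders"]


def list_scannable_relations(inspector, driver):
    """Collect tables and views, plus foreign tables on PostgreSQL.

    Hypothetical helper mirroring the proposed change.
    """
    names = list(inspector.get_table_names()) + list(inspector.get_view_names())
    if driver == "psycopg2":
        # Only the PostgreSQL inspector is assumed to support this call.
        names += inspector.get_foreign_table_names()
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(names))


print(list_scannable_relations(StubInspector(), "psycopg2"))
print(list_scannable_relations(StubInspector(), "pymysql"))
```

The de-duplication step is a precaution in case a name is reported by more than one inspector call; it is not something the proposal requires.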
Initial Testing
I've conducted brief testing of these proposed changes, which suggests they can effectively incorporate foreign table support into Dataherald's PostgreSQL database scanning capabilities. However, I believe there may be more efficient or robust methods to achieve this, and further testing and refinement are necessary.
I have not submitted a pull request at this time, as I'm looking for feedback on the proposed solution and any additional insights that could improve it.
Request for Feedback
I welcome feedback on the proposed solution, including any potential issues or alternative approaches that could enhance support for foreign tables in PostgreSQL databases within Dataherald. If anyone has experience with similar implementations or suggestions for refining this proposal, your insights would be highly appreciated.
Hi @toliver38, thanks for your collaboration.
It looks fine to me. Feel free to create a PR so we can check the branch with the changes; if we find any issues, we can fix them later.