Dataherald / dataherald

Interact with your SQL database, Natural Language to SQL using LLMs
https://dataherald.readthedocs.io/en/latest/
Apache License 2.0
3.35k stars 234 forks source link

Enhance ColumnDescription with is_required and operators Fields for Improved SQL Query Generation #449

Closed toliver38 closed 7 months ago

toliver38 commented 7 months ago

Issue Description

In the context of utilizing Foreign Data Wrappers (FDW) in PostgreSQL, there's a unique requirement to specify certain columns as mandatory in every SQL query, as well as to restrict specific columns to a subset of SQL operators. This enhancement request aims to address these needs by proposing the addition of two new fields to the ColumnDescription class: is_required and operators.

Proposed Fields

  1. is_required: bool - This field indicates whether a specific field must always be included in an SQL statement. This is particularly relevant for ensuring compliance with certain constraints when working with FDWs in PostgreSQL.

  2. operators: list[str] - This field specifies the subset of operators that are permissible for a given column. For example, a role_id column might only accept the equality operator (=). This flexibility is crucial for accurately reflecting the capabilities and constraints of underlying data sources accessed via FDWs.

Use Case

Consider a table wrapped by an FDW where certain columns are required for query execution due to the remote data source's constraints, or where specific columns only support a limited range of operations. The proposed enhancements would allow for more granular control over query construction, ensuring that generated SQL queries are both valid and optimized for execution against such data sources.

Proposed Implementation

The following outlines a preliminary approach to implementing these enhancements:

Potential Challenges and Considerations

Request for Feedback

Feedback on the proposed enhancements, including any potential issues, alternative approaches, or considerations that might improve the implementation and utility of these new fields, is highly appreciated. Insights from developers with experience in PostgreSQL FDWs or similar technologies would be particularly valuable.

jcjc712 commented 7 months ago

Hi, Thank you for your proposal. We are thinking what is the best way to implement this feature, we could try to do it automatically when the scanning process is executed and/or enable the update table-description endpoint so the users can decide what columns should be set a required and add the operators. But feel free to develop your solution and contribute in this project if you wish. Besides the agent should know how to use this new data.