Dataherald / dataherald

Interact with your SQL database, Natural Language to SQL using LLMs
https://dataherald.readthedocs.io/en/latest/
Apache License 2.0
3.35k stars 234 forks source link

Getting Error if the password in connection_uri containing the @ sign or any extra characters for postgres #518

Open Harshitdy opened 4 months ago

Harshitdy commented 4 months ago

Issue Description:

Currently, while attempting to create connection using connection_uri for postgres database if password contains the "@" sign then it's throwing error.

{
  "error_code": "invalid_database_connection",
  "message": "Unable to connect to db: string",
  "description": "(psycopg2.OperationalError) could not translate host name \"g@c-kitjcuyubdb.fuwinjjrzax3i5.postgres.cosmos.azure.com\" to address: Name or service not known\n\n(Background on this error at: https://sqlalche.me/e/14/e3q8)",
  "detail": {
    "alias": "string",
    "use_ssh": false,
    "connection_uri": "gAAAAABmfUgccbOrIiubyfieUiutUYvtytbyiubvjuVrubyuvtuWghSRsxJe-eIVA1ezGLqPoXJaicYgOMVNEJgH-XFoi5F3-IZKI-EQwM-cSkLL3KqW9umFG4rukQUOkAUfO6qJt3WLhSjAea3094yYvNex-ppzJkPLfyoKDleWkejnKJ-hMlGqTh2q_oRIK-5QYFps4NicXQZfIJPHlDqYdt-4x56SX79hzsiE=",
    "path_to_credentials_file": "string",
    "llm_api_key": null,
    "ssh_settings": null,
    "file_storage": null
  }
}

Even if you try with "%40" at the place of "@" you will get the same error as above. It's because while decrypting the connection_uri, it's again converting "%40" into "@" and then sqlalchemy throw you error.

Proposed Solution:

Update from_uri method under class SQLDatabase in services/engine/dataherald/sql_database/base.py file:

from urllib.parse import unquote, quote_plus                     # import quote_plus also
...
...
    @classmethod
    def from_uri(
        cls, database_uri: str, engine_args: dict | None = None
    ) -> "SQLDatabase":
        """Construct a SQLAlchemy engine from URI."""
        _engine_args = engine_args or {}
        if database_uri.lower().startswith("duckdb"):
            config = {"autoload_known_extensions": False}
            _engine_args["connect_args"] = {"config": config}
        # database_pass = database_uri.rsplit('@', 1)[0].rsplit(":", 1)[1]         # this can also work
        database_pass = re.search(r':[^:]*:(.*)@[^@]*$', database_uri).group(1)          # newly added 
        _database_uri = database_uri.replace(database_pass, quote_plus(database_pass))   # newly added 
        engine = create_engine(_database_uri, **_engine_args)                     # using new _database_uri instead of database_uri
        return cls(engine)
...
...

this will first generate encoding of password using urllib.parse and replace the encoded password at the place of provided one.

Expected Outcome:

By implementing these lines we will be able to handle extra characters in password.

Preliminary Evaluation:

I've carried out initial assessments of the suggested modifications, tested the proposed solution using Docker as suggested in the CONTRIBUTING.md.

At present, I haven't initiated a pull request, opting instead to solicit feedback on the proposed solution as I believe there may be more efficient methods to achieve this.