dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0
2.59k stars 173 forks source link

`dlt.secrets.get(...)` returns None loading configuration from `.env` #1768

Open devcovato opened 2 months ago

devcovato commented 2 months ago

dlt version

0.5.4

Describe the problem

I'm using dlt with dagster. I created a sql_database source targeting postgres, and I'm using the file .env to load the configuration. For one of two assets (dlt pipeline), I have to change the source credentials database at runtime. I read the secrets as credentials = dlt.secrets.get("sources.sql_database.credentials") and then credentials["database"] = f"db{context.partition_key}". Doing so, I got the exception message TypeError: 'NoneType' object does not support item assignment. I created the file .dlt/secrets.toml in my root project and everything works. From the community, I got the suggestion dlt.secrets["sources.sql_database.credentials.database"] = f"db{context.partition_key}" and didn't work either. It looks like dlt.secrets is read-only.

Expected behavior

I should use only .env to set up the secrets and be able to update the configuration at runtime.

Steps to reproduce

Below are steps to reproduce the issue:

Operating system

macOS

Runtime environment

Docker, Docker Compose

Python version

3.11

dlt data source

sql_database

dlt destination

Postgres

Other deployment details

No response

Additional information

I'm using Python 3.12

neuromantik33 commented 1 month ago

I can confirm this behavior as the unit tests for pg_replication do not work. Here is an example:


(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ cat ../.env 
SOURCES__PG_REPLICATION__CREDENTIALS__DATABASE=dlt_data
SOURCES__PG_REPLICATION__CREDENTIALS__PASSWORD=loader
SOURCES__PG_REPLICATION__CREDENTIALS__HOST=localhost
SOURCES__PG_REPLICATION__CREDENTIALS__DRIVERNAME=postgresql
SOURCES__PG_REPLICATION__CREDENTIALS__USERNAME=loader
ALL_DESTINATIONS=["duckdb"]
(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ set -a
(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ source ../.env 
(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ set +a
(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ env | grep SOURCES
SOURCES__PG_REPLICATION__CREDENTIALS__DATABASE=dlt_data
SOURCES__PG_REPLICATION__CREDENTIALS__PASSWORD=loader
SOURCES__PG_REPLICATION__CREDENTIALS__HOST=localhost
SOURCES__PG_REPLICATION__CREDENTIALS__DRIVERNAME=postgresql
SOURCES__PG_REPLICATION__CREDENTIALS__USERNAME=loader
(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ env | grep ALL
DEB_PYTHON_INSTALL_LAYOUT=deb
ALL_DESTINATIONS=[duckdb]
(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ env | grep DESTINATION__POSTGRES__CREDENTIALS
DESTINATION__POSTGRES__CREDENTIALS=postgresql://loader:loader@localhost:5432/dlt_data
(dlt-verified-sources-py3.10) drnick@drnick-XPS-15-7590:~/repos/github/neuromantik33/verified-sources/tests $ python
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlt
>>> dlt.secrets
<dlt.common.configuration.accessors._SecretsAccessor object at 0x78812b2092a0>
>>> dlt.secrets.get("sources.pg_replication.credentials")
>>> from dlt.common.configuration.specs import ConnectionStringCredentials
>>> dlt.secrets.get("sources.pg_replication.credentials", ConnectionStringCredentials)
>>> from dlt.destinations.impl.postgres.configuration import PostgresCredentials
>>> dlt.secrets.get("sources.pg_replication.credentials", PostgresCredentials)
>>> import os
>>> {k:v for k,v in os.environ.items() if k.startswith("SOURCES")}
{'SOURCES__PG_REPLICATION__CREDENTIALS__DATABASE': 'dlt_data', 'SOURCES__PG_REPLICATION__CREDENTIALS__PASSWORD': 'loader', 'SOURCES__PG_REPLICATION__CREDENTIALS__HOST': 'localhost', 'SOURCES__PG_REPLICATION__CREDENTIALS__DRIVERNAME': 'postgresql', 'SOURCES__PG_REPLICATION__CREDENTIALS__USERNAME': 'loader'}
>>> 
sh-rp commented 1 month ago

@devcovato Are you reading the secrets with dlt.secrets.get only for debugging purposes? Or are you using this in your production code too? Can you post your secrets toml (without the variables) and the env file so I can have a look?

sh-rp commented 1 month ago

@neuromantik33 dlt.secrets.get will access a single value in your secrets and not resolve whole groups. If you want your example to work you'd need to set the full connection string as the credentials argument:

import dlt
import os

from dlt.common.configuration.specs import ConnectionStringCredentials

os.environ["SOURCES__PG_REPLICATION__CREDENTIALS"] = "postgresql://loader:loader@localhost:5432/dlt_data"
print(dlt.secrets.get("sources.pg_replication.credentials", ConnectionStringCredentials))
devcovato commented 1 month ago

Hi @sh-rp,

@devcovato Are you reading the secrets with dlt.secrets.get only for debugging purposes? Or are you using this in your production code too?

I don't read for debugging purposes. You can see at the top that I change the credentials at runtime. The source is a multi-tenant using an isolated database per tenant.

Below is the file secrets.toml (redacted):

[sources.sql_database.credentials]
drivername = "mysql+pymysql"
database = "mycompany"
username = "user"
password = "password"
host = "host"
port = "3306"

And here is the file .env:

DAGSTER_ENVIRONMENT=dev
DESTINATION__POSTGRES__CREDENTIALS__CONNECT_TIMEOUT=15
DESTINATION__POSTGRES__CREDENTIALS__DATABASE=analytics_dev
DESTINATION__POSTGRES__CREDENTIALS__HOST=localhost
DESTINATION__POSTGRES__CREDENTIALS__PASSWORD=postgres
DESTINATION__POSTGRES__CREDENTIALS__PORT=5432
DESTINATION__POSTGRES__CREDENTIALS__USERNAME=postgres
LOAD__TRUNCATE_STAGING_DATASET=true
RUNTIME__DLTHUB_TELEMETRY=false
RUNTIME__LOG_LEVEL=WARNING
SOURCES__SQL_DATABASE__CREDENTIALS__DATABASE=mycompany
SOURCES__SQL_DATABASE__CREDENTIALS__DRIVERNAME='mysql+pymysql'
SOURCES__SQL_DATABASE__CREDENTIALS__HOST=host
SOURCES__SQL_DATABASE__CREDENTIALS__PASSWORD='password'
SOURCES__SQL_DATABASE__CREDENTIALS__POST=3306
SOURCES__SQL_DATABASE__CREDENTIALS__USERNAME=user

Can you post your secrets toml (without the variables) and the env file so I can have a look?

Did you mean env vars?

sh-rp commented 4 days ago

@devcovato did you manage to solve this?

devcovato commented 4 days ago

Hi @sh-rp,

@devcovato did you manage to solve this?

The issue is still present. As a workaround I created the file <root_project>/.dlt/secrets.toml.

sh-rp commented 2 days ago

@devcovato If you are setting env values you want to have change, you'll have to overwrite the env vars in code. Check out this snippet:

import dlt
import os

if __name__ == "__main__":

    os.environ["SOURCES__SQL_DATABASE__CREDENTIALS__DATABASE"] = "test1"

    os.environ["SOURCES__SQL_DATABASE__CREDENTIALS__USER"] = "test2"
    os.environ["SOURCES__SQL_DATABASE__CREDENTIALS__PASSWORD"] = "test3"
    os.environ["SOURCES__SQL_DATABASE__CREDENTIALS__HOST"] = "test4"
    os.environ["SOURCES__SQL_DATABASE__CREDENTIALS__PORT"] = "test5"

    # will not work
    print(dlt.secrets.get("sources.sql_database.credentials"))

    # will work
    print(dlt.secrets.get("sources.sql_database.credentials.database"))

    # this will not change the value, but it probably should
    dlt.secrets["sources.sql_database.credentials.database"] = "new_value"
    print(dlt.secrets.get("sources.sql_database.credentials.database"))

    # overwrite the env var will work
    os.environ["SOURCES__SQL_DATABASE__CREDENTIALS__DATABASE"] = "new_value"
    print(dlt.secrets.get("sources.sql_database.credentials.database"))
devcovato commented 6 hours ago

Hi @sh-rp,

I work on a different task now. I will try in the next few days and let you know.