duckdb / duckdb_aws

call load_aws_credentials() in a docker container doesn't load credentials #38

Open pratio opened 2 months ago

pratio commented 2 months ago

What happens?

When running duckdb inside a docker container, the function call load_aws_credentials(); returns a row with empty credential fields. Running it outside the container works.

To Reproduce

Setup 1 (Local machine)

  1. Install duckdb with brew install duckdb

  2. Configure aws cli (in my case sso)

  3. Check with aws s3 ls that you can list the buckets in the chosen profile

  4. List files in the chosen bucket

  5. Open duckdb

  6. In the duckdb console run call load_aws_credentials();

  7. It should display the secrets

  8. Run select * from read_csv('s3://bucket_path')

  9. It works

Sample output of call load_aws_credentials(); locally:

    ┌──────────────────────┬──────────────────────────┬──────────────────────┬───────────────┐
    │ loaded_access_key_id │ loaded_secret_access_key │ loaded_session_token │ loaded_region │
    │       varchar        │         varchar          │       varchar        │    varchar    │
    ├──────────────────────┼──────────────────────────┼──────────────────────┼───────────────┤
    │   access_key_id      │  secret key              │       session token  │ region        │
    └──────────────────────┴──────────────────────────┴──────────────────────┴───────────────┘

Setup 2 (Docker container)

  10. Build a docker image

  11. Install duckdb, boto3, and awscli, plus the duckdb CLI binary

  12. Run the docker container with your local .aws directory mounted to the one in the container

  13. Inside the docker container, run aws s3 ls (you might need to run export AWS_PROFILE=profilename first)

  14. If you can see the buckets listed, it means that credentials sharing between the container and host works fine

  15. Run duckdb with ./duckdb

  16. Run call load_aws_credentials();

  17. It doesn't show anything, even after specifying the name of the profile with call load_aws_profile('profilename')

  18. Running it from python also doesn't work

Dockerfile

FROM python:3.11-slim
RUN pip install poetry --no-cache-dir
RUN apt update && apt install awscli wget unzip -y
RUN wget https://github.com/duckdb/duckdb/releases/download/v0.10.1/duckdb_cli-linux-amd64.zip
RUN mkdir /app
RUN unzip duckdb_cli-linux-amd64.zip -d /app
RUN rm duckdb_cli-linux-amd64.zip
WORKDIR /app

Build the container

docker build -t duckdb .

Command to run the container locally

docker run -it -v $(pwd):/app -v /Users/home/.aws:/root/.aws   duckdb /bin/bash 

Once inside the container, start duckdb with ./duckdb and then run call load_aws_credentials();

The credentials come back empty. Running the command inside the container produces a table like this:

┌──────────────────────┬──────────────────────────┬──────────────────────┬───────────────┐
│ loaded_access_key_id │ loaded_secret_access_key │ loaded_session_token │ loaded_region │
│       varchar        │         varchar          │       varchar        │    varchar    │
├──────────────────────┼──────────────────────────┼──────────────────────┼───────────────┤
│                      │                          │                      │ region        │
└──────────────────────┴──────────────────────────┴──────────────────────┴───────────────┘

Since the .aws directory is shared, the credentials are available in the container. With awscli installed there, you can check that they're valid with

aws s3 ls

What could it be?

I think it might have something to do with permissions on the files in .aws, or with how these credentials are accessed. Unfortunately, I don't know how to debug or trace what happens when the function is called.
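One way to narrow this down might be to check, from inside the container, whether the shared files are readable and what kind of credentials the profile actually defines. The sketch below is a rough diagnostic, not part of duckdb or the aws extension: describe_aws_profile is a hypothetical helper, and it assumes the standard ~/.aws/config and ~/.aws/credentials layout. It only reports key names, never secret values. If the profile turns out to use SSO, there may be no static keys in the credentials file at all, since the AWS CLI keeps SSO tokens in a cache instead.

```python
import configparser
import os
from pathlib import Path


def describe_aws_profile(profile="default", aws_dir=None):
    """Report which keys a profile defines in the shared AWS files (hypothetical helper)."""
    aws_dir = Path(aws_dir) if aws_dir else Path.home() / ".aws"
    report = {"readable": {}, "config_keys": [], "credentials_keys": []}

    for name in ("config", "credentials"):
        path = aws_dir / name
        # Check that the file exists and that the current user may read it.
        report["readable"][name] = path.is_file() and os.access(path, os.R_OK)
        if not report["readable"][name]:
            continue
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)
        # In config, non-default profiles are named "profile <name>";
        # in credentials, the section is just "<name>".
        section = profile
        if name == "config" and profile != "default":
            section = f"profile {profile}"
        if parser.has_section(section):
            # Collect key names only, so no secret ends up in the output.
            report[f"{name}_keys"] = sorted(parser.options(section))

    report["uses_sso"] = any(k.startswith("sso_") for k in report["config_keys"])
    report["has_static_keys"] = "aws_access_key_id" in report["credentials_keys"]
    return report
```

Calling describe_aws_profile("profilename") on the host and again inside the container, then comparing the two results, would show whether the mount or the file permissions differ between the two setups.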

OS:

x64 docker, Mac OS aarch64 host

DuckDB Version:

v0.10.1 4a89d97db8

DuckDB Client:

Bash, Python

Full Name:

Pratyay Modi

Affiliation:

None

Have you tried this on the latest nightly build?

I have tested with a release build (and could not test with a nightly build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

samansmink commented 3 weeks ago

Hi @pratio, the load_aws_credentials() function is now deprecated. Could you try out the secret-based flow and retry?
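For reference, a minimal sketch of what the secret-based flow could look like from the Python client. The helper build_s3_secret_sql and the secret name aws_secret are made up for illustration. The statement it produces uses DuckDB's CREATE SECRET with the CREDENTIAL_CHAIN provider, and the CHAIN and PROFILE values shown are assumptions to adapt to your setup.

```python
def build_s3_secret_sql(name="aws_secret", chain=None, profile=None):
    """Build a CREATE SECRET statement for the credential_chain provider (hypothetical helper)."""

    def quote(value):
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    options = ["TYPE S3", "PROVIDER CREDENTIAL_CHAIN"]
    if chain:
        # e.g. 'sso' or 'env;config', limiting which credential sources are tried
        options.append(f"CHAIN {quote(chain)}")
    if profile:
        options.append(f"PROFILE {quote(profile)}")
    return f"CREATE SECRET {name} ({', '.join(options)});"


# Possible usage, assuming the duckdb package and the aws extension are available:
#   import duckdb
#   con = duckdb.connect()
#   con.execute("INSTALL aws")
#   con.execute("LOAD aws")
#   con.execute(build_s3_secret_sql(chain="sso", profile="profilename"))
#   con.sql("SELECT * FROM read_csv('s3://bucket_path')").show()
```

Leaving chain and profile unset yields the plain CREATE SECRET aws_secret (TYPE S3, PROVIDER CREDENTIAL_CHAIN); form, which may be enough when AWS_PROFILE is already exported in the container.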

pratio commented 1 week ago

@samansmink I'll try