InfuseAI / piperider

Code review for data in dbt
https://www.piperider.io/
Apache License 2.0

Cannot execute piperider run on view model #886

Closed irisschen closed 11 months ago

irisschen commented 11 months ago

Describe the bug

Running piperider run on a view model whose source file is in an AWS S3 bucket fails with an error.

Screenshots

❯ dbt run
08:59:46  Running with dbt=1.5.1
08:59:47  Found 1 model, 2 tests, 0 snapshots, 0 analyses, 315 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics, 0 groups
08:59:47
08:59:47  Concurrency: 4 threads (target='dev')
08:59:47
08:59:47  1 of 1 START sql view model main.interaction ................................... [RUN]
08:59:47  1 of 1 OK created sql view model main.interaction .............................. [OK in 0.12s]
08:59:47
08:59:47  Finished running 1 view model in 0 hours 0 minutes and 0.25 seconds (0.25s).
08:59:47
08:59:47  Completed successfully
08:59:47
08:59:47  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

❯ piperider run
DataSource: dev
───────────────────────────────────────────────────── Validating ─────────────────────────────────────────────────────
everything is OK.
────────────────────────────────────────────────── Collect metadata ──────────────────────────────────────────────────
[0/0] METADATA    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   2/2 0:00:00
───────────────────────────────────────────────── Profile statistics ─────────────────────────────────────────────────
[1/1] interaction ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0/5 0:00:07
Error: Profiler Exception: DBAPIError('(duckdb.Error) Invalid Error: HTTP Error: HTTP GET error on 'https://kks-trc-workspace.s3.amazonaws.com/interaction_log.csv' (HTTP 403)
[SQL: SELECT count(*) AS count_1
FROM main.interaction]
(Background on this error at: https://sqlalche.me/e/20/dbapi)')
[1/1] interaction ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0/5 0:00:08%


popcornylu commented 11 months ago

Thanks for your report. Because the model is a view, PipeRider's DuckDB client connects directly to S3 when it profiles the model, so S3 credentials are required.

Currently, PipeRider does not support S3 settings for dbt-duckdb profiles. However, you can pass credentials to the DuckDB client using S3 environment variables. Here's an example:

export AWS_DEFAULT_REGION=ap-northeast-1
export AWS_ACCESS_KEY_ID=<id>
export AWS_SECRET_ACCESS_KEY=<secret>
piperider run
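If it helps, here is a minimal sketch of a preflight check that reports which of the three variables from the example above are still unset before you run piperider run. The helper name missing_aws_vars and the constant REQUIRED_AWS_VARS are hypothetical and not part of PipeRider or DuckDB.

```python
import os
from typing import List, Mapping

# Variable names taken from the export example above.
REQUIRED_AWS_VARS = (
    "AWS_DEFAULT_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_aws_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration; it only inspects the mapping
    and does not contact S3 or validate the credentials themselves.
    """
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]
```

Calling missing_aws_vars(os.environ) returns an empty list when all three variables are exported. A non-empty list names the variables to set before retrying; it says nothing about whether the credentials are actually valid for the bucket.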

My profiles.yml is below. If you export these environment variables, dbt run still works.

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: jaffle_shop.duckdb
      extensions:
        - httpfs      

My source definition:

sources:
  - name: external_source
    tables:
      - name: eorders
        meta:
          external_location: "read_csv('s3://piperider-athena-test/data/raw_orders.csv', AUTO_DETECT=TRUE)"
          formatter: oldstyle     

The view model

select * from {{ source('external_source', 'eorders') }}

irisschen commented 11 months ago

It works after adding S3 environment variables, thanks!