dlt-hub / verified-sources

Contribute to dlt verified sources 🔥
https://dlthub.com/docs/walkthroughs/add-a-verified-source
Apache License 2.0

rest_api: Allow multiple resolve params in an endpoint config #477

Open willi-mueller opened 1 month ago


dlt version

0.4.11

Source name

rest_api

Describe the problem

When I specify more than one resolve param in a single endpoint config, rest_api_source raises the following exception:

demos-py3.11➜  demos git:(main) ✗ RUNTIME__LOG_LEVEL=INFO python pokemon_pipeline.py
Traceback (most recent call last):
  File "/Users/vilasa/code/demos/pokemon_pipeline.py", line 60, in <module>
    pokemon_source = rest_api_source(pokemon_config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/code/demos/rest_api/__init__.py", line 89, in rest_api_source
    return decorated(config)
           ^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/Library/Caches/pypoetry/virtualenvs/demos-C1MIco0Z-py3.11/lib/python3.11/site-packages/dlt/extract/decorators.py", line 243, in _wrap
    rv = conf_f(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/code/demos/rest_api/__init__.py", line 162, in rest_api_resources
    ) = build_resource_dependency_graph(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vilasa/code/demos/rest_api/config_setup.py", line 195, in build_resource_dependency_graph
    raise ValueError(
ValueError: Multiple resolved params for resource multiple_resolves: [ResolvedParam(param_name='berry_name', resolve_config=ResolveConfig(resource_name='berries', field_path='name')), ResolvedParam(param_name='pokemon_name', resolve_config=ResolveConfig(resource_name='pokemon', field_path='name'))]
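
The error message lists every param of the resource whose type is resolve. As a rough illustration only (this is not the code in config_setup.py), the hypothetical helper find_resolve_params below shows one way to spot which params of a resource would trip this check, assuming the config layout used in the reproduction script:

```python
from typing import Any, Dict, List


def find_resolve_params(resource: Dict[str, Any]) -> List[str]:
    """Return the names of endpoint params declared with type "resolve".

    Hypothetical helper for illustration; it is not part of rest_api.
    """
    endpoint = resource.get("endpoint", {})
    if not isinstance(endpoint, dict):
        # An endpoint given as a plain path string has no params to inspect.
        return []
    params = endpoint.get("params", {})
    return [
        name
        for name, value in params.items()
        if isinstance(value, dict) and value.get("type") == "resolve"
    ]


# Same shape as the "multiple_resolves" resource in the reproduction script.
resource = {
    "name": "multiple_resolves",
    "endpoint": {
        "path": "foo/bar?first_resolve={berry_name}&second_resolve={pokemon_name}",
        "params": {
            "berry_name": {"type": "resolve", "resource": "berries", "field": "name"},
            "pokemon_name": {"type": "resolve", "resource": "pokemon", "field": "name"},
        },
    },
}

resolve_names = find_resolve_params(resource)
print(resolve_names)
```

With dlt 0.4.11, any resource for which this list has more than one entry appears to hit the ValueError shown in the traceback.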

Expected behavior

I can define any number of resolve params in a single endpoint config.
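
The issue does not say how values from several parent resources should be combined. One possible reading, assumed here, is that the endpoint is requested once for every combination of resolved values (a Cartesian product). The sketch below uses a hypothetical expand_path function and made-up sample values to show what that would mean for the path in the reproduction script; it does not describe how rest_api works today:

```python
from itertools import product
from typing import Dict, Iterator, List


def expand_path(
    path_template: str, resolved_values: Dict[str, List[str]]
) -> Iterator[str]:
    """Yield one formatted path per combination of resolved values.

    Hypothetical function for illustration; the Cartesian-product
    semantics are an assumption, not something the issue specifies.
    """
    names = list(resolved_values)
    for combo in product(*(resolved_values[name] for name in names)):
        yield path_template.format(**dict(zip(names, combo)))


# Sample values standing in for the "name" field of each parent resource.
paths = list(
    expand_path(
        "foo/bar?first_resolve={berry_name}&second_resolve={pokemon_name}",
        {"berry_name": ["cheri", "chesto"], "pokemon_name": ["bulbasaur"]},
    )
)

for path in paths:
    print(path)
```

Under this assumption the number of requests grows multiplicatively with each additional resolve param, which may matter for parents that return many rows.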

Steps to reproduce

import dlt

from rest_api import rest_api_source

pokemon_config = {
    "client": {
        "base_url": "https://pokeapi.co/api/v2/",
    },
    "resource_defaults": {
        "write_disposition": "replace",
        "endpoint": {
            "params": {
                "limit": 1000,
            },
        },
    },
    "resources": [
        {"name": "berries", "endpoint": {"path": "berry"}, "selected": False},
        "pokemon",
        {
            "name": "multiple_resolves",
            "endpoint": {
                "path": "foo/bar?first_resolve={berry_name}&second_resolve={pokemon_name}",
                "params": {
                    "berry_name": {
                        "type": "resolve",
                        "resource": "berries",
                        "field": "name",
                    },
                    "pokemon_name": {
                        "type": "resolve",
                        "resource": "pokemon",
                        "field": "name",
                    },
                },
            },
        },
    ],
}

pokemon_source = rest_api_source(pokemon_config)

pipeline = dlt.pipeline(
    pipeline_name="pokemon_pipeline",
    destination="duckdb",
    dataset_name="pokemon",
    progress="log",
)

load_info = pipeline.run(pokemon_source)
print(load_info)

How are you using the source?

I run this source in production.

Operating system

macOS

Runtime environment

Local

Python version

3.11.4

dlt destination

duckdb

Additional information

Reported here: https://dlthub-community.slack.com/archives/C04DQA7JJN6/p1716833336657899?thread_ts=1716665462.485049&cid=C04DQA7JJN6