When the auth strategy is APIKey, the logs show something like:
Making GET request to http://gateway.marvel.com/v1/public/characters with params={'ts': 1716293455, 'hash': '19b8d956c8ed7795530ca7b28ce99cdd', 'offset': 0, 'limit': 20}
Is the api_key query parameter excluded from the log intentionally (@burnash is probably interested in this)?
For sub-resources it fetches 5-10 records and then fails with the exception below.
The parent resource behaves similarly to #1.
Traceback (most recent call last):
File "/Users/sultan/Projects/DLT/dlt-openapi/hackathon/marvel/pipeline.py", line 15, in <module>
info = pipeline.run(source)
^^^^^^^^^^^^^^^^^^^^
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 222, in _wrap
step_info = f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 267, in _wrap
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 673, in run
self.extract(
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 222, in _wrap
step_info = f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 176, in _wrap
rv = f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 162, in _wrap
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 267, in _wrap
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sultan/Projects/DLT/dlt-openapi/.venv/lib/python3.11/site-packages/dlt/pipeline/pipeline.py", line 446, in extract
raise PipelineStepFailed(
dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage extract when processing package 1716294336.313966 with exception:
<class 'dlt.extract.exceptions.ResourceExtractionError'>
In processing pipe get_character_individual: extraction of resource get_character_individual in generator paginate_dependent_resource caused an exception: 409 Client Error: Conflict for url: http://gateway.marvel.com/v1/public/characters/1011334?apikey=KEY
Incorrect pagination detection in the generator: for the marvel source's get_comics_collection resource it selects startYear, although the endpoint uses regular offset-limit pagination (imo we need a pagination priority list or something like it).
- description: Return only issues in series whose start year matches the input.
  in: query
  name: startYear
  required: false
  schema:
    format: int32
    type: integer
...SKIPPED...
- description: Limit the result set to the specified number of resources.
  in: query
  name: limit
  required: false
  schema:
    format: int32
    type: integer
- description: Skip the specified number of resources in the result set.
  in: query
  name: offset
  required: false
  schema:
    format: int32
    type: integer
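To make the "pagination priority list" idea concrete, here is a minimal sketch. The priority table, the paginator kind names, and the `detect_paginator` helper are all hypothetical and are not part of the generator; the point is only that a paginator is chosen from a fixed, ordered list of known parameter combinations, so an unrelated integer parameter such as startYear can never win.

```python
from typing import Optional

# Hypothetical priority list: earlier entries win over later ones.
# Each entry names a paginator kind and the query parameters that must ALL be present.
PAGINATOR_PRIORITY = [
    ("offset", ("offset", "limit")),
    ("page_number", ("page",)),
    ("cursor", ("cursor",)),
]


def detect_paginator(query_param_names: list[str]) -> Optional[dict]:
    """Return the first paginator whose required parameters are all present."""
    available = set(query_param_names)
    for kind, required in PAGINATOR_PRIORITY:
        if all(name in available for name in required):
            return {"type": kind, "params": list(required)}
    return None  # nothing matched; leave pagination for the user to configure


# Query parameters of /v1/public/comics (abbreviated to the ones quoted above).
print(detect_paginator(["startYear", "limit", "offset"]))
# {'type': 'offset', 'params': ['offset', 'limit']}
```

With this approach startYear is ignored for pagination because it does not appear in any entry of the priority list.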
Missing query params in params dictionary.
/v1/public/comics accepts a startYear parameter; it is present in the spec but the generator did not add it.
Generated parameters
{
    "name": "get_comics_collection",
    "table_name": "comics",
    "endpoint": {
        "data_selector": "$",
        "path": "/v1/public/comics",
        "params": {
            # "format": "FILL_ME_IN", # TODO: fill in query parameter
            # "formatType": "FILL_ME_IN", # TODO: fill in query parameter
            # "noVariants": "FILL_ME_IN", # TODO: fill in query parameter
            # "dateDescriptor": "FILL_ME_IN", # TODO: fill in query parameter
            # "dateRange": "FILL_ME_IN", # TODO: fill in query parameter
            # "title": "FILL_ME_IN", # TODO: fill in query parameter
            # "titleStartsWith": "FILL_ME_IN", # TODO: fill in query parameter
            # "issueNumber": "FILL_ME_IN", # TODO: fill in query parameter
            # "diamondCode": "FILL_ME_IN", # TODO: fill in query parameter
            # "digitalId": "FILL_ME_IN", # TODO: fill in query parameter
            # "upc": "FILL_ME_IN", # TODO: fill in query parameter
            # "isbn": "FILL_ME_IN", # TODO: fill in query parameter
            # "ean": "FILL_ME_IN", # TODO: fill in query parameter
            # "issn": "FILL_ME_IN", # TODO: fill in query parameter
            # "hasDigitalIssue": "FILL_ME_IN", # TODO: fill in query parameter
            # "modifiedSince": "FILL_ME_IN", # TODO: fill in query parameter
            # "creators": "FILL_ME_IN", # TODO: fill in query parameter
            # "characters": "FILL_ME_IN", # TODO: fill in query parameter
            # "series": "FILL_ME_IN", # TODO: fill in query parameter
            # "events": "FILL_ME_IN", # TODO: fill in query parameter
            # "stories": "FILL_ME_IN", # TODO: fill in query parameter
            # "sharedAppearances": "FILL_ME_IN", # TODO: fill in query parameter
            # "collaborators": "FILL_ME_IN", # TODO: fill in query parameter
            # "orderBy": "FILL_ME_IN", # TODO: fill in query parameter
            # "offset": "FILL_ME_IN", # TODO: fill in query parameter
        },
    },
}
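A quick way to spot this kind of gap is to diff the query parameters declared in the spec against the keys the generator emitted. The sketch below is only an illustration: `missing_query_params` is a hypothetical helper, and the inputs are abbreviated to the parameters quoted in this description.

```python
def missing_query_params(spec_params: list[dict], generated_params: set[str]) -> list[str]:
    """List query parameters declared in the spec but absent from the generated config."""
    return [
        p["name"]
        for p in spec_params
        if p.get("in") == "query" and p["name"] not in generated_params
    ]


# Abbreviated spec parameters for /v1/public/comics.
spec_params = [
    {"name": "startYear", "in": "query"},
    {"name": "limit", "in": "query"},
    {"name": "offset", "in": "query"},
]
# Abbreviated set of keys from the generated "params" dictionary.
generated = {"title", "orderBy", "offset"}

print(missing_query_params(spec_params, generated))
# ['startYear', 'limit']
```

Note that limit is reported as well; this check cannot tell whether a parameter was left out on purpose (for example because a paginator owns it) or dropped by mistake.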
For incrementals that use only part of a date, such as the year, we don't have datetime formatting support. For example, the marvel source has a startYear parameter which could take its value from the modified field in the response/spec, but that field is a datetime value such as 2019-08-21T17:11:27-0400.
- description: Return only issues in series whose start year matches the input.
  in: query
  name: startYear
  required: false
  schema:
    format: int32
    type: integer
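The conversion itself is small; what is missing is a place to plug it into the incremental configuration. As a minimal sketch using only the standard library (`start_year_from_modified` is a hypothetical helper name, not an existing dlt function), the modified value could be reduced to the integer that startYear expects like this:

```python
from datetime import datetime


def start_year_from_modified(modified: str) -> int:
    """Parse a timestamp like 2019-08-21T17:11:27-0400 and return only its year."""
    # %z accepts UTC offsets written without a colon, such as -0400.
    parsed = datetime.strptime(modified, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.year


print(start_year_from_modified("2019-08-21T17:11:27-0400"))  # 2019
```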
Can we also detect common paginators shared across the endpoints and extract them into resource_defaults? Maybe this is one for @sh-rp.
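One possible shape for this, sketched under assumptions: `hoist_common_paginator` is a hypothetical helper, every resource is assumed to carry an `endpoint` dictionary, and the paginator values are illustrative rather than taken from a generated source. It counts the paginator configs, moves the most frequent one into resource_defaults, and drops it from the resources that used it.

```python
import json
from collections import Counter


def hoist_common_paginator(resources: list[dict]) -> dict:
    """Move the most frequent paginator config into resource_defaults."""
    # Serialize each paginator so that dict configs can be counted.
    keys = [
        json.dumps(r["endpoint"]["paginator"], sort_keys=True)
        for r in resources
        if "paginator" in r["endpoint"]
    ]
    if not keys:
        return {"resource_defaults": {}, "resources": resources}

    common_key, _ = Counter(keys).most_common(1)[0]
    common = json.loads(common_key)

    slimmed = []
    for r in resources:
        endpoint = dict(r["endpoint"])  # copy so the input is not mutated
        if endpoint.get("paginator") == common:
            del endpoint["paginator"]  # now inherited from the defaults
        slimmed.append({**r, "endpoint": endpoint})

    return {
        "resource_defaults": {"endpoint": {"paginator": common}},
        "resources": slimmed,
    }


# Illustrative input: two endpoints share a paginator, one differs.
example_resources = [
    {"name": "comics", "endpoint": {"path": "/comics", "paginator": {"type": "offset", "limit": 20}}},
    {"name": "series", "endpoint": {"path": "/series", "paginator": {"type": "offset", "limit": 20}}},
    {"name": "events", "endpoint": {"path": "/events", "paginator": {"type": "page_number"}}},
]
config = hoist_common_paginator(example_resources)
print(config["resource_defaults"])
```

Resources whose paginator differs from the common one keep their own setting, so they still override the default.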
Documentation
Documentation on adding custom pagination, sub-resource configuration, and custom authentication could be more detailed. For data_selector we need more explanation of JSONPath selection and the flavor we use, maybe with links to some reference. For authentication, the marvel source is a good example: it is basically the APIKey strategy, but we need to pass additional query parameters like a timestamp and a hash sum of the keys.
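For reference, the extra parameters can be built as in the sketch below. The ts, hash, and apikey names come from the requests shown in this description; the hash recipe (an MD5 digest of the timestamp, the private key, and the public key, concatenated in that order) is my understanding of the Marvel developer docs and should be treated as an assumption. `marvel_auth_params` is a hypothetical helper, not an existing dlt API.

```python
import hashlib
import time
from typing import Optional


def marvel_auth_params(public_key: str, private_key: str, ts: Optional[int] = None) -> dict:
    """Build the query parameters to send with each Marvel API request."""
    # Use the current Unix time unless a timestamp is passed in (handy for tests).
    ts_value = str(int(time.time()) if ts is None else ts)
    # Assumed recipe: md5(ts + private_key + public_key), hex encoded.
    digest = hashlib.md5((ts_value + private_key + public_key).encode("utf-8")).hexdigest()
    return {"ts": ts_value, "apikey": public_key, "hash": digest}


# Placeholder keys; real keys would come from secrets configuration.
params = marvel_auth_params("PUBLIC_KEY", "PRIVATE_KEY", ts=1716293455)
print(sorted(params))  # ['apikey', 'hash', 'ts']
```

A custom authentication class would merge these parameters into every outgoing request; the documentation could show where such a class plugs into a generated source.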
This PR contains three generated REST API sources: pollenrapporten, fakerestapi, and marvel.
For the first two APIs the generator worked pretty much out of the box, with minor adjustments to naming and sub-resource selection.
The Marvel API spec required much more work to get working; below are the errors and issues I ran into while implementing it.
Related PR with mentioned specs https://github.com/dlt-hub/dlt-init-openapi/pull/42
TODO
Created issues
https://github.com/dlt-hub/dlt/issues/1388
Notes
With the marvel API spec it generated more endpoints than were selected, and resource names for different endpoints were duplicated: https://github.com/dlt-hub/dlt-openapi/assets/354868/2c4e4eab-19b4-4229-a146-0fdfabb088bc
Sample config:
Resource defaults are not respected
#1