dlt-hub / dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
https://dlthub.com/docs
Apache License 2.0
2.26k stars 147 forks source link

RESTClient does not use configured request timeout #1583

Closed TyrantKingBen closed 4 weeks ago

TyrantKingBen commented 1 month ago

dlt version

0.5.1

Describe the problem

RESTClient does not use the configured request_timeout nor does it have a way to make it use a timeout.

Expected behavior

The RESTClient should use the configured request_timeout.

Steps to reproduce

Use the following configuration.

[runtime]
log_level = "DEBUG"
request_timeout = 0.001
request_max_attempts = 1

Create a resource that makes an API request using RESTClient. Run a pipeline with the resource. The pipeline will complete, but it should fail due to request timing out and max retries being reached.

Operating system

Linux

Runtime environment

Virtual Machine

Python version

3.10

dlt data source

Custom API data source using RESTClient.

dlt destination

DuckDB

Other deployment details

WSL2

Additional information

By default RESTClient uses the Session requests helper. Session currently overrides the request method to set the default timeout. Session could also override the send method to set the default timeout.

    def send(self, request, **kwargs):
        kwargs.setdefault("timeout", self.timeout)

        return super().send(request, **kwargs)
rudolfix commented 1 month ago

@willi-mueller please take a look. we re-configure our built in session when config changes. look for test_init_default_client that is checking that

TyrantKingBen commented 1 month ago

A related issue is that you cannot provide any of the standard request parameters (proxies, stream, verify, cert, allow_redirects) through the RESTClient methods. Specifically, allow_redirects cannot be used at all (unlike the others which you can workaround by configuring session defaults).

RESTClient().get("https://example.com", allow_redirects=False) # errors

RESTClient().get("https://example.com", verify=False) # errors

RESTClient().session.verify = False
RESTClient().get("https://example.com") # verify is now False in this request

Something along the lines of the following code should allow passing these parameters while preserving the standard behavior of request/session settings merging.

    def _send_request(self, request: Request, **kwargs: Any) -> Response:
        logger.info(
            f"Making {request.method.upper()} request to {request.url}"
            f" with params={request.params}, json={request.json}"
        )

        prepared_request = self.session.prepare_request(request)

        send_kwargs = self.session.merge_environment_settings(
            prepared_request.url,
            kwargs.pop("proxies", {}),
            kwargs.pop("stream", None),
            kwargs.pop("verify", None),
            kwargs.pop("cert", None),
        )

        send_kwargs.update(**kwargs)

        return self.session.send(prepared_request, **send_kwargs)

    def request(self, path: str = "", method: HTTPMethod = "GET", **kwargs: Any) -> Response:
        prepared_request = self._create_request(
            path=path,
            method=method,
            params=kwargs.pop("params", None),
            json=kwargs.pop("json", None),
            auth=kwargs.pop("auth", None),
            hooks=kwargs.pop("hooks", None),
        )
        return self._send_request(prepared_request, **kwargs)

    # and add/use **kwargs in paginate
willi-mueller commented 1 month ago

Thank you @TyrantKingBen , that looks very helpful as a start. I'll look into it.

willi-mueller commented 1 month ago

Hi @TyrantKingBen, thank you for the report!

I can reproduce the problem and your fix works great!

I'll look into the allow_redirects etc. next

willi-mueller commented 1 month ago

Hi @TyrantKingBen I created this issue: https://github.com/dlt-hub/dlt/issues/1593. Could you please check if it matches your expectations? Thank you!

TyrantKingBen commented 1 month ago

Yes, #1593 looks good to me. Thanks @willi-mueller.