aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org
Other
15.04k stars 2.01k forks source link

Allow URLs with paths as base_url in ClientSession #6647

Open thewhaleking opened 2 years ago

thewhaleking commented 2 years ago

Is your feature request related to a problem?

Many sites have an API path in the URL. For example, Jira uses {jira_url}/rest/api/2/ as the base for its API requests. The current implementation of the ClientSession only allows absolute URLs without a path. Thus, to make a number of Jira requests, you would do something like:

async with aiohttp.ClientSession(base_url="https://jira.com") as client:
    baz = await client.get("/rest/api/2/issue/ISSUE-1")
    bom = await client.get("/rest/api/2/project/ISSUE")

instead of the simpler

async with aiohttp.ClientSession(base_url="https://jira.com/rest/api/2") as client:
    baz = await client.get("/issue/ISSUE-1")
    bom = await client.get("/project/ISSUE")

This somewhat defeats the purpose of having the base_url in the first place.

Describe the solution you'd like

Allow using paths in the base_url, similar to how Requests and HTTPX do.

Describe alternatives you've considered

Just continue typing in the path or using variables with f-strings.

Related component

Client

Additional context

No response

Code of Conduct

hhromic commented 2 years ago

+1

I was also expecting base_url to allow for any base URL, not just absolute. Moreover, I just tried using a URL object for base_url, and while the assertion doesn't trigger, it silently ignores the path part.

For example, this snippet makes a request to http://service:8080/orders/1 (wrongly) without any errors:

import asyncio
from aiohttp import ClientSession
from yarl import URL

async def test():
    base_url = URL("http://service:8080/api/v2")
    async with ClientSession(base_url=base_url) as session:
        async with session.get("/orders/1") as response:
            body = await response.text()

asyncio.run(main())
buzanovn commented 2 years ago

+1

This is the default behavior of urljoin from urllib.parse when the joined part starts with a slash, otherwise the url is joined as we want it, but path that does not start with slash is restricted by design here

https://github.com/aio-libs/aiohttp/blob/cc6dc0c49f5d002485f9a3cdf9bc3127a3ac1388/aiohttp/client.py#L355-L361

I think that a removal of second assert condition makes the deal, and it does not violate the RFC as stated in base_url feature PR discussion https://github.com/aio-libs/aiohttp/pull/6129#discussion_r739634394

bogdan-coman-uv commented 1 year ago

+1 that would be really usefull

andreaevooq commented 1 year ago

+1

AYMENJD commented 1 year ago

Why is this not possible till now?

Dreamsorcerer commented 1 year ago

It would appear to be to get the feature merged in a timely manner without considering the extra edge cases:

We can improve the feature support in future https://github.com/aio-libs/aiohttp/pull/6129#issuecomment-955767200

https://github.com/aio-libs/aiohttp/pull/6129#discussion_r738566850

mokko commented 1 year ago

I also need base_url to include paths. In the meantime, can we document current behavior better? Do you want me to attempt an PR for the documentation?

Dreamsorcerer commented 1 year ago

Yep, that'd be great.

mokko commented 1 year ago

Thanks for taking my PR to improve the documentation.

Although perhaps obvious, let me just say: the workaround is not to use ClientSession's base_url, but to construct your urls yourself. In my case that leads to somewhat ugly code, but is simple enough to do.

Dreamsorcerer commented 3 months ago

So, from a discussion on yarl and reading the spec, I think the main complication for implementing this (and probably the reason it was skipped initially) is the behaviour of joining the URLs. This may also trip people up who are expecting URLs to be joined differently, as evidenced by comments above in this thread:

    base_url = URL("http://service:8080/api/v2")
    async with ClientSession(base_url=base_url) as session:
        async with session.get("/orders/1") as response:

The above code (if paths were supported) would make a request to http://service:8080/api/orders/1, but the user is probably expecting the request to go to http://service:8080/api/v2/orders/1 and should have used http://service:8080/api/v2/ as the base_url. https://datatracker.ietf.org/doc/html/rfc3986.html#section-5.2.3

If this is the only issue, then maybe we should add support for paths only when they end with a /, to avoid user confusion.