encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License

Support async file types in `files = {}` and `content = ...` #1620

Open tomchristie opened 3 years ago

tomchristie commented 3 years ago

We ought to support the following cases.

Raw upload content from an async file interface:

import httpx
import trio

async def main():
    async with httpx.AsyncClient() as client:
        async with await trio.open_file(...) as f:
            await client.post("https://www.example.com", content=f)

trio.run(main)

Multipart file upload from an async file interface:

import httpx
import trio

async def main():
    async with httpx.AsyncClient() as client:
        async with await trio.open_file(...) as f:
            await client.post("https://www.example.com", files={"upload": f})

trio.run(main)

We probably want to ensure that we're supporting both trio and anyio (which share the same interfaces), and perhaps also `aiofiles`. So e.g., also supporting the following...

# Supporting the same as above but using `asyncio`, with `anyio` for the file operations.
import anyio
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        async with await anyio.open_file(...) as f:
            await client.post("https://www.example.com", content=f)

asyncio.run(main())

The `content=...` case is a little simpler than the `data=...` case, since it really just needs an async variant of `peek_filelike_length`, and a minor update to the `._content.encode_content()` function.
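One likely ingredient, sketched here with a made-up helper name (this is not existing httpx code): detecting an async file interface by duck typing, i.e. checking whether the object's `read` attribute is a coroutine function, which is true for trio/anyio/aiofiles wrappers but false for ordinary files.

```python
import inspect

def is_async_filelike(obj) -> bool:
    """Hypothetical check: does obj expose an async read() method?"""
    read = getattr(obj, "read", None)
    # iscoroutinefunction works on bound methods, so this distinguishes
    # `async def read(...)` from a plain sync `def read(...)`.
    return read is not None and inspect.iscoroutinefunction(read)
```

Something of this shape would let `encode_content()` branch between the existing sync path and a new async one.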

Also fiddly is working out what the type annotations ought to look like.

Mayst0731 commented 3 years ago

Hi, I'm interested in this issue. Looking at the `._content.encode_content()` function, the first thing to do is to work out the types used by trio and anyio. Could I have a try at getting it done? :D

Mayst0731 commented 3 years ago

Ohhh, as you said, it's a multipart issue. It seems aiohttp handles multipart as well:
https://docs.aiohttp.org/en/stable/multipart.html

ajayd-san commented 3 years ago

Hey @meist0731, are you still working on this issue? If yes, can we work on it together? Seems like an interesting problem.

Mayst0731 commented 3 years ago

> Hey @meist0731, are you still working on this issue? If yes, can we work on it together? Seems like an interesting problem.

Yep! I'm still working on this. It's my honor to work together with you :DDD I'm going to sleep soon and tomorrow I will share the materials I've searched before.

ajayd-san commented 3 years ago

@meist0731 cheers, you on discord? It'll be easier to work together.

Mayst0731 commented 3 years ago

> @meist0731 cheers, you on discord? It'll be easier to work together. My id: Krunchy_Almond#2794

Gotcha! I have Discord, wait a sec, bro.

Mayst0731 commented 3 years ago

> @meist0731 cheers, you on discord? It'll be easier to work together. My id: Krunchy_Almond#2794

Hey, I've sent the invitation :D

ajayd-san commented 3 years ago

@tomchristie how do you recommend we proceed with this issue? Can you explain where to start?

Mayst0731 commented 3 years ago

I've tried these APIs as below (imports and the `client` definition added so the snippets run):

import asyncio
import aiofiles
import anyio
import httpx
import trio

client = httpx.AsyncClient()

async def main1():
    async with await anyio.open_file('./content.txt', 'rb') as f:
        await client.post("https://www.example.com", content=f)
anyio.run(main1)

async def main2():
    async with await trio.Path('./content.txt').open('rb') as f:
        await client.post("https://www.example.com", content=f)
trio.run(main2)

async def main3():
    async with aiofiles.open('./content.txt', mode='rb') as f:
        await client.post("https://www.example.com", content=f)
asyncio.run(main3())

The multipart upload:

async def main5():
    async with httpx.AsyncClient() as client:
        async with await anyio.open_file('./content.txt', 'rb') as f:
            await client.post("https://www.example.com", files={"upload": f})
anyio.run(main5)

The problems here are:

(1) The above functions only support reading files in "rb" mode rather than "r"; otherwise they raise a TypeError saying "sequence item 1: expected a bytes-like object, str found". I haven't figured out which part of the code handles this.

(2) I've tested the above functions with a text file. No matter whether the file is read synchronously or asynchronously via trio, anyio or aiofiles, the `peek_filelike_length` function gets the file's length correctly. However, when it comes to multipart upload, the error says that "AsyncIOWrapper"/"AsyncFile"/"AsyncBufferedReader" (async-iterable objects) are not iterable. This seems to be because the iteration functions involved are sync rather than async: they accept async-iterable objects but cannot iterate them, except for the final one, which is an async function.

For example, one such function accepts AsyncIterable objects in its annotations but cannot actually perform the iteration, since it is not an async function.
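A minimal illustration of that failure mode (the class name here is made up; it stands in for trio's/anyio's/aiofiles' wrappers): a plain sync for-loop raises TypeError on an async iterable, while `async for` inside a coroutine works fine.

```python
import asyncio

class AsyncChunks:
    """A minimal async iterable, standing in for an async file wrapper."""
    def __init__(self, chunks):
        self._chunks = list(chunks)

    def __aiter__(self):
        return self

    async def __anext__(self):
        if not self._chunks:
            raise StopAsyncIteration
        return self._chunks.pop(0)

# A sync for-loop cannot consume an async iterable:
try:
    for chunk in AsyncChunks([b"a", b"b"]):
        pass
except TypeError as exc:
    print(exc)  # 'AsyncChunks' object is not iterable

# An async for-loop inside a coroutine can:
async def drain():
    return [chunk async for chunk in AsyncChunks([b"a", b"b"])]

print(asyncio.run(drain()))  # [b'a', b'b']
```

So any multipart encoder that wants to support these objects needs an async iteration path, not just annotations that admit them.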

Mayst0731 commented 3 years ago

The first problem has an existing discussion https://github.com/encode/httpx/discussions/1704#discussion-3421862

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

pawamoy commented 2 years ago

Argh, stale, wontfix, nooo 😱 !

Just want to make sure: uploading a file and data as multipart is still not supported by the async client, right? I'm getting the Attempted to send an sync request with an AsyncClient instance. error message when trying to do such a thing.

import httpx
from uuid import uuid4
from aiofiles import open as aopen

async def upload():
    async with aopen("somefile.zip", "rb") as fp, httpx.AsyncClient() as client:
        files = {"content": ("somefile.zip", fp, "application/octet-stream")}
        response = await client.post(
            "http://localhost:8888",
            data=data_to_send,  # data_to_send defined elsewhere
            files=files,
            follow_redirects=False,
            headers={"Content-Type": f"multipart/form-data; boundary={uuid4().hex}"},
        )

florimondmanca commented 2 years ago

@pawamoy No fix was implemented AFAIK. This seems like an issue stalebot closed due to inactivity, rather than us deciding it shouldn't be acted upon. I guess we can reopen (stalebot would come back in a few months), and any attempts towards supporting the interfaces described in the OP (trio, anyio, aiofiles) would be welcome!

reclosedev commented 1 year ago

While the issue is not resolved, I'm using the following monkey-patch; maybe it will be helpful:

"""
This is workaround monkey-patch for https://github.com/encode/httpx/issues/1620

If you need to upload async stream as a multipart `files` argument, you need to apply this patch
and wrap stream with `AsyncStreamWrapper`::

    httpx_monkeypatch.apply()
    ...

    known_size = 42
    stream = await get_async_bytes_iterator_somehow_with_known_size(known_size)
    await client.post(
        'https://www.example.com',
        files={'upload': AsyncStreamWrapper(stream, known_size)},
    )
"""
import typing as t
from asyncio import StreamReader

from httpx import _content
from httpx._multipart import FileField
from httpx._multipart import MultipartStream
from httpx._types import RequestFiles

class AsyncStreamWrapper:
    def __init__(self, stream: t.Union[t.AsyncIterator[bytes], StreamReader], size: int):
        self.stream = stream
        self.size = size

class AsyncAwareMultipartStream(MultipartStream):

    def __init__(self, data: dict, files: RequestFiles, boundary: t.Optional[bytes] = None) -> None:
        super().__init__(data, files, boundary)
        for field in self.fields:
            if isinstance(field, FileField) and isinstance(field.file, AsyncStreamWrapper):
                field.get_length = lambda f=field: len(f.render_headers()) + f.file.size  # type: ignore # noqa: E501

    async def __aiter__(self) -> t.AsyncIterator[bytes]:
        for field in self.fields:
            yield b'--%s\r\n' % self.boundary
            if isinstance(field, FileField) and isinstance(field.file, AsyncStreamWrapper):
                yield field.render_headers()
                async for chunk in field.file.stream:
                    yield chunk
            else:
                for chunk in field.render():
                    yield chunk
            yield b'\r\n'
        yield b'--%s--\r\n' % self.boundary

def apply():
    _content.MultipartStream = AsyncAwareMultipartStream

and3rson commented 1 year ago

Has there been any progress on this by any chance?

lambdaq commented 1 year ago

I am also using `files={"upload": f}` where `f` is a multipart async file upload from FastAPI.

It says `TypeError: object of type 'coroutine' has no len()`, which kills me. The file is quite large, so I hope it gets handled in a streaming way.
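That particular error is reproducible without httpx at all: calling an async `read()` without awaiting it yields a coroutine object, and anything that then calls `len()` on it (as a sync multipart encoder would on file content) fails in exactly this way.

```python
async def read():
    # Stands in for an async file's read() method.
    return b"data"

coro = read()  # called without await: this is a coroutine object, not bytes
try:
    len(coro)
except TypeError as exc:
    print(exc)  # object of type 'coroutine' has no len()
coro.close()  # avoid a 'coroutine was never awaited' warning
```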

tomchristie commented 1 year ago

If anyone is invested in making this happen I can make the time to guide a pull request through.

lambdaq commented 1 year ago

> I am also using files={"upload": f} where f is a multipart async file upload from FastAPI.

I solved this problem for FastAPI. When reading an uploaded file from a form, FastAPI wraps a SpooledTemporaryFile in an async-style interface.

To access the file with httpx, the async wrapper doesn't fit, but you can use the old-fashioned way: just change

httpx.post(..., files={"upload": f})

into

httpx.post(..., files={"upload": f.file})
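A short sketch of why this works: FastAPI's UploadFile keeps the underlying synchronous SpooledTemporaryFile on its `.file` attribute, and that object exposes the plain sync `read()` that httpx's multipart encoder expects (the snippet below uses a bare SpooledTemporaryFile as a stand-in, without FastAPI).

```python
from tempfile import SpooledTemporaryFile

# Stand-in for the object FastAPI's UploadFile holds on its .file attribute.
spooled = SpooledTemporaryFile()
spooled.write(b"payload")
spooled.seek(0)

# A sync .read() is what httpx's files= argument can consume today.
print(spooled.read())  # b'payload'
```

Note this only sidesteps the issue for FastAPI uploads; genuinely async streams still need the support described in the OP.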

yayahuman commented 1 year ago

A monkey patch showing a possible solution (also resolves #1706 and covers #2399):

https://gist.github.com/yayahuman/db06718ffdf8a9b66e133e29d7d7965f

And possible type annotations:

from abc import abstractmethod
from typing import AnyStr, AsyncIterable, Iterable, Protocol, Union  # 3.8+

class Reader(Protocol[AnyStr]):
    __slots__ = ()

    @abstractmethod
    def read(self, size: int = -1) -> AnyStr:
        raise NotImplementedError

class AsyncReader(Protocol[AnyStr]):
    __slots__ = ()

    @abstractmethod
    async def read(self, size: int = -1) -> AnyStr:
        raise NotImplementedError

FileContent = Union[
    str,
    bytes,
    Iterable[str],
    Iterable[bytes],
    AsyncIterable[str],
    AsyncIterable[bytes],
    Reader[str],
    Reader[bytes],
    AsyncReader[str],
    AsyncReader[bytes],
]

RequestContent = FileContent
yayahuman commented 1 year ago

@tomchristie, would my monkey-patch approach be acceptable?

tomchristie commented 1 year ago

Let me help guide this conversation a bit more clearly: I would suggest starting by looking at just the `content=...` case. A good starting point for a pull request would be a test case for that one case, demonstrating the behaviour we'd like to see.
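For illustration, a hedged sketch of what the core of such a test might look like. `FakeAsyncFile` and `encode_async_content` are made-up names, not httpx internals: the behaviour under test is draining an async `.read()` interface into the request body, which is what `content=<async file>` ultimately requires.

```python
import asyncio

class FakeAsyncFile:
    """Stands in for a trio/anyio/aiofiles handle in a test."""
    def __init__(self, data: bytes, chunk_size: int = 3):
        self._data = data
        self._pos = 0
        self._chunk_size = chunk_size

    async def read(self, size: int = -1) -> bytes:
        if size < 0:
            size = self._chunk_size
        chunk = self._data[self._pos:self._pos + size]
        self._pos += size
        return chunk

async def encode_async_content(f) -> bytes:
    # The behaviour under test: drain an async .read() interface to bytes.
    chunks = []
    while True:
        chunk = await f.read()
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

def test_async_file_content():
    body = asyncio.run(encode_async_content(FakeAsyncFile(b"Hello, world!")))
    assert body == b"Hello, world!"

test_async_file_content()
```

In an actual PR the drain logic would live in httpx's content encoding and the test would assert on the request's body, but the shape of the check is the same.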

dkbarn commented 2 weeks ago

What's the status of this ticket? There were a couple of people actively working toward a PR several years ago, and it's unclear what happened with that work. Since then, conversation has revolved entirely around monkeypatches, temp fixes, and workarounds.

Is anyone actively looking at the proper solution within httpx itself?

I am also in need of using aiofiles async streams with multipart uploading.

tomchristie commented 2 weeks ago

> What's the status of this ticket?

https://github.com/encode/httpx/issues/1620#issuecomment-1630463450

dkbarn commented 2 weeks ago

OK thanks. Since that comment was a year ago, I guess the answer is that this ticket is low priority and will not be done by the maintainer.

Mayst0731 commented 2 weeks ago

I see. It would be better if you could give more context/details about your train of thought on this content case? Thanks~~ @tomchristie

tomchristie commented 2 weeks ago

> I guess the answer is that the status of this ticket is low priority and will not be done by the maintainer.

I'd suggest a good approach would be that I reciprocate effort by guiding contributors through the process.