getsentry / responses

A utility for mocking out the Python Requests library.
Apache License 2.0

responses not mocking urls for concurrent.futures.ProcessPool #688

Closed GeeCastro closed 7 months ago

GeeCastro commented 8 months ago

Describe the bug

Responses works with ThreadPoolExecutor but not with ProcessPoolExecutor. I'm guessing this is because the worker process doesn't get the responses setup when it is spawned? Just in case, I thought I'd ask whether there is a solution for my tests.

Additional context

Here’s a minimal setup to reproduce the issue

https://github.com/Chichilele/reponses-thread-issue

Version of responses

0.23.3

Steps to Reproduce

from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import ThreadPoolExecutor
import requests
import responses

@responses.activate
def test_get_threading():
    url = "http://test.org/get"
    responses.get(url, json={"status": "ok"})

    with ThreadPoolExecutor() as pool:
        r = pool.map(requests.get, [url, url])

    results = list(r)
    assert len(results) == 2
    assert results[0].json() == {"status": "ok"}
    assert results[1].json() == {"status": "ok"}

@responses.activate
def test_get_multiprocessing():
    url = "http://fake.org/get"
    responses.get(url, json={"status": "ok"})

    with ProcessPoolExecutor() as pool:
        r = pool.map(requests.get, [url, url])    # not mocked in the worker processes: tries to reach the real, nonexistent host

    results = list(r)
    assert len(results) == 2
    assert results[0].json() == {"status": "ok"}
    assert results[1].json() == {"status": "ok"}

if __name__ == "__main__":
    print("Running threading...")
    test_get_threading()
    print("Running multiprocessing...")
    test_get_multiprocessing()

Running this with pytest or as a script raises an error:

```
python3.10 test_concurrent.py
Running threading...
Running multiprocessing...
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 415, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='test.org', port=80): Max retries exceeded with url: /get (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/process.py", line 205, in
    return [fn(*args) for args in chunk]
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='test.org', port=80): Max retries exceeded with url: /get (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/tests/test_concurrent.py", line 35, in
    test_get_multiprocessing()
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/.venv/lib/python3.10/site-packages/responses/__init__.py", line 229, in wrapper
    return func(*args, **kwargs)
  File "/Users/gauthiercastro/code/ml-sc-training-pipeline/tests/test_concurrent.py", line 14, in test_get_multiprocessing
    results = list(r)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/process.py", line 575, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/gauthiercastro/.pyenv/versions/3.10.12/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /get (Caused by None)
```

Expected Result

Tests pass, with URLs mocked for both ThreadPoolExecutor and ProcessPoolExecutor.

Actual Result

URLs are mocked with ThreadPoolExecutor, but requests reach the actual website with ProcessPoolExecutor.

beliaev-maksim commented 7 months ago

@Chichilele


============================= test session starts ==============================
collecting ... collected 2 items

issue.py::test_get_threading PASSED                                      [ 50%]
issue.py::test_get_multiprocessing PASSED                                [100%]

Can you find another reproducible example? I ran this on Python 3.11.

GeeCastro commented 7 months ago

python 3.8

I don’t have access to a laptop this week but found something interesting. This GitHub Actions run on macOS fails: https://github.com/Chichilele/reponses-thread-issue/actions/runs/6865230936/job/18668751152

but the previous one with Ubuntu passed! https://github.com/Chichilele/reponses-thread-issue/actions/runs/6865141529

beliaev-maksim commented 7 months ago

@Chichilele so you assume it is OS-specific? Could it be a Python bug then?

GeeCastro commented 7 months ago

The test above indicates it is indeed OS-specific. I wouldn’t know where it comes from though. Maybe Python, maybe another package that responses calls to mock the HTTP calls.

beliaev-maksim commented 7 months ago

@Chichilele I just forked your repo and can confirm that changing the OS from macOS to Ubuntu resolves the issue.

Can you simplify your example further and remove poetry? I see that it installs other sets of dependencies for poetry, and I'm not sure poetry isolates the environment well enough.

GeeCastro commented 7 months ago

Good point. I’ve tried here without poetry: pip install requests, responses and pytest, then run pytest.

https://github.com/Chichilele/reponses-thread-issue/actions/runs/6867570438/job/18676216103

beliaev-maksim commented 7 months ago

I debugged this on a macOS VM.

We do patch adapter.send() from responses. However, when execution reaches the point where requests sends via the adapter, the adapter's send is the original one.

The same happens when we use the context manager.

I think it is a bug in Python's mock, or in Python itself. Our function patch doesn't propagate across processes on macOS.

@markstory have you ever encountered something like this?

bblommers commented 7 months ago

@beliaev-maksim This may be because processes are started slightly differently on macOS than on Linux. See this answer on SO: https://stackoverflow.com/a/70440892/13245310

And some info on how to change the start_method for a specific ProcessPoolExecutor: https://superfastpython.com/processpoolexecutor-multiprocessing-context/
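To make the start-method difference concrete, here is a minimal stdlib-only sketch. It does not use responses or requests; as a stand-in, it patches `shlex.quote` with `unittest.mock` in the parent and calls `shlex.join` (which calls `shlex.quote` internally) in the workers. The helper name `join_in_workers` is hypothetical. The assumption is that a patch applied in the parent is only visible in workers that were forked from it, which is what the linked answer describes.

```python
import multiprocessing
import shlex
from concurrent.futures import ProcessPoolExecutor
from unittest import mock


def join_in_workers(start_method):
    """Patch shlex.quote in the parent, then call shlex.join in worker processes."""
    # Select the start method for this pool only, via mp_context.
    ctx = multiprocessing.get_context(start_method)
    # shlex.join() calls shlex.quote() internally, much like requests.get()
    # ends up in the adapter method that responses patches.
    with mock.patch("shlex.quote", return_value="MOCKED"):
        with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
            return list(pool.map(shlex.join, [["a b"], ["c d"]]))


if __name__ == "__main__":
    # Platform dependent; "spawn" has been the macOS default since Python 3.8.
    print("default start method:", multiprocessing.get_start_method())
    # "fork" copies the parent's memory, patch included (not available on Windows).
    print("fork:", join_in_workers("fork"))
```

Calling `join_in_workers("spawn")` from a script instead should return the normally quoted strings, because each spawned worker starts a fresh interpreter and re-imports `shlex` without the patch. Note that the Python documentation describes the fork start method as unsafe on macOS, so forcing it there is a trade-off rather than a clean fix.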

beliaev-maksim commented 7 months ago

@bblommers thanks a lot for the links. That indeed explains it: if the process is executed in a separate interpreter, then there is no way we can catch it.

From the docs it follows that this issue could be solved on the user side, but seemingly only on macOS, meaning that we will not be able to solve it on Windows.

The order of execution might also matter. In the provided example we first set up the mock, then create the pool. What if we create the mock inside the pool, so that each copy of the interpreter gets its own responses? That will mess up things like call_counts (since they will not be global) but should intercept all the requests.

Anyway, there is nothing that could be done on the responses side.

@Chichilele if you would like to contribute documentation on how to solve this issue on the user side, that would be great.
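The idea of creating the mock inside the pool can be sketched with the standard library alone. This is an illustration under assumptions, not the responses API: `shlex.quote` stands in for the patched adapter method, and `install_mock` and `join_with_worker_side_mock` are hypothetical names. The pool's `initializer` runs once in every worker, so each worker installs its own patch no matter how it was started.

```python
import multiprocessing
import shlex
from concurrent.futures import ProcessPoolExecutor
from unittest import mock


def install_mock():
    # Runs once inside each worker process, so the patch lives in the
    # worker's own interpreter instead of being inherited from the parent.
    mock.patch("shlex.quote", return_value="MOCKED").start()


def join_with_worker_side_mock(start_method):
    """Call shlex.join in workers that each set up their own patch."""
    ctx = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(
        max_workers=2, mp_context=ctx, initializer=install_mock
    ) as pool:
        return list(pool.map(shlex.join, [["a b"], ["c d"]]))


if __name__ == "__main__":
    # Swap in "spawn" to mimic the macOS default; the result should be the same.
    print(join_with_worker_side_mock("fork"))
    # The parent was never patched, so it cannot observe the workers' calls.
    print(shlex.quote("a b"))
```

With responses, the equivalent would presumably be to register the mocked URLs inside the worker, for example within a `responses.RequestsMock()` context in the function submitted to the pool. As noted above, call counts then live in each worker and are not visible from the test process.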

markstory commented 7 months ago

have you ever encountered something like this?

I have not, but having mocks cross process boundaries (with either fork or spawning) is not something I would expect a python library to take care of either.