betamaxpy / betamax

A VCR imitation designed only for python-requests.
https://betamax.readthedocs.io/en/latest/

does switching to betamax in my test suite remove the need for a test server? #153

Closed: havok2063 closed this issue 6 years ago

havok2063 commented 6 years ago

I have a somewhat complicated test suite for my project that I'm hoping to simplify by switching to Betamax. Our software has a number of Python tools that internally use requests to make calls to our server, in addition to other code that calls Flask routes. Our test suite uses a local test server and database so we don't touch the real ones. However, this is causing problems with our Travis builds and run time. I was hoping to move to Betamax and eliminate the need for a test server and database entirely, but it doesn't appear to be working properly. Does Betamax allow me to run my HTTP requests without an active server?

All our tools make requests through a single class, which I've subclassed and overridden in our test suite to allow for Betamax recording. I'm hoping to monkeypatch the real class with this test class. Here is the test class:

import os

import betamax
import pytest
from betamax_serializers import pretty_json
from marvin.api.api import Interaction

# configure betamax
betamax.Betamax.register_serializer(pretty_json.PrettyJSONSerializer)
with betamax.Betamax.configure() as beta_config:
    beta_config.cassette_library_dir = os.path.join(os.path.dirname(__file__), 'cassettes/')
    beta_config.default_cassette_options['match_requests_on'] = ['method', 'uri', 'body']
    beta_config.default_cassette_options['record_mode'] = 'new_episodes'
    beta_config.default_cassette_options['serialize_with'] = 'prettyjson'

# a bare pytest.mark.usefixtures(...) expression does nothing;
# assigning it to pytestmark applies it to the tests in this module
pytestmark = pytest.mark.usefixtures('betamax_session')

class TestInteraction(Interaction):

    def _setup_betamax(self):
        # self.session is a requests.Session created on init
        self.recorder = betamax.Betamax(self.session)

    def _sendRequest(self, request_type):
        ''' sends the api requests to the server '''
        assert request_type in ['get', 'post'], 'Valid request types are "get" and "post".'

        self._setup_betamax()

        # Send the request
        try:
            if request_type == 'get':
                with self.recorder.use_cassette('marvin_interaction'):
                    self._response = self.session.get(self.url, params=self.params, timeout=self.timeout,
                                                      headers=self.headers, stream=self.stream)
            elif request_type == 'post':
                with self.recorder.use_cassette('marvin_interaction'):
                    self._response = self.session.post(self.url, data=self.params, timeout=self.timeout,
                                                       headers=self.headers, stream=self.stream)
        except Exception:
            # (error handling omitted in the original post)
            raise
        else:
            # Check the response if it's good
            self._checkResponse(self._response)

And here is an example test, plus an example of a tool using this class:

import os

import pytest
from marvin import config


def test_beta(betamax_session):
    path = 'http://localhost:5000' + config.urlmap['api']['getroutemap']['url']
    betamax_session.get(path)


@pytest.fixture(scope='session')
def set_sasurl(loc='local', port=None):
    """Set the sasurl to local or test-utah, and regenerate the urlmap."""
    if not port:
        port = int(os.environ.get('LOCAL_MARVIN_PORT', 5000))
    istest = loc == 'utah'
    config.switchSasUrl(loc, test=istest, port=port)
    # TestInteraction is the subclass defined above
    response = TestInteraction('api/general/getroutemap', request_type='get')
    config.urlmap = response.getRouteMap()

When I first run this test with my local test server turned on, everything runs fine and the expected cassettes are generated in my cassette_library_dir. However, when I turn off the local server and rerun the tests, they fail because the server is offline, ultimately raising a connection error: Requests Connection Error: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url. It seems like it is still trying to send a real request rather than use the cassette file. Am I missing some configuration somewhere? I think Betamax offers what I want, but is it really the right tool?

hroncok commented 6 years ago

It is the right tool. I'll look at the provided examples closely later; right now I'm only on my phone. Do you have a complete example, for instance an open source project on GitHub?

havok2063 commented 6 years ago

Yes and no. Our software is on GitHub, but it's not runnable as-is because it requires a specific infrastructure setup and auth. You can at least peruse the code, though. Here is our test directory (it's a beast). The example code above is in tests/__init__ and conftest.py.
https://github.com/sdss/marvin/tree/betamax/python/marvin/tests

I'm just trying to hack together a test first to see if Betamax will work for us.

sigmavirus24 commented 6 years ago

Hi @havok2063

The code in your original comment is a little confusing to me, but I'm also rather ill right now and my head is foggy. It looks like you're always using marvin_interaction as the cassette name. That could be part of your problem. For each test, you should strive to have a single unique cassette. I think one of our test fixtures tries to generate those names for the user, so you could look at those if you just want to auto-generate the names.
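A minimal sketch of generating one unique cassette name per test, assuming the name is derived from pytest's node id (`request.node.nodeid`). `cassette_name_for` is a hypothetical helper, not part of Betamax; it produces dotted names similar in style to the `marvin.tests.conftest.test_beta` cassette that Betamax's own fixture creates later in this thread, but the exact sanitisation rules here are illustrative.

```python
import re


def cassette_name_for(nodeid):
    """Turn a pytest node id into a filesystem-safe cassette name.

    Hypothetical helper: 'tests/test_api.py::TestQuery::test_get[plate-1]'
    becomes 'tests.test_api.TestQuery.test_get-plate-1'.
    """
    path, _, rest = nodeid.partition("::")
    # Drop the .py suffix and turn path separators into dots.
    if path.endswith(".py"):
        path = path[:-3]
    module = path.replace("/", ".").replace("\\", ".")
    parts = [module] + [p for p in rest.split("::") if p]
    name = ".".join(parts)
    # Replace characters that are awkward in filenames (brackets, spaces, ...).
    name = re.sub(r"[^A-Za-z0-9_.-]+", "-", name)
    return name.strip("-")
```

In a fixture, the result could be handed to the recorder, e.g. `with betamax.Betamax(session).use_cassette(cassette_name_for(request.node.nodeid)):`, in place of the fixed `'marvin_interaction'` name.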

I'm also unclear as to which tests you want to avoid setting up a server for. It sounds like you have tests that make HTTP requests against your Flask application, but you also have Flask-based tests that exercise it through Flask's own test harness. The former are definitely a good candidate for Betamax (i.e., you set up the server locally, make requests in your tests while Betamax records, and then don't need the server unless you have to re-record the cassette). The latter aren't a fit for Betamax. Flask's unit-testing harness shouldn't need a running server for your requests, so you should be fine to keep using it as is (if I remember correctly).

Hopefully that answers some of your questions and @hroncok has a clearer head to help you with the rest.

Thanks for trying out Betamax, Ian

havok2063 commented 6 years ago

@sigmavirus24 thanks for the response.

Yeah, our test suite is a bit convoluted. We have a Flask server serving web routes, an API running on that Flask server, and regular Python code that internally makes remote calls to the server through the API. I think this is the former case you describe. We use pytest-flask to test our web and API routes directly, and those tests work without a server. However, all of our tests of the Python client code currently require a test server and database so that the remote calls can work. I would like to replace that with Betamax so we can drop the test server and database entirely and run those tests off the cassettes. I think that's the only part of our test suite that needs cassettes.

I will try to make the cassette name unique to the test running it and see if that works. We do a lot of test setup and parametrization and run a lot of tests, so if possible it'd be nice to keep the number of cassette files to a minimum.
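If the number of cassette files matters, one option (an assumption, not something Betamax does for you) is to map every parametrized variant of a test onto a single cassette by dropping pytest's `[...]` parametrize id. This only works if later variants may append interactions to an existing cassette, e.g. with `new_episodes` or `all`; Betamax documents that `once` raises for new requests once a cassette file exists. `match_requests_on` must also tell the variants apart, as `['method', 'uri', 'body']` does when the parameters end up in the URL or body. `shared_cassette_name` is a hypothetical helper.

```python
import re


def shared_cassette_name(nodeid):
    """Map all parametrized variants of one test to one cassette name.

    Hypothetical helper:
    'tests/test_query.py::test_run[a]' -> 'tests.test_query.test_run'
    """
    base = re.sub(r"\[.*\]$", "", nodeid)  # drop a trailing '[param-id]'
    path, _, rest = base.partition("::")
    if path.endswith(".py"):
        path = path[:-3]
    parts = [path.replace("/", ".")] + [p for p in rest.split("::") if p]
    return ".".join(parts)
```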

havok2063 commented 6 years ago

But this test example doesn't use our TestInteraction class, and it still does not work without the server running. It uses the built-in betamax_session fixture. When I ran it the first time, with the server up, it created a marvin.tests.conftest.test_beta.json cassette file. Yet I still get a ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url error. Full traceback is below.

def test_beta(betamax_session):
    path = 'http://localhost:5000' + config.urlmap['api']['getroutemap']['url']
    betamax_session.get(path)

Traceback

―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― test_beta ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

betamax_session = <requests.sessions.Session object at 0x10e422e50>

    def test_beta(betamax_session):
        path = 'http://localhost:5000' + config.urlmap['api']['getroutemap']['url']
>       betamax_session.get(path)

conftest.py:72:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/sessions.py:521: in get
    return self.request('GET', url, **kwargs)
/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/sessions.py:508: in request
    resp = self.send(prep, **send_kwargs)
/Users/Brian/anaconda2/lib/python2.7/site-packages/raven/breadcrumbs.py:297: in send
    resp = real_send(self, request, *args, **kwargs)
/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/sessions.py:618: in send
    r = adapter.send(request, **kwargs)
/Users/Brian/anaconda2/lib/python2.7/site-packages/betamax/adapter.py:127: in send
    request, stream, timeout, verify, cert, proxies
/Users/Brian/anaconda2/lib/python2.7/site-packages/betamax/adapter.py:157: in send_and_record
    cert=cert, proxies=proxies
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <requests.adapters.HTTPAdapter object at 0x10e4e0650>, request = <PreparedRequest [GET]>, stream = True
timeout = <urllib3.util.timeout.Timeout object at 0x12068e4d0>, verify = True, cert = None, proxies = OrderedDict()

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        """Sends PreparedRequest object. Returns Response object.

            :param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
            :param stream: (optional) Whether to stream the request content.
            :param timeout: (optional) How long to wait for the server to send
                data before giving up, as a float, or a :ref:`(connect timeout,
                read timeout) <timeouts>` tuple.
            :type timeout: float or tuple or urllib3 Timeout object
            :param verify: (optional) Either a boolean, in which case it controls whether
                we verify the server's TLS certificate, or a string, in which case it
                must be a path to a CA bundle to use
            :param cert: (optional) Any user-provided SSL certificate to be trusted.
            :param proxies: (optional) The proxies dictionary to apply to the request.
            :rtype: requests.Response
            """

        conn = self.get_connection(request.url, proxies)

        self.cert_verify(conn, request.url, verify, cert)
        url = self.request_url(request, proxies)
        self.add_headers(request)

        chunked = not (request.body is None or 'Content-Length' in request.headers)

        if isinstance(timeout, tuple):
            try:
                connect, read = timeout
                timeout = TimeoutSauce(connect=connect, read=read)
            except ValueError as e:
                # this may raise a string formatting error.
                err = ("Invalid timeout {0}. Pass a (connect, read) "
                       "timeout tuple, or a single float to set "
                       "both timeouts to the same value".format(timeout))
                raise ValueError(err)
        elif isinstance(timeout, TimeoutSauce):
            pass
        else:
            timeout = TimeoutSauce(connect=timeout, read=timeout)

        try:
            if not chunked:
                resp = conn.urlopen(
                    method=request.method,
                    url=url,
                    body=request.body,
                    headers=request.headers,
                    redirect=False,
                    assert_same_host=False,
                    preload_content=False,
                    decode_content=False,
                    retries=self.max_retries,
                    timeout=timeout
                )

            # Send the request.
            else:
                if hasattr(conn, 'proxy_pool'):
                    conn = conn.proxy_pool

                low_conn = conn._get_conn(timeout=DEFAULT_POOL_TIMEOUT)

                try:
                    low_conn.putrequest(request.method,
                                        url,
                                        skip_accept_encoding=True)

                    for header, value in request.headers.items():
                        low_conn.putheader(header, value)

                    low_conn.endheaders()

                    for i in request.body:
                        low_conn.send(hex(len(i))[2:].encode('utf-8'))
                        low_conn.send(b'\r\n')
                        low_conn.send(i)
                        low_conn.send(b'\r\n')
                    low_conn.send(b'0\r\n\r\n')

                    # Receive the response from the server
                    try:
                        # For Python 2.7+ versions, use buffering of HTTP
                        # responses
                        r = low_conn.getresponse(buffering=True)
                    except TypeError:
                        # For compatibility with Python 2.6 versions and back
                        r = low_conn.getresponse()

                    resp = HTTPResponse.from_httplib(
                        r,
                        pool=conn,
                        connection=low_conn,
                        preload_content=False,
                        decode_content=False
                    )
                except:
                    # If we hit any problems here, clean up the connection.
                    # Then, reraise so that we can handle the actual exception.
                    low_conn.close()
                    raise

        except (ProtocolError, socket.error) as err:
            raise ConnectionError(err, request=request)

        except MaxRetryError as e:
            if isinstance(e.reason, ConnectTimeoutError):
                # TODO: Remove this in 3.0.0: see #2811
                if not isinstance(e.reason, NewConnectionError):
                    raise ConnectTimeout(e, request=request)

            if isinstance(e.reason, ResponseError):
                raise RetryError(e, request=request)

            if isinstance(e.reason, _ProxyError):
                raise ProxyError(e, request=request)

            if isinstance(e.reason, _SSLError):
                # This branch is for urllib3 v1.22 and later.
                raise SSLError(e, request=request)

>           raise ConnectionError(e, request=request)
E           ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /marvin2/api/general/getroutemap/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10e3da290>: Failed to establish a new connection: [Errno 61] Connection refused',))

/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/adapters.py:508: ConnectionError
hroncok commented 6 years ago

Why is raven in the traceback? What happens if you uninstall it?

hroncok commented 6 years ago

Raven hooks into requests.Session and replaces its send method:

https://github.com/getsentry/raven-python/blob/9c72270099a283b2f74a0bca3ae8d09aaa161895/raven/breadcrumbs.py#L308
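The first traceback shows Raven's wrapper (`raven/breadcrumbs.py`, `real_send(...)`) passing the request on to `Session.send`, which then reaches Betamax's adapter. A toy model of that kind of pass-through patch, using stand-in classes rather than the real `requests` or Raven code, illustrates why such a wrapper would not by itself stop a cassette from being used:

```python
import functools


class ToySession:
    """Stand-in for requests.Session; delegates to a mounted 'adapter'."""

    def __init__(self, adapter):
        self.adapter = adapter

    def send(self, request):
        return self.adapter(request)


def install_send_hook(cls, log):
    """Patch cls.send with a wrapper that logs, then calls the original.

    Returns the original function so a test can restore it afterwards.
    """
    real_send = cls.send

    @functools.wraps(real_send)
    def send(self, request, *args, **kwargs):
        log.append(request)  # side effect only (e.g. a breadcrumb)
        return real_send(self, request, *args, **kwargs)

    cls.send = send
    return real_send
```

Because the wrapper always calls the original `send`, whatever adapter is mounted (Betamax's, in this thread) still receives the request.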

havok2063 commented 6 years ago

Because we use raven (Sentry) to log all errors. However, it should be disabled in our test suite; I can look into that separately. Here is the traceback with raven uninstalled. No change.

―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― test_beta ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

request = <FixtureRequest for <Function 'test_beta'>>, betamax_session = <requests.sessions.Session object at 0x10d01ba90>

    def test_beta(request, betamax_session):
        path = 'http://localhost:5000' + config.urlmap['api']['getroutemap']['url']
>       betamax_session.get(path)

conftest.py:73:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/sessions.py:521: in get
    return self.request('GET', url, **kwargs)
/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/sessions.py:508: in request
    resp = self.send(prep, **send_kwargs)
/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/sessions.py:618: in send
    r = adapter.send(request, **kwargs)
/Users/Brian/anaconda2/lib/python2.7/site-packages/betamax/adapter.py:127: in send
    request, stream, timeout, verify, cert, proxies
/Users/Brian/anaconda2/lib/python2.7/site-packages/betamax/adapter.py:157: in send_and_record
    cert=cert, proxies=proxies
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <requests.adapters.HTTPAdapter object at 0x10d01bb10>, request = <PreparedRequest [GET]>, stream = True
timeout = <urllib3.util.timeout.Timeout object at 0x1200b2210>, verify = True, cert = None, proxies = OrderedDict()

    (source of requests.adapters.HTTPAdapter.send omitted; identical to the previous traceback)

>           raise ConnectionError(e, request=request)
E           ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /marvin2/api/general/getroutemap/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10d03b050>: Failed to establish a new connection: [Errno 61] Connection refused',))

/Users/Brian/anaconda2/lib/python2.7/site-packages/requests/adapters.py:508: ConnectionError
hroncok commented 6 years ago

Ok.

What happens if you change the record mode to none? http://betamax.readthedocs.io/en/latest/record_modes.html#none

havok2063 commented 6 years ago

So that worked, though I'm not 100% sure why. I thought the new_episodes record mode only tries to record when it finds a request that doesn't match any interaction in the existing cassette? Since I already have a marvin.tests.conftest.test_beta.json cassette file, I assumed it would not try to record anything. Is that not the case?

I am using a match_requests_on of ['method', 'uri', 'body']. Maybe that's my issue.
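The expectation described above matches Betamax's documented record modes. As a reading aid, the documented behaviour for a single outgoing request can be written down as a small decision function. `decide` is hypothetical and is not Betamax's implementation; it encodes the documentation, not the betamax 0.8.1 code path that misbehaved in this thread.

```python
RECORD_MODES = {"all", "new_episodes", "none", "once"}


def decide(record_mode, cassette_file_exists, has_match):
    """Return 'replay', 'record' or 'error' for one outgoing request.

    Encodes Betamax's documented record modes (illustration only).
    """
    if record_mode not in RECORD_MODES:
        raise ValueError("unknown record mode: %r" % (record_mode,))
    if record_mode == "all":
        return "record"  # documented to never replay
    if has_match:
        return "replay"  # none, once and new_episodes all replay matches
    if record_mode == "new_episodes":
        return "record"  # unmatched requests are recorded
    if record_mode == "once" and not cassette_file_exists:
        return "record"  # first run creates the cassette
    return "error"  # 'none', or 'once' with an existing cassette
```

Under this reading, `decide('new_episodes', True, True)` is `'replay'`, so a request that matches a recorded interaction should not need the server at all.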

sigmavirus24 commented 6 years ago

It is plausible that new_episodes has a bug.

havok2063 commented 6 years ago

Quite possibly. Does the cassette matching include the timestamp when looking for similar tapes? That would make the same tape appear different. Getting new_episodes to work would be great; otherwise I'll need to switch between new_episodes and none depending on whether the tests run on Travis.

sigmavirus24 commented 6 years ago

I would strongly suggest switching between those modes. It's something my projects do:

Briefly looking at how we find matching interactions, I think the only real special-casing we have is here. Finally, our Adapter contains the logic that decides whether or not to make a new, real request here. If none works but new_episodes doesn't, I wonder what other pytest settings you might be using that could be affecting this. I've never seen it otherwise.
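Switching modes can be automated. A minimal sketch, assuming the CI service sets the `CI` or `TRAVIS` environment variable to `true` (Travis CI sets both): replay only (`none`) on CI, and keep recording locally. `choose_record_mode` is a hypothetical helper; the local default of `new_episodes` simply mirrors this thread's configuration.

```python
import os


def choose_record_mode(environ=None):
    """Return 'none' on CI (replay only) and 'new_episodes' elsewhere.

    Assumes CI=true or TRAVIS=true marks a CI run, as Travis CI sets both.
    """
    env = os.environ if environ is None else environ
    on_ci = any(env.get(name, "").lower() == "true" for name in ("CI", "TRAVIS"))
    return "none" if on_ci else "new_episodes"
```

Inside the `with betamax.Betamax.configure() as beta_config:` block, this would become `beta_config.default_cassette_options['record_mode'] = choose_record_mode()`.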

havok2063 commented 6 years ago

Yeah, it seems like that's what I'll have to do, then. It's an easy enough solution for now. Looking through your links, I don't see anything odd about the matching. I did some more tests, and the bug might be related to recording requests against a localhost server. Here is a vanilla test setup that doesn't use any of the convoluted setup from our main software:

import os

import betamax
import pytest
from betamax_serializers import pretty_json
from marvin import config  # only needed for the commented-out main-software path

# Configure Betamax
# -------------------
betamax.Betamax.register_serializer(pretty_json.PrettyJSONSerializer)
with betamax.Betamax.configure() as beta_config:
    beta_config.cassette_library_dir = os.path.realpath(os.path.join(os.path.dirname(__file__), '../tests/cassettes/'))
    beta_config.default_cassette_options['match_requests_on'] = ['method', 'uri', 'body']
    beta_config.default_cassette_options['record_mode'] = 'new_episodes'
    beta_config.default_cassette_options['serialize_with'] = 'prettyjson'

# a bare pytest.mark.usefixtures(...) expression does nothing; use pytestmark
pytestmark = pytest.mark.usefixtures('betamax_session')


def test_beta(request, betamax_session):
    # test the local server of the main software
    # path = 'http://localhost:5000' + config.urlmap['api']['getroutemap']['url']

    # test an http URL
    # path = 'http://httpbin.org/ip'

    # test a completely separate local webapp
    path = 'http://localhost:5000/fully/'

    betamax_session.get(path)

I ran this test once with each of the paths above: one on the local server running our main code, one on httpbin (an online HTTP test service), and one on a separate, basic local webapp I have. For each test I deleted any existing cassette file, started the server, and ran the test, which passed and generated the cassette file. I then shut off the server and reran the test. The httpbin path passed, but the two localhost paths failed with the same HTTPConnectionPool connection error. So it seems something about localhost URIs breaks the intended behavior.

sigmavirus24 commented 6 years ago

That's truly bizarre, but I'm not entirely surprised, as I don't think I tested a localhost case very stringently. It does give me something to investigate, though. Thank you :)

sigmavirus24 commented 6 years ago

I wonder if you could share one of your localhost cassettes so I can visually inspect it.

havok2063 commented 6 years ago

Yeah, no problem, that's what I'm here for. :) Here is a small one from a local test webapp. Let me know if I can do anything else.

{
  "http_interactions": [
    {
      "recorded_at": "2018-04-03T15:39:36",
      "request": {
        "body": {
          "encoding": "utf-8",
          "string": ""
        },
        "headers": {
          "Accept": [
            "*/*"
          ],
          "Accept-Encoding": [
            "gzip, deflate"
          ],
          "Connection": [
            "keep-alive"
          ],
          "User-Agent": [
            "python-requests/2.18.4"
          ]
        },
        "method": "GET",
        "uri": "http://localhost:5000/fully/"
      },
      "response": {
        "body": {
          "encoding": null,
          "string": "{\n  \"endpoint\": \"Fully:get\", \n  \"full\": \"/fully/?\", \n  \"msg\": \"this is get\", \n  \"path\": \"/fully/\", \n  \"url\": \"http://localhost:5000/fully/\"\n}\n"
        },
        "headers": {
          "Content-Length": [
            "142"
          ],
          "Content-Type": [
            "application/json"
          ],
          "Date": [
            "Tue, 03 Apr 2018 15:39:36 GMT"
          ],
          "Server": [
            "Werkzeug/0.14.1 Python/2.7.12"
          ]
        },
        "status": {
          "code": 200,
          "message": "OK"
        },
        "url": "http://localhost:5000/fully/"
      }
    }
  ],
  "recorded_with": "betamax/0.8.1"
}
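To check by hand whether a cassette holds the request a test is about to make, the JSON can be loaded and searched directly. This is a simplified sketch assuming the layout shown above (`http_interactions`, `request.method`, `request.uri`, `request.body.string`); Betamax's built-in matchers may normalise URIs and bodies differently, so treat it as a diagnostic aid only. `load_cassette` and `find_interaction` are hypothetical helpers.

```python
import json


def load_cassette(path):
    """Read a JSON cassette file (e.g. one written by the prettyjson serializer)."""
    with open(path) as fh:
        return json.load(fh)


def find_interaction(cassette, method, uri, body=""):
    """Return the first interaction whose method, uri and body match, else None.

    Rough analogue of match_requests_on = ['method', 'uri', 'body'] using
    exact string comparison.
    """
    for interaction in cassette.get("http_interactions", []):
        req = interaction["request"]
        if (req["method"] == method.upper()
                and req["uri"] == uri
                and req["body"].get("string", "") == body):
            return interaction
    return None
```

For the cassette above, `find_interaction(cassette, 'GET', 'http://localhost:5000/fully/')` returns the recorded 200 response, which is what a replay should serve without contacting the server.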
sigmavirus24 commented 6 years ago

So I can absolutely reproduce this locally against localhost.

sigmavirus24 commented 6 years ago

Okay, so new_episodes makes Cassette.is_recording() always return True. And if that's true and the record mode isn't new_episodes, then we short-circuit and don't even look for the recorded interaction.
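A heavily simplified toy model of the effect described above: if `is_recording()` is always true for `new_episodes` and a recording cassette skips the lookup, the recorded interaction is never consulted, so the adapter falls through to a real request (which fails when the localhost server is down). This is not Betamax's source; the class and method names are illustrative. It also shows why `none` worked: it never counts as recording, so the lookup runs. The passing httpbin test would be consistent with this too, since its real request simply succeeded.

```python
class ToyCassette:
    """Toy model of a cassette lookup (illustrative, not Betamax's code)."""

    def __init__(self, record_mode, interactions):
        self.record_mode = record_mode
        self.interactions = dict(interactions)  # {(method, uri): response}

    def is_recording(self):
        # 'none' never records, 'once' only while empty, the others always.
        if self.record_mode == "none":
            return False
        if self.record_mode == "once":
            return not self.interactions
        return True

    def find_match_short_circuit(self, key):
        # Shape of the problem: a recording cassette skips the lookup,
        # so the caller falls through to a real HTTP request.
        if self.is_recording():
            return None
        return self.interactions.get(key)

    def find_match_lookup_first(self, key):
        # Intended shape: consult recorded interactions first
        # ('all' is documented to never replay).
        if self.record_mode == "all":
            return None
        return self.interactions.get(key)
```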

havok2063 commented 6 years ago

Great! Thanks for fixing this!