betamaxpy / betamax

A VCR imitation designed only for python-requests.
https://betamax.readthedocs.io/en/latest/
Other
562 stars 61 forks source link

tests fails on follow up runs #211

Open ReenigneArcher opened 2 months ago

ReenigneArcher commented 2 months ago

I have the below code to test a praw reddit bot. The first test always succeeds, but follow up tests fail to find the required calls in the cassette. I discovered that it is related to the filter_access_token callback, and if I disable that, then follow up tests will succeed. Am I doing something wrong in the callback function?

from base64 import b64encode
import json
import os
from urllib.parse import quote_plus

from betamax import Betamax, cassette
from betamax_serializers.pretty_json import PrettyJSONSerializer
from praw.config import _NotSet
import pytest
from praw.models import Submission, Comment

from src.reddit.bot import Bot

Betamax.register_serializer(PrettyJSONSerializer)

def b64_string(input_string: str) -> str:
    """Return a base64 encoded string (not bytes) from input_string."""
    return b64encode(input_string.encode('utf-8')).decode('utf-8')

def filter_access_token(
        interaction: cassette.cassette.Interaction,
        current_cassette: cassette.cassette.Cassette,
):
    """Add Betamax placeholder to filter access token."""
    request_uri = interaction.data['request']['uri']
    response = interaction.data['response']

    # We only care about requests that generate an access token.
    if ('api/v1/access_token' not in request_uri or
            response['status']['code'] != 200):
        return
    body = response['body']['string']

    token_types = [
        'access_token',
        'refresh_token',
    ]
    for token_type in token_types:
        try:
            token = json.loads(body)[token_type]
        except (KeyError, TypeError, ValueError):
            continue

        # If we succeeded in finding the token, add it to the placeholders for this cassette.
        current_cassette.placeholders.append(
            cassette.cassette.Placeholder(placeholder=f'<{token_type.upper()}>', replace=token)
        )

def get_placeholders(bot: Bot) -> dict[str, str]:
    """Prepare placeholders for sensitive information."""
    filter_keys = [
        'client_id',
        'client_secret',
        'password',
        'username',
    ]

    placeholders = {
        attr: getattr(bot.reddit.config, attr)
        for attr in filter_keys}

    # Check if attributes exist and are not _NotSet before using them
    for key in placeholders:
        if isinstance(placeholders[key], _NotSet):
            placeholders[key] = ''

    # Password is sent URL-encoded.
    placeholders['password'] = quote_plus(placeholders['password'])

    # Client ID and secret are sent in base-64 encoding.
    placeholders['basic_auth'] = b64_string(
        "{}:{}".format(placeholders['client_id'],
                       placeholders['client_secret'])
    )

    return placeholders

class TestBot:
    @pytest.fixture(scope='session')
    def bot(self):
        return Bot(
            user_agent='Test suite',
        )

    @pytest.fixture(scope='session', autouse=True)
    def betamax_config(self, bot):
        with Betamax.configure() as config:
            config.cassette_library_dir = 'tests/fixtures/cassettes'
            config.default_cassette_options['serialize_with'] = 'prettyjson'
            config.before_record(callback=filter_access_token)

            # Add placeholders for sensitive information.
            for key, value in get_placeholders(bot).items():
                config.define_cassette_placeholder('<{}>'.format(key.upper()), value)

    @pytest.fixture(scope='session')
    def session(self, bot):
        http = bot.reddit._core._requestor._http
        http.headers['Accept-Encoding'] = 'identity'  # ensure response is human readable
        return http

    @pytest.fixture(scope='session')
    def recorder(self, session):
        return Betamax(session)

    def test_submission(self, bot, recorder, request):
        submission = bot.reddit.submission(id='w03cku')
        with recorder.use_cassette(request.node.name):
            assert submission.author

I get the following error.

self = <prawcore.requestor.Requestor object at 0x00000170BD94D8D0>, timeout = None, args = ('post', 'https://www.reddit.com/api/v1/access_token')
kwargs = {'auth': ('**********************', '******************************'), 'data': [('grant_type', 'password'), ('password', '****************'), ('username', '**************')], 'headers': {'Connection': 'close'}}

    def request(
        self, *args: Any, timeout: float | None = None, **kwargs: Any
    ) -> Response:
        """Issue the HTTP request capturing any errors that may occur."""
        try:
            return self._http.request(*args, timeout=timeout or self.timeout, **kwargs)
        except Exception as exc:  # noqa: BLE001
>           raise RequestException(exc, args, kwargs) from None
E           prawcore.exceptions.RequestException: error with request A request was made that could not be handled.
E
E           A request was made to https://www.reddit.com/api/v1/access_token that could not be found in test_submission.
E
E           The settings on the cassette are:
E
E               - record_mode: once
E               - match_options {'uri', 'method'}.

venv\Lib\site-packages\prawcore\requestor.py:70: RequestException
--------------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------------- 
WARNING  prawcore:sessions.py:161 Retrying due to ChunkedEncodingError(ProtocolError('Connection broken: IncompleteRead(93 bytes read, 755 more expected)', IncompleteRead(93 bytes read, 755 more expected))) status: GET https://oauth.reddit.com/comments/w03cku/

I've been trying to follow this (somewhat outdated) blog/tutorial (https://leviroth.github.io/2017-05-16-testing-reddit-bots-with-betamax/) on using betamax and I believe this is the last issue I have to solve.

Edit: I did find that I can use

cassette.cassette.Placeholder(placeholder='*' * len(token), replace=token)

instead of

cassette.cassette.Placeholder(placeholder=f'<{token_type.upper()}>', replace=token)

and it will work... but I'm not sure that's the best solution.

sigmavirus24 commented 2 months ago
    def test_submission(self, bot, recorder, request):
        submission = bot.reddit.submission(id='w03cku')
        with recorder.use_cassette(request.node.name):
            assert submission.author

Seems to me like it doesn't perform a request while betamax is recording but I don't know your bot code

ReenigneArcher commented 2 months ago

praw creates lazy objects and doesn't make an api request until you access a property.

The issue is with the 'Content-Length' header mismatch, as the placeholder is shorter in length than the original value.

I can patch the content-length manually before playback... but I wonder if there's a better way.

sigmavirus24 commented 2 months ago

You likely need the preserve exact body bytes option which I don't see you setting here explicitly and which is documented in cases like this

ReenigneArcher commented 2 months ago

Well, I need to strip out an access token from the body... my understanding (based on the docs) is the body would remain unchanged with that option, thus exposing my access token.

ReenigneArcher commented 2 months ago

This ended up working as a before playback callback... ideally I would like to do this before record, but I couldn't get it to work.

def patch_content_length(
    interaction: cassette.cassette.Interaction,
    current_cassette: cassette.cassette.Cassette,
):
    """Fix the Content-Length header in the response after sanitizing tokens."""
    request_uri = interaction.data['request']['uri']
    response = interaction.data['response']

    # # We only care about requests that generate an access token.
    if ('api/v1/access_token' not in request_uri or
            response['status']['code'] != 200):
        return
    body = response['body']['string']
    content_length = len(body.encode('utf-8'))
    response['headers']['Content-Length'] = [str(content_length)]
sigmavirus24 commented 2 months ago

You're other option is that since you're using identity for accept encoding you ensure that whatever you use as a placeholder is the right length so whatever you replace it with is the same length otherwise you'll encounter this for everything.

sigmavirus24 commented 2 months ago

https://betamax.readthedocs.io/en/latest/configuring.html#filtering-sensitive-data describes before record hook which happens after sanitization

ReenigneArcher commented 2 months ago

Right... the problem being that when the length of the placeholder is different than the original length for what you want to replace... the "Content-Length" header no longer matches... and follow up tests will fail due to the IncompleteRead error, since it's expecting more bytes than it actually receives.

sigmavirus24 commented 2 months ago

Right, there might be a way to disable that in urllib3/http.client but I barely have the time right now to answer this issue. I could review a PR to do that but not sure when I'd have a chance to kick off a release