kfcaio opened 2 years ago
Can you try setting preserve_exact_body_bytes=True in your config? See https://betamax.readthedocs.io/en/latest/api.html?highlight=bytes#forcing-bytes-to-be-preserved. I wonder if we need a heuristic around Content-Type: application/zip.
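For reference, here is a minimal configuration sketch based on the docs section linked above. It enables the option globally. The cassette_library_dir value is an assumed path and does not come from this thread.

```python
from betamax import Betamax

with Betamax.configure() as config:
    # Assumed location for cassettes; adjust to your project layout.
    config.cassette_library_dir = "tests/cassettes"
    # Store response bodies as exact bytes (base64-encoded in the cassette)
    # instead of decoded text, so binary payloads like zip files round-trip.
    config.preserve_exact_body_bytes = True
```

The same docs section also describes enabling this for a single cassette by passing preserve_exact_body_bytes=True to use_cassette. That keeps text responses readable in other cassettes. A cassette recorded before the change presumably has to be re-recorded, because its body was already captured as text.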
Thank you for the quick response. It worked, but no HTTP interactions were recorded when using BETAMAX_RECORD_MODE=all:
{"http_interactions": [], "recorded_with": "betamax/0.8.1"}
Is this expected?
No, but the all record mode is not generally advisable. Why are you using it?
@sigmavirus24 My bad, I was creating a new session somewhere in my actual script. It works as expected now, thank you! I think you can close this one.
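The root cause here was traffic that never went through the session Betamax wraps, which would explain the empty http_interactions list. A minimal sketch of the usual fix, assuming a hypothetical function under test named download_zip_bytes: accept the session as a parameter instead of always creating one internally.

```python
from typing import Any, Optional, Protocol


class SessionLike(Protocol):
    """Anything with a requests.Session-style get() method."""

    def get(self, url: str, **kwargs: Any) -> Any: ...


def download_zip_bytes(url: str, session: Optional[SessionLike] = None) -> bytes:
    """Hypothetical function under test: fetch a zip file as raw bytes.

    Taking the session as a parameter lets a test inject the session that
    Betamax wraps. If the function always built its own requests.Session(),
    that traffic would bypass Betamax and nothing would be recorded.
    """
    if session is None:
        import requests  # only needed when no session is injected

        session = requests.Session()
    response = session.get(url)
    response.raise_for_status()
    return response.content
```

In a test, you would call download_zip_bytes(url, session=self.session) so that the Betamax-wrapped session handles the request.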
Would you want to submit a PR that adds a heuristic for that Content-Type, so that the exact body bytes are preserved automatically? I think that is a reasonable feature request, and the PR should be fairly small.
Sure :)
If it helps to get started, https://github.com/betamaxpy/betamax/blob/2c12cee59ac365f39497a3718eed04ab9c6ce988/src/betamax/util.py#L58-L59 is where I think we need a change. I suspect, however, that we want to keep that condition from becoming hard to read, so if you want to move it into a separate function, I'm :+1: on that.
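A sketch of what such a separate function could look like. The names should_preserve_exact_body_bytes, BINARY_CONTENT_TYPES, and _header are hypothetical and not part of Betamax. The gzip branch assumes the existing condition in util.py already treats gzip-encoded bodies as bytes; check the linked lines and drop that branch if it does not.

```python
from typing import Mapping

# Hypothetical set of media types whose bodies should always be kept as bytes.
# Only application/zip comes from this thread; other types could be added.
BINARY_CONTENT_TYPES = frozenset({"application/zip"})


def _header(headers: Mapping[str, str], name: str) -> str:
    """Case-insensitive header lookup that also works on plain dicts."""
    name = name.lower()
    for key, value in headers.items():
        if key.lower() == name:
            return value
    return ""


def should_preserve_exact_body_bytes(
    headers: Mapping[str, str], preserve_exact_body_bytes: bool = False
) -> bool:
    """Hypothetical helper: decide whether a body must be stored as exact bytes."""
    if preserve_exact_body_bytes:  # explicit opt-in always wins
        return True
    # Assumed existing rule: gzip-encoded bodies are compressed bytes.
    if "gzip" in _header(headers, "Content-Encoding"):
        return True
    # Drop parameters such as "; charset=..." before comparing the media type.
    media_type = _header(headers, "Content-Type").split(";", 1)[0].strip().lower()
    return media_type in BINARY_CONTENT_TYPES
```

Keeping the header parsing in a helper leaves the call site as a single readable condition, which is the concern raised above.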
@sigmavirus24 I wrote a test for a function that downloads a large zip file using the requests module. I've found a discrepancy in the content length when comparing test runs with and without Betamax: with Betamax, the extracted binary string is much larger than the 159406 bytes reported by the Content-Length header. Besides that, I need to pass that binary string to BytesIO and then to zipfile.ZipFile, but I get a zipfile.BadZipFile: Bad magic number for central directory exception.
My test setup: I pass self.session to the function under test, which uses it to request an endpoint. That endpoint returns the zip file as a bytes string (response.content). I found that the test runs without errors if I don't use the Betamax session.
Test (with Betamax)
Session headers
{'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
Response headers
{'Accept-Ranges': 'bytes', 'ETag': 'W/"159406-1633990217000"', 'Last-Modified': 'Mon, 11 Oct 2021 22:10:17 GMT', 'Content-Type': 'application/zip', 'Content-Length': '159406', 'Date': 'Thu, 21 Oct 2021 14:37:27 GMT', 'Set-Cookie': 'BIGipServerpool_wserv=973081866.20480.0000; path=/; Httponly, TS01dc523b=016a5b383346ca02628a7c1dd47ef26e8cadf4a1b22fa9261c6b9ac1de8ac5665e99bd4a42c5b1d0af72b97105f57020b5e0f78fa7452df6080bf5ea3ee7a85d2de98968a2; Path=/; Domain=.www.stj.jus.br', 'Strict-Transport-Security': 'max-age=604800; includeSubDomains', 'Content-Security-Policy': "upgrade-insecure-requests; frame-ancestors 'self' https://*.stj.jus.br https://*.web.stj.jus.br https://stjjus.sharepoint.com/"}
Actual content length
len(response.content) == 288055
Script execution (without Betamax)
Session headers
{'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
Response headers
{'Accept-Ranges': 'bytes', 'ETag': 'W/"159406-1633990217000"', 'Last-Modified': 'Mon, 11 Oct 2021 22:10:17 GMT', 'Content-Type': 'application/zip', 'Content-Length': '159406', 'Date': 'Thu, 21 Oct 2021 14:39:24 GMT', 'Set-Cookie': 'BIGipServerpool_wserv=973081866.20480.0000; path=/; Httponly, TS01dc523b=016a5b3833746a54a2d1276a2b3de87f48f672e9cd7c18c4dad842ddddeac244bcbcf1a470b59eecf83bd6a3bdeffc7c7017210981de929d01df6c054118625399d2b04ad2; Path=/; Domain=.www.stj.jus.br', 'Strict-Transport-Security': 'max-age=604800; includeSubDomains', 'Content-Security-Policy': "upgrade-insecure-requests; frame-ancestors 'self' https://*.stj.jus.br https://*.web.stj.jus.br https://stjjus.sharepoint.com/"}
Actual content length
len(response.content) == 159406
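To check a downloaded body quickly, you can compare it against its Content-Length header and try to open it as a zip. The sketch below does this. The function name diagnose_zip_body is hypothetical. The length comparison only makes sense when the response has no Content-Encoding, as in the headers above, because requests decodes gzip or deflate bodies before exposing response.content.

```python
import io
import zipfile
from typing import List, Optional


def diagnose_zip_body(content: bytes, content_length: Optional[str]) -> List[str]:
    """Return the problems found in a downloaded zip body (empty list if none)."""
    problems = []
    # Assumes a numeric Content-Length and an unencoded (no gzip) body.
    if content_length is not None and int(content_length) != len(content):
        problems.append(
            f"Content-Length header says {content_length} bytes, "
            f"but the body has {len(content)}"
        )
    try:
        # Opening the archive reads the central directory; a mangled body
        # fails here with errors like "Bad magic number for central directory".
        with zipfile.ZipFile(io.BytesIO(content)) as archive:
            bad_member = archive.testzip()  # CRC-checks every member
    except zipfile.BadZipFile as exc:
        problems.append(f"not a valid zip archive: {exc}")
    else:
        if bad_member is not None:
            problems.append(f"corrupt member: {bad_member}")
    return problems
```

Run against the two measurements above, the Betamax run (288055 bytes against a header of 159406) would be flagged at least for the length mismatch, while the plain script run matches its header.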
I'm using Python 3.8.2, Betamax 0.8.1, Requests 2.25.1, and pytest 5.4.1 to run the tests.
Related question: https://stackoverflow.com/questions/69653406/how-to-mock-a-function-that-downloads-a-large-binary-content-using-betamax
Related issue: https://github.com/betamaxpy/betamax/issues/122