betamaxpy / betamax

A VCR imitation designed only for python-requests.
https://betamax.readthedocs.io/en/latest/
Other
567 stars 62 forks source link

Requests containing files cannot be replayed. #64

Closed bboe closed 9 years ago

bboe commented 9 years ago

When specifying files via files=[...] requests results in choosing a different multipart_boundary each time, thus making it quite difficult to match. There also appear to be some issues with cassettes and binary data for body matching.

Perhaps when using the betamax adapter, the multipart_boundary can be fixed to a constant value rather than a uuid that requests uses.

sigmavirus24 commented 9 years ago

So there's going to be a severe layers violation here if we try to do that. The preparation of that multipart body happens well before the betamax adapter sees it. That stuff happens before we even get the data.

There's no way we as a library can do this and I'm not certain what the best fix is. As I see it, we have two options:

  1. Users could use the MultipartEncoder which allows users to stream large uploads & to select their own boundary.
  2. In the betamax-matchers package, we could use the undocumented MultipartDecoder to parse the body and do matching that way.

Personally I'm leaning towards 2. Thoughts @bboe?

sigmavirus24 commented 9 years ago

Note: Currently the MultipartDecoder is designed to work with responses, but it can just as easily work with request objects.

bboe commented 9 years ago

I suppose #2 is the way to go if we cannot change the multipart boundary. My other option would be to get requests to accept specifying the multipart boundary (urlib3 does, but not requests) so that I can set it static (there's not reason to change the boundary per request).

I started down this former path (not actual decoding though) and ran into another set of encoding errors between versions of python as the request in memory is in binary format.

sigmavirus24 commented 9 years ago

My other option would be to get requests to accept specifying the multipart boundary (urlib3 does, but not requests) so that I can set it static (there's not reason to change the boundary per request).

So this isn't something people frequently ask for. The "blessed" way of doing that is with the MultipartEncoder from the toolbelt. I think @lukasa would agree that there isn't a good API that we can think of that would provide that functionality in requests.

I started down this former path (not actual decoding though) and ran into another set of encoding errors between versions of python as the request in memory is in binary format.

You mean the request stored by Betamax or something else? I suspect that the MultipartDecoder will be able to handle this. I haven't used it extensively, but it appeared to be very robust and well built.

Since this seems amenable, I'll prioritize this. I'm running the PSF's elections next week so I may not be able to attend to this until after I kick those off. By the @bboe, if you're not already a contributing member of the PSF, you should become one. It's free!

sigmavirus24 commented 9 years ago

@bboe I started working on this in sigmavirus24/betamax_matchers@6155015

bboe commented 9 years ago

Awesome! Thanks @sigmavirus24

sigmavirus24 commented 9 years ago

So I have a question for you @bboe:

If you take a look at sigmavirus24/betamax_matchers@2199dea, you'll see that a first implementation of this makes it fairly... naïve. Can you point me to how you're using files= in requests? I think I'm going to make it sort the parts anyway, but that will have to happen by each part's headers which should be the same for parts in semantically equivalent multipart/form-data requests.

I haven't tested any of this, but if you can provide early feedback that'd be awesome.

bboe commented 9 years ago

It's a little complicated but here goes.

The following are the tests that I would like to run. They are commented out currently because they are not replayable. Also the test methods should be decorated with betamax, rather than calling betamax_init inside the tests: https://github.com/praw-dev/praw/blob/master/tests/test_images.py#L86

PRAW uses files by preparing a request associated with a dummy session: https://github.com/praw-dev/praw/blob/f85fbba050d08bc1b0102979a9b3c91805878a79/praw/internal.py#L160

This prepared request can be used in one of two ways. Most commonly it is sent to a rate limit handler that ensures requests are only made within the API's time limits: https://github.com/praw-dev/praw/blob/f85fbba050d08bc1b0102979a9b3c91805878a79/praw/handlers.py#L101

The second way is to permit distributed rate limit handling. In that case, the prepared request is serialized and transmitted to another process that handles the request:

I hope that helps some. I really appreciate you looking into it. I should have some time this weekend to be of any assistance.

sigmavirus24 commented 9 years ago

So looking at the upload_image handler I think I should be able to solve your problem by sorting the parts. I'll take a much closer look tonight though to be safe.

sigmavirus24 commented 9 years ago

Oh and speaking of a betamax decorator, we've had a similar request here: https://github.com/sigmavirus24/betamax/issues/61

sigmavirus24 commented 9 years ago

To make your testing easier @bboe I have released betamax_matchers 0.2.0 (https://github.com/sigmavirus24/betamax_matchers#multipart-form-data-body-matcher). Please file bugs there if you find problems with the matcher.

:beer: Cheers!

bboe commented 9 years ago

Thank you! I will look at it this weekend. I appreciate your efforts :+1: