bblimke / webmock

Library for stubbing and setting expectations on HTTP requests in Ruby.
MIT License
3.97k stars 555 forks source link

Force encoding to utf-8 breaks multipart file data in ruby 2.0 #333

Open eric-hawkins opened 10 years ago

eric-hawkins commented 10 years ago

When WebMock::Util::QueryMapper.query_to_values is used on the body of a POST with content-type: multipart-form-data including file data encoded as ASCII-8bit (e.g. image data), you can get invalid UTF-8 characters and encoding raises exceptions.

I hit this mocking a RestClient POST containing image data.

bblimke commented 10 years ago

Can you provide a spec or sample code please?

eric-hawkins commented 10 years ago

Here you go:

require 'webmock'
require 'rest-client'

url = 'http://someplace.com/upload'
file = File.new('/path/to/some/image.jpg', 'rb')
params = {'file' => file}
sample_post = WebMock::API.stub_request(:post, url).with(:body => params)
RestClient.post(url, params)
assert_requested sample_post
bblimke commented 10 years ago

Thanks. This is due to WebMock incorrectly trying to parse multipart body as url encoded body here https://github.com/bblimke/webmock/blob/master/lib/webmock/request_pattern.rb#L204 Maybe it should return raw body if there is an exception parsing.

Anyway, the hash {'file' => file} is a structure supported only by RestClient internally. Only RestClient knows how to convert such a hash to a multipart body. It's not any convention or standard in Ruby. WebMock won't support it, even if the parsing issue is solved.

eric-hawkins commented 10 years ago

Yes, I think returning the raw body if there is an exception parsing could work, or you could add support for multipart form data directly.

I don't think the fact that I'm using RestClient is relevant to the issue. That is just a figment of how I am triggering the error in WebMock. By the time it hits WebMock the body is a string containing all the multipart headers and the file data. WebMock never sees the hash {'file' => file}.

eric-hawkins commented 10 years ago

Sorry, now I see why you mentioned the RestClient problem with respect to my example. The example I provided was too quick and dirty, only intended to generate the encoding exception. I understand the WebMock matcher wouldn't work with the provided body. In the real test which originally led us to this issue, we provide a block and use the multipart strings directly.

bblimke commented 10 years ago

Thanks for clarifying the RestClient params intention.

It may not be so trivial to add multipart support to WebMock, but returning raw body in case url parser fails should be a quick one.

I'll implement that when I get a minute, unless I receive some pull request earlier.

jimherz commented 10 years ago

Perhaps a simpler solution than attempting to catch an encoding error in WebMock::Util::QueryMapper.query_to_values would be to just return the raw body when content_type is "multipart/form-data" in WebMock::BodyPattern.body_as_hash, i.e.:


module WebMock

  ...

  module BodyPattern

    ...

    BODY_FORMATS = {
      ...
      'multipart/form-data'    => :multipart
    }

    ...

    def body_as_hash(body, content_type)
      puts body
      puts content_type
      case BODY_FORMATS[content_type]
      when :json then
        WebMock::Util::JSON.parse(body)
      when :xml then
        Crack::XML.parse(body)
      when :multipart then
        body
      else
        WebMock::Util::QueryMapper.query_to_values(body)
      end
    end

    ...

  end

end

This has the added advantage of making it explicit in the code that WebMock does not support parsing multipart form data. If desired, WebMock users can then parse and recognize the multipart data themselves in a with block (this is what Eric and my team are doing).

If you like this idea, I'd be happy to submit a pull request.

bblimke commented 10 years ago

@jimherz Thank you for suggesting this idea. The problem is that the issue is not related to multipart only. It's related to WebMock behaviour, when a request stub is created with a body declared as a hash.

In current implementation WebMock makes an incorrect assumption, that if the stub is declared with hash body, then request body can be represented as a hash too. The error arises due to WebMock not having any other option, as a last chance trying to parse body as url encoded, but that fails too.

It tries to parse request body as a hash, but body can't be represented as a hash. That deserves an exception imho. body_as_hash received a body argument that is not parsable to a hash. It's at least an ArgumentError. Imho body_as_hash should raise BodyCantBeParsedToHashError and #matches? should catch that error and treat body as a non-hash body.

Expecting method called #body_as_hash to return a string, may be confusing too.

jimherz commented 10 years ago

Thanks...I see...that all makes sense...my knowledge of the internals of webmock is limited to today, so I'm not surprised my idea is off the mark :).

For the time being, we will patch WebMock::BodyPattern.matches? in our test code to always return false if the content_type is 'multipart/form-data'. This will work for our one test case that uses multipart data. The matching request stub has no body pattern and does all matching in a with block.

eric-hawkins commented 10 years ago

Any update on this?

bblimke commented 10 years ago

@eric-hawkins no update :( I'm happy to accept a pull request, but I don't have too much time to work on WebMock now.

guigs commented 5 years ago

I'm using this as a workaround: https://github.com/bblimke/webmock/issues/623#issuecomment-536603971