etr / libhttpserver

C++ library for creating an embedded Rest HTTP server (and more)
GNU Lesser General Public License v2.1
893 stars 186 forks source link

[FEATURE-REQUEST] multiple args with the same key are not handled correctly #292

Closed stuart-byma closed 1 year ago

stuart-byma commented 1 year ago

Prerequisites

Description

A query arg string can have multiple arguments with the same key, e.g. hello.com/what?test=one&test=two. libhttpserver only ever returns the last arg via get_args(). This is inconsistent with MHD, which will iterate multiple args with the same key. get_args() simply overwrites the last one.

This is also inconsistent with other http library, such as Go http package, which will return a collection of strings for a given argument key.

Steps to Reproduce

Expected behavior: get_args() returns a map contains all the values associated to one key, even if there are multiple.

Actual behavior: get_args() overwrites and always returns the last value for given key.

Reproduces how often: 100%

Versions

Not affected by versions.

etr commented 1 year ago

This wasn't a bug as we thought this through at the time (admittedly 10 years ago now) There HTTP standard does not specify a behavior and in fact the behaviors are inconsistent across different tech stacks, so our choice (admittedly the lazy one) was staying out of this business as most of our use-cases did not need it. As such, I am marking this as a feature request, but one I am happy to entertain.

Out of the frameworks that support multiple keys with the same name, the ones I've observed are:

  1. multiple keys: '?keys=value1&keys=value2'
  2. array keys: '?keys[]=value1,value2' and 'keys[0]=value1&keys[1]=value2'; some systems support sparse arrays: 'keys[]=value1&keys[]=value2'.
  3. comma separated keys: '?keys=value1,value2'

Some support all three.

There is also a matter of default behavior. I think both Django and Flask (for example) would force you to read the value with a separate method if you want to fetch all the values vs their default that pulls only one of the values. Others default to list.

I guess what I am saying is, I'd like to have a more in-depth view of why we chose the specific implementation we chose and potentially reason through the options before jumping onto one, given that we are effectively doing the work the RFC should have done for us.

I guess your idea for supported syntax in the params is (1) - besides Go, are you suggesting that as the one you've seen to be the most popular? (I haven't surveyed other frameworks myself, but I think that might be the right way of deciding).

In respect to the default reading method, what approach are you suggesting? I prefer to have a default that reads only one value, so that users don't have to bother with lists when 99% of the time they don't use them. We can have specific methods for list returns, but curious on yours (and others) view.

stuart-byma commented 1 year ago

Thanks @etr for your thoughts on this.

Yeah, the supported syntax would be 1), which is not really a departure from the current support imo. It's more a slight change in semantics. I think it also depends quite heavily on what MHD supports. E.g. I don't believe MHD supports syntax 2) and 3), but I am not 100% sure. The RFC also doesn't seem to mention these syntaxes https://www.rfc-editor.org/rfc/rfc3986#section-3.4, so I am not sure it's worth worrying about them.

Here is my opinion on methods/interface: get_args should return everything. This means a collection of values for each key. get_arg keeps the current behavior and (like MHD) returns just the "first" value (which in the current impl is actually the last value given the cache construction). We could then add get_arg_all (name tbd) which returns all the values for a given key.

etr commented 1 year ago

E.g. I don't believe MHD supports syntax 2) and 3), but I am not 100% sure. The RFC also doesn't seem to mention these syntaxes https://www.rfc-editor.org/rfc/rfc3986#section-3.4, so I am not sure it's worth worrying about them.

They do not. To be fair, the RFC doesn't specify what the behavior should be in case of multiple values for the same key, so there is little there that we can use. But I agree with you, with libmicrohttpd supporting already multiple keys with the same value it makes more sense to go for (1). It also seems (from a quick search around) to be the most popular across other frameworks and the others can always be implemented later (or anyway be implemented by consumers of the library as it would just be string parsing at that point).

For the implementation, I'd be tempted to create a new type for args values and change the arg_view_map type to use that for the value instead of just string_view (std::map<std::string_view, http_arg_value, http::arg_comparator>). We can make conversions transparent through cast operators. In terms of the http_request object, I'd change the default methods (get_arg and get_args) to just return the new type. We can have two additional methods (get_arg_flat and get_args_flat) that do the conversion to single string automatically for ease of use. The result is not entirely transparent for people wanting a single value, but it is pretty much transparent while not hiding from them the fact that there is indeed some complexity that they might want to consider while building their own servers.

What do you think of something like the below?

class http_arg_value {
public:
    std::vector<std::string_view> > values;

    ....

    std::string_view get_flat_value() const
    {
        return values.get(0);
    }

    std::vector<std::string_view> get_all_values() const
    {
        return values;
    }

    operator std::string() const
    {
        return get_flat_value();
    }

    operator std::vector<std::string>() const
    {
        return get_all_values();
    }
};

In http request:

const http::arg_view_map get_args() const;

http_arg_value get_arg(std::string_view key) const;

const std::map<std::string_view, std::string_view, http::arg_comparator> get_args_flat() const;

std::string_view get_arg_flat(std::string_view key) const;
stuart-byma commented 1 year ago

I am working on putting this together, and ran into some puzzling behavior from webserver.cpp. This code:

         if (filename == nullptr || mr->ws->file_upload_target != FILE_UPLOAD_DISK_ONLY) {
            mr->dhr->set_arg(key, std::string(mr->dhr->get_arg(key)) + std::string(data, size));
        }

in webserver::post_iterator has the effect of setting arg values with the same key equal to the nth value concatenated with the first value. Completely unexpected behavior. What is the purpose of this? What is the appropriate fix to make here?

etr commented 1 year ago

That comes from the fact that the system did not take into account the case of multiple parameters with the same key.

The reason that concatenation exists is because the method is called multiple times by libmicrohttp if the file is too big to be processed in a single chunk. So, when we upload the file to memory, we are chaining the pieces together.

Of course, that all breaks whenever you have multiple parameters with the same key, because the way it is written, it triggers the behavior.

The best course of action, at implementation, would be to start by writing a few unit tests that cover for both multiple parameters with the same key and also make sure we have tests with large files so to check that the behavior keeps being correct in that case as well.

stuart-byma commented 1 year ago

How big is too big to be processed in one chunk?

etr commented 1 year ago

How big is too big to be processed in one chunk?

The size is configurable through the server constructor builder