Stiffstream / restinio

Cross-platform, efficient, customizable, and robust asynchronous HTTP(S)/WebSocket server C++ library with the right balance between performance and ease of use
Other
1.15k stars 93 forks source link

On the way to additional type-safe router for RESTinio? #80

Closed eao197 closed 4 years ago

eao197 commented 4 years ago

There are several flaws I don't like in express-like router. So I tried to make an experiment with a type-safe router. Here is the description of the first result of that experiment.

I'll be glad to receive some feedback from ones those interested in the RESTinio's evolution.

prince-chrismc commented 4 years ago

A snippet from your branch

auto book_num = epr::produce< book_number_t >(
                                epr::slash(),       epr::non_negative_decimal_number_producer< book_number_t >()  >> epr::as_result() );
router->add_handler(
                restinio::http_method_get(),
                epr::root(), by( &books_handler_t::on_books_list ) );
router->add_handler(
                restinio::http_method_get(),
                book_num, by( &books_handler_t::on_book_get ) );

The concept is very interesting. Hopefully you can explain some of the details.

In the example noted above, one begins with root producer and the other begins with slash producer. Both case translate to the first character being equal so that path matches?

Is this the intended effect? Or should all paths begin with root?


I wonder how applicable this great feature will be. All of the RESTful APIs I have seen, used and/or implemented never had complex data structures in the URI paths.

For instance, take the docker API is /docker/v1/container/id where I'd is some hash string representation. This is very typical in my experience and some have complex regex for the IDs (such as ^{[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}}$)

How would the easier parser handle regex strings?


One concept which was very interesting from #65 Was the operator/() do you forsee being able to replace the epr::slash() with something equally as expressive?

eao197 commented 4 years ago

In the example noted above, one begins with root producer and the other begins with slash producer. Both case translate to the first character being equal so that path matches?

To clarify this point it's better to rewrite the code that way:

router->add_handler(
     restinio::http_method_get(),
     symbol_p('/') >> just(root_t{}), // Match only if '/' is the whole path.
     by( &books_handler_t::on_books_list ) );
router->add_handler(
     restinio::http_method_get(),
     produce<book_number_t>(
        // Path should start from '/'
        symbol('/'),
        // Then should be a decimal number
        non_negative_decimal_numer_p<book_number_t>() >> as_result()),
     by( &books_handler_t::on_book_get ) );

Both path start with slash, but the first path expects only one starting slash without any other content. And the second path expects slash followed by one decimal number.

A path specified for add_handler requires the full match with the path from an incoming request (without an optional trailing slash). So if one specifies something like:

router->add_handler(
   restinio::http_method_get(),
   exact_p("/abc") >> just(root_t{}),
   handler);

then this handler will be called for GET requests with paths /abc or /abc/, but not for /abcd, /abc/d and so on.

Is this the intended effect? Or should all paths begin with root?

Every path specification for easy_parser router should be a producer. It means that after the parsing of a path value of some type should be produced. For some cases it's obvious what it should be:

router->add_handler(restinio::http_method_get(),
   produce<book_number_t>(...), // The result be book_number_t instance.
   [](const restinio::request_handle_t & req, const book_number_t booknum) {...});

The parser for '/' path should also return some value. Which value?

I think it should be an empty helper type like:

struct root_t {};

And epr::root_p() is just a helper function like that:

[[nodiscard]] auto root_p() { return symbol('/') >> just(root_t{}); }

It can be read this way: if there is '/' symbol in the input an instance of root_t should be returned as the result of path parsing.

So the usage of root_p allows to write handlers for / paths:

router->add_handler(restinio::http_method_get(),
   epr::root_p(),
  [](const restinio::request_handle_t & req, const epr::root_t) {...});
eao197 commented 4 years ago

All of the RESTful APIs I have seen, used and/or implemented never had complex data structures in the URI paths.

I hope this feature is not only for complex cases where 3-4-5-or-more values are represented in a path. It's more for avoiding stupid errors like:

router->handle_get("/api/v1/goods/:id",
   [](const auto & req, auto & params) {
      auto id = restinio::cast_to<int>(params["Id"]); // Oops!
      ...
   });

Even if a programmer writes the name of parameter right there still are some places for errors:

A programmer can forget to specify a format for the value.

router->handle_get("/api/v1/goods/:id",
   [](const auto & req, auto & params) {
      auto id = restinio::cast_to<int>(params["id"]); // Parameter can be non-number.
      ...
   });

A programmer can specify a format that is too relaxed:

router->handle_get("/api/v1/goods/:id(\d+)",
   [](const auto & req, auto & params) {
      auto id = restinio::cast_to<int>(params["id"]); // A value can be too big to fit into int.
      ...
   });

So more or less carefully writen code should looks like:

router->handle_get("/api/v1/goods/:id(\d{1,10})",
   [](const auto & req, auto & params) {
      auto id = restinio::cast_to<int>(params["id"]);
      ...
   });

But if such ":id" parameter is reused in API several times it will be boring and error-prone to repeat similar fragments of code again and again:

const restinio::string_view_t path_to_goods{"/api/v1/goods/:id(\d{1,10})"};
router->handle_get(path_to_goods,
   [](const auto & req, auto & params) {
      auto id = restinio::cast_to<int>(params["id"]); // Have to extract :id manually.
      ...
   });
router->handle_post(path_to_goods,
   [](const auto & req, auto & params) {
      auto id = restinio::cast_to<int>(params["id"]);
      ...
   });
router->handle_delete(path_to_goods,
   [](const auto & req, auto & params) {
      auto id = restinio::cast_to<int>(params["id"]);
      ...
   });

So I think it's better to move working with ":id" in path to reusable code. Something like:

using goods_id_t = int;
...
auto path_to_goods = epr::produce<goods_id_t>(
   epr::exact("/api/v1/goods/"),
   epr::non_negative_decimal_number_p() >> epr::as_result());
...
router->handle_get(path_to_goods,
   [](const auto & req, goods_id_t id) { // We already have value of ":id". No need to extract manually.
      ...
   });
router->handle_post(path_to_goods,
   [](const auto & req, goods_id_t id) {
      ...
   });
router->handle_delete(path_to_goods,
   [](const auto & req, goods_id_t id) {
      ...
   });
eao197 commented 4 years ago

This is very typical in my experience and some have complex regex for the IDs (such as ^{[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}}$) How would the easier parser handle regex strings?

I think there are at least two ways to cope with such patterns:

The first is to rewrite that pattern in easy_parser terms. For example, if we want to have ID is string representation, then we can to write the following parser for it:

using namespace restinio::easy_parser;
const auto parser = produce<std::string>(
    repeat(8u, 8u, hexdigit_p() >> to_container()),
    symbol_p('-') >> to_container(),
    repeat(3u, 3u,
        repeat(4u, 4u, hexdigit_p() >> to_container()),
        symbol_p('-') >> to_container()),
    repeat(12u, 12u, hexdigit_p() >> to_container())
);

And then this parser can be used in routers:

router->handle_get(
   produce<std::string>(exact("/api/v1/users/"), parser >> as_result()),
   [](const auto & req, const std::string & uuid) {...});

Or we can want to have ID as non-string object. In this case it's possible to do something like:

struct user_id {
   std::uint32_t part1_;
   std::uint16_t part2_;
   std::uint16_t part3_;
   std::uint16_t part4_;
   std::uint64_t part5_;
   ...
};
using namespace restinio::easy_parser;
const auto parser = produce<user_id>(
    hex_number_p<std::uint32_t>(8u, 8u) >> &user_id::part1_,
    symbol('-'),
    hex_number_p<std::uint16_t>(4u, 4u) >> &user_id::part2_,
    symbol('-'),
    hex_number_p<std::uint16_t>(4u, 4u) >> &user_id::part3_,
    symbol('-'),
    hex_number_p<std::uint16_t>(4u, 4u) >> &user_id::part4_,
    symbol('-'),
    hex_number_p<std::uint64_t>(12u, 12u) >> &user_id::part5_
);

And then this parser can be used in routers:

router->handle_get(
   produce<user_id>(exact("/api/v1/users/"), parser >> as_result()),
   [](const auto & req, const user_id & uuid) {...});

Note. There is no hex_number_p producer in RESTinio yet.

The second way is to add a producer that uses regexp as a parameter:

using namespace restinio::easy_parser;
const auto parser = regexp_based_p("[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}");
...
router->handle_get(
   produce<std::string>(exact("/api/v1/users/"), parser >> as_result()),
   [](const auto & req, const std::string & uuid) {...});
eao197 commented 4 years ago

One concept which was very interesting from #65 Was the operator/() do you forsee being able to replace the epr::slash() with something equally as expressive?

There are several problems that can prevent the usage of operator/ in easy_parser_router's DSL:

I've also found that operator/ can be expressive only if there is a possibility to mix string literals with easy_parser constructs and can skip >> as_result() somehow. Something like:

router_get("/api/v1/books" / book_id, ...);
router_get("/api/v1/books" / book_id / "history", ...);
router_get("/api/v1/books" / book_id / "versions", ...);

But because at this moment the only way is to write:

router_get(produce<book_id_t>(exact("/api/v1/books/"), book_id >> as_result()), ...);
router_get(produce<book_id_t>(exact("/api/v1/books/"), book_id >> as_result(), exact("/history")), ...);
router_get(produce<book_id_t>(exact("/api/v1/books/"), book_id >> as_result(), exact("/versions")), ...);

there is no benefits from usage of operator/.

Unfortunately, I don't know is it possible to make DSL that allows to write something like that:

router_get("/api/v1/books" / book_id / "versions", ...);

May be user-defined literals can helps here:

router_get("/api/v1/books"_exact / book_id / "versions"_exact, ...);

This is a dark area for the further investigations...

prince-chrismc commented 4 years ago

I hope this feature is not only for complex cases where 3-4-5-or-more values are represented in a path. It's more for avoiding stupid errors

I have three different ways to extract version in my code base.... Your vision for:

So I think it's better to move working with ":id" in path to reusable code. Something like:

using goods_id_t = int;
...
auto path_to_goods = epr::produce<goods_id_t>(
   epr::exact("/api/v1/goods/"),
   epr::non_negative_decimal_number_p() >> epr::as_result());
...
router->handle_get(path_to_goods,
   [](const auto & req, goods_id_t id) { // We already have value of ":id". No need to extract manually.
      ...
   });

Would absolutely be useful in almost every use case.


There's a theme which I did not pick up one reading your blog post and going over the code review. By using the type safe the handlers become more precise. ie replace the route_params_t with teh concrete type that is required.

Now would this be as extensible as:

router->handle_get(
   produce<uuid_t, layer_t >(exact("/api/v1/users/"), parser_uuid >> as_result(), parser_layer >> as_result()),
   [](const auto & req, const uuid_t & uuid, const layer_t & layer) {...});

Or would this need to be constrained to just one type for the producer? Or perhaps Multiple producers to with one type?


I've also found that operator/ can be expressive only if there is a possibility to mix string literals with easy_parser constructs and can skip >> as_result() somehow. Something like:

router_get("/api/v1/books" / book_id, ...);
router_get("/api/v1/books" / book_id / "history", ...);
router_get("/api/v1/books" / book_id / "versions", ...);

That would obviously be the most expressive, but I suspect something in the middle such as:

router_get(produce<book_id_t>("/api/v1/books"_exact / book_id / "versions"_exact >> as_result() );

Would still lend well for readability.

eao197 commented 4 years ago

Or would this need to be constrained to just one type for the producer? Or perhaps Multiple producers to with one type?

The answer is simple: I just don't know how to implement it :(

With just one object produced by a producer entity the expression like:

produce<MyType>(clause1, clause2, ..., clauseN)

is converted to something like that:

expected<MyType, parse_error_t> actual_producer_code__(input_t & from) {
   MyType result;
   {
      const auto parse_result = clause1.try_parse(from, result); // result can be modified here
      if(!parse_result)
         return make_unexpected(parser_result.error());
   }
   {
      const auto parse_result = clause2.try_parse(from, result); // result can be modified here
      if(!parse_result)
         return make_unexpected(parser_result.error());
   }
   ...
   return result;
}

But for the expression:

produce<MyType1, MyType2, MyType3>(clause1, clause2, ..., clauseN);

it is necessary to generate something like that:

expected<tuple<MyType1, MyType2, MyType3>, parse_error_t>
actual_producer_code__(input_t & from) {
   tuple<MyType1, MyType2, MyType3> result;
   {
      // A problem here: should the whole result to be passed to try_parse?
      // Or just a reference to just one of its item?
      const auto parse_result = clause1.try_parse(from, result);
      if(!parse_result)
         return make_unexpected(parser_result.error());
   }
   ...
   return result;
}

And after that the tuple should be expanded to make a call to user-supplied handler.

I don't know is my knowledge of C++ enough to produce necessary template magic to do something like that. Can C++14 allow that magic? Even if C++14 can, what about real C++ compilers? What about compile times and requirements. I don't think that it's appropriate if a compilation of some RESTinio example would require 32GiB of RAM and 16-code CPU ;)

So I decided to limit the number of produced values. Just one can be produced.

I don't think it's a big problem, because C++ allows to define a new type in an ad-hoc fashion. So your example can be rewritten that way:

struct uuid_and_layer_t {
   uuid_t uuid_;
   layer_t layer_;
};
router->handle_get(
   produce<uuid_and_layer_t >(
      exact("/api/v1/users/"),
      parser_uuid >> &uuid_and_layer_t::uuid_,
      parser_layer >> &uuid_and_layer_t::layer_),
   [](const auto & req, const uuid_and_layer_t & params) {...});

That approach has a benefit: a content of helper type can easily be changed without a need to fix the parameter list of a handler. So in some time in the future you can modify your path by adding an optional sublayer:

struct uuid_and_layer_t {
   uuid_t uuid_;
   layer_t layer_;
   optional<layer_t> sublayer_;
};
router->handle_get(
   produce<uuid_and_layer_t >(
      exact("/api/v1/users/"),
      parser_uuid >> &uuid_and_layer_t::uuid_,
      parser_layer >> &uuid_and_layer_t::layer_,
      maybe(parser_layer >> &uuid_and_layer::sublayer_)),
   [](const auto & req, const uuid_and_layer_t & params) {...});

And there won't be a need to change the parameter list of your handler.

eao197 commented 4 years ago

It seems that I've found a way to make a more compact and easier-to-read DSL for easy_parser_router. It's possible to describe routes this way:

router->http_get(
   epr::path_to_params("/api/v1/books"),
   [](const auto & req) { // NOTE: there is only `req` param.
      ...
   });
router->http_get(
   epr::path_to_params("/api/v1/books/", book_id_parser),
   [](const auto & req, auto book_id) { // NOTE: there is additional parameter.
      ...
   });
router->http_get(
   epr::path_to_params("/api/v1/books/", book_id_parser, "/versions/", ver_id_parser),
   [](const auto & req, auto book_id, auto ver_id) { // NOTE: there are two additional params.
       ...
   });

This approach works and its first version is already in the corresponding dev-branch. An actual example can be seen here. I'll continue that work after the weekend.

A note about the name path_to_params. It means that path (or route) specified by a user should be divided into individual values and those values should be passed to route-handler as separate parameters. So if a user writes:

path_to_params("/", a1_parser, "/", a2_parser, "/", a3_parser, "/", a4_parser)

then a route-handler should have the following form:

request_handling_status_t (req, a1_value, a2_value, a3_value, a4_value)

where a*_value can be a value, or a reference, or a const reference.

There is also path_to_tuple function. It collects all values from a path (route) and passes them to a route-handler as a tuple object:

router->http_get(
   path_to_tuple("/", a1_parser, "/", a2_parser, "/", a3_parser, "/", a4_parser)
   [](const auto & req, auto params) { // NOTE: just one additional parameter.
      auto a1 = std::get<0>(params);
      auto a2 = std::get<1>(params);
      ...
   });
eao197 commented 4 years ago

A new type-safe router is implemented in v.0.6.6.