facil-io / cstl

facil.io C STL - a Server Toolbox Library for C, including JSON processing, hash maps, dynamic arrays, binary strings and more.
https://facil.io
MIT License
71 stars 6 forks source link

Help me design / write a generic HTTP routing helper library for the http_s struct. #26

Open thierry-f-78 opened 3 months ago

thierry-f-78 commented 3 months ago

Following this thread : https://github.com/boazsegev/facil.io/issues/155

After reading fio-stl, I better understand your position in general, and more particularly regarding regexps.

I therefore identify the wrappers strtol for integers and glob for wildcards. I will remove the regexps (anyway, regexes are heavy, slow and difficult to debug, but they make life easier for the end user). The other types I propose are trivial.

However, I don't see anything in fio-stl for timestamps.

Here's a proposed API.

I wonder if the router shouldn't return a list of "void *" that the user can execute as they want (blocking or non-blocking).

I wonder if the error reporting system should be rethought. Indeed, when parsing something complex, the user needs an error message that is more elaborate than "ok" / "nok".

/* ************************************************************************* */
#if !defined(FIO_INCLUDE_FILE) /* Dev test - ignore line */
#define FIO___DEV___           /* Development inclusion - ignore line */
#define FIO_ROUTER             /* Development inclusion - ignore line */
#include "./include.h"         /* Development inclusion - ignore line */
#endif                         /* Development inclusion - ignore line */
/* *****************************************************************************

                      HTTP requests router

Copyright and License: see header file (000 copyright.h) or top of file
***************************************************************************** */
#if defined(FIO_ROUTER) && !defined(H___FIO_ROUTER___H) &&                     \
    !defined(FIO___RECURSIVE_INCLUDE)
#define H___FIO_ROUTER___H 1

/* *****************************************************************************
Router configuration settings
***************************************************************************** */

/** Struct used to define routes */
typedef struct fio_router_s fio_router_s;

/**
 * This function handle http request, return 0 to continue processing
 * return -1 to stop processing with error, return 1 to stop processing
 * without error.
 */
typedef int (*fio_handler_t)(fio_http_s *h, int args_num, void **args);

/** Create a new router system */
FIO_IFUNC fio_router_t *fio_router_new();

/** Destroy the router system */
FIO_IFUNC void fio_router_free(fio_router_t *router);

/** Register a route
 *
 * "methods" is a comma separated list of methods.
 *
 *    GET,POST
 *
 * "path" is a path according with this EBNF definition
 *
 *    path = ["/"], { segment, "/" }, segment, ["/"]
 *
 *    segment = static_segment | dynamic_segment ;
 *
 *    static_segment = ["::", ] { character - ":" } ;
 *
 *    dynamic_segment = ":", name, [":", match] ;
 *
 *    name = (letter | "_"), { letter | digit | "_" } ;
 *
 *    match = ("~", regexp) | ("%", wildcard) | type_match | enum_match |
 *            date_match | hex_match | range_match ; type_match = "int32" |
 *            "uint32" | "int64" | "uint64" | "bool" | "b64" | "b64i" | "uuid" |
 *            "email" | "slug" ;
 *
 *    hex_match = ("hex" | "hexl" | "hexu"), ["(", length, ")"] ;
 *
 *    regexp = ? PCRE compatible regular expression ? ;
 *
 *    wildcard = ? simplified wildcard pattern, e.g., "*" for any characters,
 *               "?" for single character ? ;
 *
 *    length = digit - "0", { digit } ;
 *
 *    enum_match = * "enum", "(", word_list, ")" ;
 *
 *    word_list = word, { ",", word } ;
 *
 *    word = * (letter | digit | "-" | "_"), { letter | digit | "-" | "_" } ;
 *
 *    date_match = * "date", ["(", date_format, ")"] ;
 *
 *    range_match = "range", "(", range_list, ")" ;
 *
 *    range_list = range, { ",", range } ;
 *
 *    range = single_value | open_range | closed_range ;
 *
 *    single_value = number ; open_range = (number, "-") | ("-", number) ;
 *
 *    closed_range = number, "-", number ;
 *
 *    number = digit, { digit }, [".", { digit }] ;
 *
 *    letter = "A" | "B" | "C" | ... | "Z" | "a" | "b" | "c" |
 *             ... | "z" ; digit = "0" | "1" | "2" | ... | "8" | "9" ;
 *
 *    character = ? any Unicode character ? ;
 *
 *    date_format = ? date format string * ? ;
 *
 *  the "type_match" list could be completed with the function
 *  "fio_router_add_type" described below
 *
 *  exemples:
 *
 *  "content_types" is comment separated list of wildcard content-types.
 *  (c-comments tag forbid some form, so remove spaces if you copy these
 *  examples).
 *
 *    "application / *"
 *  "application/json,application/ld+json,application/csp-report,application/manifest+json,application/schema+json,application/problem+json"
 *    "* / *"
 *    "" : same than "* / *"
 *
 * return 0 is the route is register, otherwise returns -1.
 *
 * Usage example:
 *
 * fio_route(router, "GET,POST", "/api/users/:id", "application/json",
 *           handle_users);
 *
 * fio_route(router, "GET,POST", "/api/users/:id:hex(32)",
 *           "* / *", handle_users);
 */
FIO_IFUNC int fio_register_route(fio_router_t *router,
                                 const char *methods,
                                 const char *path,
                                 const char *content_types,
                                 fio_handler_t handler);

/** Define validation and parsing functions for custom parameters types
 *
 *  The function gets buffer and it length (buffer is not nul terminated).
 *  ptr points on opaque pointer.
 *  The function fill opaque pointer with parser value and return 0 if
 *  the parser is a success otherwise return -1.
 */
typedef int (*fio_parser_t)(const char *buffer, int length, void **ptr);

/** Add custom parameters types
 *
 *  The fucntion add application specific parser. This parser can be used
 *  in path definition. The specific parser is just a keyword and cannot
 *  handle parameters. The fucntion fails if the parser nale already exists.
 *  Return 0 in success case, otherwise return -1.
 */
FIO_IFUNC int fio_router_add_type(const char *type_name, fio_parser_t parser);

/** Add middleware to a router
 *
 *  The middleware function is executed when the router is selectionned and
 *  at least one item matchs.
 *
 *  The function returns 0 if the registration is success, otherwise it returns
 *  -1. Note this function fail only on out of memory error.
 */
FIO_IFUNC int fio_router_set_middleware(fio_router_t *router,
                                        fio_handler_t middleware);

/** Add a sub-router
 *
 *  bind sub router on some path. More than one subrouter could be registered
 *  on specific path. There are evaluated in the declaration order.
 *
 *  The function returns 0 if the registration is success, otherwise it returns
 *  -1.
 */
FIO_IFUNC int fio_router_mount(fio_router_t *parent,
                               const char *prefix,
                               fio_router_t *child);

/* *****************************************************************************
Router compilation and runtime
***************************************************************************** */

/** Used to execute router */
typedef struct fio_router_exec_s fio_router_exec_s;

/** Compile the router system
 *
 *  Compile declared routes. Return NULL if an error occurs, otherwise
 *  return pointer on new router_exec instance.
 */
FIO_IFUNC fio_router_exec_s *fio_router_compile(fio_router_t *router);

/** Destroy the router exec system */
FIO_IFUNC void fio_router_exec_free(fio_router_exec_s *router_exec);

/** Execute router
 *
 *  Return -1 if a critical error occurs, otherwise return 0 to continue
 *  processing, or 1 to stop processing.
 */
FIO_IFUNC int fio_router_exec(fio_router_exec_s *router_exec, fio_http_s *h)

/* *****************************************************************************
ROUTER - cleanup
***************************************************************************** */
#undef FIO___ROUTER_ON_ALLOC
#undef FIO___ROUTER_ON_FREE
#endif /* FIO_EXTERN_COMPLETE*/
#endif /* FIO_ROUTER */
#undef FIO_ROUTER
boazsegev commented 3 months ago

Hi @thierry-f-78

I love many things in this approach, although the API still seems somewhat problematic.

I love the addition of fio_router_set_middleware and would rename it to fio_http_router_middleware_push (like a URL, climb the name space until we reach the action).

Also, if the router is a structure, it should end with the _s suffix (_t is a reserved suffix and shouldn't be used unless writing POSIX library code).

Also, I would love to make the route addition even simpler if possible, making it simple to define resource routes in a way that is similar to what they do with Ruby on Rails.

It's an idiomatic use case that I encountered a lot when designing business apps.

I am missing details as well as implementation constraints to what I believe would be the ideal approach, but following what you suggest so far it might look like this:


/** The HTTP Router type*/
typedef struct fio_http_router_s fio_http_router_s;

/** Named arguments for the fio_http_router_map() functions. */
typedef struct {

  /** Default opaque user data to be set for the HTTP handle (fio_http_s). */
  void *udata;

  /* ---- Catch-All Callback (except authentication) ---- */

  /** Default handler. Missing methods will return a 404 error. */
  int (*on_http)(fio_http_s *h);

  /* ---- Route Filtering ---- */

  /**
   * If the `accept` header doesn't list this content type, route fails.
   *
   * This also sets the default response content-type header.
   */
  fio_buf_info_s content_type;

  /* ---- CRUD Callbacks ---- */

  /** GET request handler. */
  int (*on_get)(fio_http_s *h);
  /** POST request handler. */
  int (*on_post)(fio_http_s *h);
  /** PUT / PATCH request handler. */
  int (*on_put)(fio_http_s *h);
  /** DELETE request handler. */
  int (*on_delete)(fio_http_s *h);

  /* ---- Parameter Handling Callbacks ---- */

  /** Parameter parsing callback. */
  int (*on_parameter_true)(fio_http_s *h, fio_buf_info_s name);
  /** Parameter parsing callback. */
  int (*on_parameter_false)(fio_http_s *h, fio_buf_info_s name);
  /** Parameter parsing callback. */
  int (*on_parameter_empty)(fio_http_s *h, fio_buf_info_s name);
  /** Parameter parsing callback. */
  int (*on_parameter_str)(fio_http_s *h,
                          fio_buf_info_s name,
                          fio_buf_info_s value);
  /** Parameter parsing callback. */
  int (*on_parameter_int)(fio_http_s *h, fio_buf_info_s name, int64_t value);
  /** Parameter parsing callback. */
  int (*on_parameter_float)(fio_http_s *h, fio_buf_info_s name, double value);

  /* ---- WebSockets / SSE for specified path ---- */

  /** Authenticate EventSource (SSE) requests, return non-zero to deny.*/
  int (*on_authenticate_sse)(fio_http_s *h);
  /** Authenticate WebSockets Upgrade requests, return non-zero to deny.*/
  int (*on_authenticate_websocket)(fio_http_s *h);
  /** Called once a WebSocket / SSE connection upgrade is complete. */
  void (*on_open)(fio_http_s *h);

  /** Called when a WebSocket message is received. */
  void (*on_message)(fio_http_s *h, fio_buf_info_s msg, uint8_t is_text);
  /** Called when an EventSource event is received. */
  void (*on_eventsource)(fio_http_s *h,
                         fio_buf_info_s id,
                         fio_buf_info_s event,
                         fio_buf_info_s data);
  /** Called when an EventSource reconnect event requests an ID. */
  void (*on_eventsource_reconnect)(fio_http_s *h, fio_buf_info_s id);

  /** Called for WebSocket / SSE connections when outgoing buffer is empty. */
  void (*on_ready)(fio_http_s *h);
  /** Called for open WebSocket / SSE connections during shutting down. */
  void (*on_shutdown)(fio_http_s *h);
  /** Called after a WebSocket / SSE connection is closed (for cleanup). */
  void (*on_close)(fio_http_s *h);

  /**
   * A public folder for file transfers - allows to circumvent any application
   * layer logic and simply serve static files.
   *
   * Supports automatic `gz` pre-compressed alternatives.
   */
  fio_str_info_s public_folder;
  /**
   * The max-age value (in seconds) for possibly caching static files from the
   * public folder specified.
   *
   * Defaults to 0 (not sent).
   */
  size_t max_age;
} fio_http_router_settings_s;

/**
 * Note, if `router_or_sub_router` is NULL, a new router will be created.
 */
fio_http_router_s *fio_http_router_map(fio_http_router_s *router_or_sub_router,
                                       const char *url_with_params,
                                       fio_http_router_settings_s settings);

#define fio_http_router_map(router, url, ...)                                  \
  fio_http_router_map(router, url, (fio_http_router_settings_s){__VA_ARGS__})

Such an approach could set multiple related routes with one function call:

void example_use_simple(void) {
  fio_http_router_s *router;
  router = fio_http_router_map(NULL,
                               "/",
                               .on_http = my_404_handler,
                               .on_get = welcome_home);
  fio_http_router_map(router,
                      "/users",
                      .on_get = users_list,
                      .on_post = users_new_or_update,
                      .on_put = users_404_handler,
                      .on_delete = users_404_handler);
  fio_http_router_map(router,
                      "/users/:id",
                      .on_get = users_show,
                      .on_post = users_new_or_update,
                      .on_put = users_update,
                      .on_delete = users_remove);
  /* '*' in root will catch all. */
  fio_http_router_map(router, "*", .on_http = my_404_handler);
}

Or:

void example_use_with_sub_routes(void) {
  fio_http_router_s *router;
  fio_http_router_s *users_router;
  router = fio_http_router_map(NULL,
                               "/",
                               .on_http = my_404_handler,
                               .on_get = welcome_home);
  users_router = fio_http_router_map(router,
                                     "/users",
                                     .on_get = users_list,
                                     .on_post = users_new_or_update,
                                     .on_put = users_404_handler,
                                     .on_delete = users_404_handler);
  fio_http_router_map(users_router,
                      "/:id",
                      .on_get = users_show,
                      .on_post = users_new_or_update,
                      .on_put = users_update,
                      .on_delete = users_remove);
  /* '*' in root will catch all. */
  fio_http_router_map(router, "*", .on_http = my_404_handler);
}

And perhaps we could support "optional" parameters:

void example_use_with_optional_parameters(void) {
  fio_http_router_s *router;
  router = fio_http_router_map(NULL,
                               "/",
                               .on_http = my_404_handler,
                               .on_get = welcome_home);
  fio_http_router_map(router,
                      "/users(:id)",                              /* optional :id parameter */
                      .on_get = users_list_or_show, /* behaves differently if `id` was supplied*/
                      .on_post = users_new_or_update,
                      .on_put = users_update,
                      .on_delete = users_remove);
  /* '*' in root will catch all. */
  fio_http_router_map(router, "*", .on_http = my_404_handler);
}

Of course, how do we link the fio_http_router_s with the fio_http_listen is still a mystery for me, but... :)

boazsegev commented 3 months ago

As for the implementation details:

Using a tree with a method at the root of that tree would break possible custom method names.

This happens and we have to support it for one of our clients, when they have special (non-standard) method names sent by their client-side app. This is what we should allow a catch-all for each path that is method name agnostic.

As for the default behavior: I would like to allow developers to define a catch-all route.

I would also like to find a solution to possible language defying routes and similar optional segments (i.e., /(:lang)), where that segment is ignored by subsequent routes... i.e.:

void example_use_of_optional_parameter_at_start_of_url(void) {
  /* set a catch all for all routes. Maybe this should*/
  fio_http_router_s *router;
  router = fio_http_router_map(NULL,
                               "/(:lang=\?\?)", /* 2 letters and optional */
                               .on_http = test_language_or_ignore,
                               .on_parameter_str = set_language);
  fio_http_router_map(router,
                      "/", /* performed after `lang` was set */
                      .on_http = my_404_handler,
                      .on_get = welcome_home);

  /* '*' in root will catch all. */
  fio_http_router_map(router, "*", .on_http = my_404_handler);
}

Once an optional segment was extracted, the rest of the routers should ignore the extracted segment.

In other words, if an HTTP handler callback returns an error, the next matching route should be tested. Any consumed parameters should be ignored when continuing to the next route.

thierry-f-78 commented 3 months ago

I love the addition of fio_router_set_middleware and would rename it to fio_http_router_middleware_push (like a URL, climb the name space until we reach the action).

👍

Also, if the router is a structure, it should end with the _s suffix (_t is a reserved suffix and shouldn't be used unless writing POSIX library code).

👍

Also, I would love to make the route addition even simpler if possible, making it simple to define resource routes in a way that is similar to what they do with Ruby on Rails.

CRUD is designed to reflect basic database operations: SELECT/GET, INSERT/POST, UPDATE/PUT and DELETE/DELETE. This is very useful in some applications, but greatly limits the possibilities. If you limit the use of a router to CRUD, it excludes everything that isn't CRUD, which is 3/4 of the internet.

For example:

In my opinion, a router should allow implementing CRUD, but shouldn't be limited to that, otherwise it can't be widely adopted. The router should allow the user to do what they want.

We can also note the (non-exhaustive) specified methods are: GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE (RFC 7231), PATCH (RFC 5789), PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK (RFC 4918).

And most importantly, RFC 7231 states: "HTTP methods are extensible. HTTP implementations MUST be prepared to process methods they do not recognize as being safe and idempotent methods..."

So I think the methods should not be limited and hard-coded, but left to the developer's initiative.

Moreover, most developments do not respect these standards, at least not completely. In a structured project, you always find an exception. You also find legacy code to respect.

It's preferable to handle 404 and 405 with fallback functions.

Using a tree with a method at the root of that tree would break possible custom method names.

Not necessarily, the method at the root is not hard-coded. The root can be a sorted list of methods in the form of strings. Alphabetical ordering is done at compilation and allows for a binary search to match the method (Maybe unnecessary, because for 10 entries it might be faster to traverse a list than to do a binary search). Other approaches may consist of quickly processing standard methods (GET, POST, ...) and using a fallback method for the rest.

This happens and we have to support it for one of our clients, when they have special (non-standard) method names sent by their client-side app. This is what we should allow a catch-all for each path that is method name agnostic.

👍

As for the default behavior: I would like to allow developers to define a catch-all route.

👍

Once an optional segment was extracted, the rest of the routers should ignore the extracted segment.

This is exactly the role of sub-routers. The goal is to ignore what has already been processed by the parents.

In other words, if an HTTP handler callback returns an error, the next matching route should be tested. Any consumed parameters should be ignored when continuing to the next route.

Indeed, each registered segment is evaluated in the order of registration, and if an evaluation fails, we move on to the next one. If all evaluations fail, we look for a fallback for the router.

Return of the routing function

I wonder if the routing function shouldn't return an array with the list of functions to execute rather than calling callbacks. The advantages are:

Error processing

The error reporting system should also be worked on. The idea is to carry in the function prototypes a system of argumentative error reporting. Something like this:

int a(char **err) {
    if (detect-error) {
        explain_error(err, "error in 'a' at char %d", line);
        return -1;
    }
}

int b(char **err) {
    if (a(err) == -1) {
        explain_error(err, "error in 'b': %s", *err);
    }   
}

int c(char **err) {
    if (b(err) == -1) {
        explain_error(err, "error in 'c': %s", *err);
    }   
}

int main() {
    char *err = NULL;

    if (c(&err) == -1) {
        printf("error: %s\n", err)
    }
}

The explain_error() function takes care of crafting the error messages and especially making them inherit from each other in order to make a complete message easy to understand for the developer (or the application administrator).

In the above example, the returned message is:

"error: error in 'c': error in 'b': error in 'a' at char 22\n"

Of course, this message is ugly, the idea is just to illustrate the inheritance of errors.

boazsegev commented 3 months ago

So I think the methods should not be limited and hard-coded, but left to the developer's initiative.

I agree, but I also think that some idiomatic patterns should be easier to implement.

The OPTIONS and PATCH methods could definitely be added as independent callbacks, but I suggest that everything that is less idiomatic can be handled in the on_http callback that can check methods and perform additional routing.

Also, we could easily add method to the filter options:

typedef struct {

  /** ....... */

  /* ---- Route Filtering ---- */

  /**
   * If the `accept` header doesn't list this content type, route fails.
   *
   * This also sets the default response content-type header.
   */
  fio_buf_info_s content_type;

  /**
   * Filters away any requests in which the HTTP method isn't one of the methods in this comma separated buffer.
   */
  fio_buf_info_s methods;

  /** ....... */

} fio_http_router_settings_s;

It's preferable to handle 404 and 405 with fallback functions.

I agree, but sometimes you want to handle a 404 inside a relative path, such as within an admin console or other part of the app.

After all routes fail, 404 will be the obvious response, but other than that it would be nice to have the option to answer with 404 while still keeping a user within a specific scope.

This is exactly the role of sub-routers. The goal is to ignore what has already been processed by the parents.

Yes, but sub-routers ignore other possible behaviors, such as middleware that re-writes the path (i.e., if someone implemented a language middleware in some way).

I wonder if the routing function shouldn't return an array with the list of functions to execute rather than calling callbacks. The advantages are:

As a rule, I want to place a little trust as we can in the developer's hands. Trust in developers – especially when it comes to teams and each of them finding a different solution / approach to the same issue – is often a cause for frustration and user-related bug reports.

I believe it is our responsibility, as authors, to provide the safest and most novice friendly (user-proof) way to handle things.

This is why I think that we should run all callbacks in the router and stop the callback cain if one of them returns a non-zero value.

Callbacks could use the fio_env_set for cleanup code (as the cleanup is called when the connection is disconnected) or use the on_finish and on_open callbacks for response specific cleanup.

Middleware is obviously biased towards linear code, but if we control the middleware design, we can make sure cleanup is always performed.

Error processing

That's something I have yet to think about.

In general, there's a difference between parser / library detected errors (we need to explain) and errors the user reports (they should log the error so they know what happened, we just need to know an error occurred).

My tendency is to provide error logging using FIO_LOG_XXXX (i.e., FIO_LOG_WARNING or FIO_LOG_ERROR or FIO_LOG_FATAL) and return the error without providing access to the specifics of the error. This isn't always ideal and it is context specific, but that's so far the main approach.

I believe code should simply run or simply break.

Code that breaks should be secure – i.e., breaking on external input should never lead to memory leaks or a crash while internal code errors should prevent compilation (ideal) or prevent the server from starting (crash the program during the setup stage).