boazsegev / facil.io

Your high performance web application C framework
http://facil.io
MIT License

Help me design / write a generic HTTP routing helper library for the http_s struct. #155

Closed thierry-f-78 closed 1 month ago

thierry-f-78 commented 1 month ago

Are you still interested in this subject? Writing a C HTTP framework is one of my favorite subjects, along with the C10k problem and asynchronous design in C.

I just came across your project, and router design is an interesting subject. I have started brainstorming a design. The following is preliminary and unfinished.

/*
HTTP Router API

The router is designed to be simple with a concise syntax for easy adoption by users. It should also be performant during execution.

The implementation is composed of several stages:

1. Router Configuration Management
   This includes functions and macros that allow developers to easily define their routes. This section comprises:
   * A path definition language with support for variable parameters
   * A method definition system
   * A route definition system

2. Compilation
   A system for compiling the "configuration" that provides optimal structures for quickly finding paths.

Vocabulary:
* path: The URL path, e.g., /path/to/api/method
* segment: In a path, it's the part between two "/"

General features and base concepts:

1. Route Definition
   * HTTP Method (GET, POST, PUT, DELETE, etc.)
   * Path with support for:
     a. Static segments ("/users", "/api")
     b. Dynamic segments ("/:id", "/:username")
     c. Regular expressions for precise matching
   * Handler (function to execute when the route is matched)

2. Route Grouping
   * Ability to define a common prefix for a set of routes
   * Application of common middlewares to a group of routes
   * Support for hierarchization (sub-routers)

3. Route Parameters
   * Capture of dynamic path segments
   * Definition of constraints on parameters (e.g., id must be numeric)
   * Optional parameters

4. HTTP Method Management
   * Definition of routes for specific methods
   * Support for routes responding to multiple methods
   * Default method (e.g., GET if not specified)

5. Content Negotiation
   * Definition of routes based on Accept headers
   * Ability to specify multiple content types for the same route
   * Content type should be optional

6. Middlewares
   * Definition of intermediate functions to execute before/after the main handler
   * Ability to apply middlewares globally, by group, or by route

7. Error Handling and Default Cases
   * Definition of handlers for 404 (Not Found), 500 (Internal Server Error), and 405 (Method Not Allowed) errors
   * Default route (fallback) if no other route matches

8. Priorities and Matching Order
   * By default, static routes have priority over dynamic routes
   * Ability to manually define the priority of a route (optional)

Segment definition grammar:

segment = static_segment | dynamic_segment ;
static_segment = ["::"], { character - ":" } ;
dynamic_segment = ":", name, [":", match] ;
name = (letter | "_"), { letter | digit | "_" } ;
match = ("~", regexp) | ("%", wildcard) | type_match | enum_match | date_match | time_match | hex_match | range_match ;
type_match = "int32" | "uint32" | "int64" | "uint64" | "bool" | "b64" | "b64i" | "uuid" | "email" | "slug" ;
hex_match = ("hex" | "hexl" | "hexu"), ["(", length, ")"] ;
regexp = ? PCRE compatible regular expression ? ;
wildcard = ? simplified wildcard pattern, e.g., "*" for any characters, "?" for single character ? ;
length = digit - "0", { digit } ;
enum_match = "enum", "(", word_list, ")" ;
word_list = word, { ",", word } ;
word = (letter | digit | "-" | "_"), { letter | digit | "-" | "_" } ;
date_match = "date", ["(", date_format, ")"] ;
time_match = "time", ["(", time_format, ")"] ;
range_match = "range", "(", range_list, ")" ;
range_list = range, { ",", range } ;
range = single_value | open_range | closed_range ;
single_value = number ;
open_range = (number, "-") | ("-", number) ;
closed_range = number, "-", number ;
number = digit, { digit }, [".", { digit }] ;
letter = "A" | "B" | "C" | ... | "Z" | "a" | "b" | "c" | ... | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
character = ? any Unicode character ? ;
date_format = ? date format string ? ;
time_format = ? time format string ? ;

Where:

~ : regular expression that must match the entire string
% : wildcard that must match the entire string
hex : case-insensitive hexadecimal with or without length
hexl : lower case hexadecimal with or without length
hexu : upper case hexadecimal with or without length
int32, int64 : signed integer of the specified size
uint32, uint64 : unsigned integer of the specified size
bool : true or false
b64 : valid base64 [A-Za-z0-9+/=]
b64i : internet base64 [A-Za-z0-9_-]

Examples of path segments:
(as provided in the original text)
*/

// Path structure
struct Path {
    // Implementation details
};

// Function to create a path
struct Path* path_create();

// Generic and fast function to create a prefix
// Can return an error in the prefix definition
struct PathError* path_add_prefix(struct Path* path, const char* prefix);

// Function to add a static segment
// Can return an error if the segment contains forbidden characters
struct PathError* path_add_static(struct Path* path, const char* value);

// Function to add a simple variable segment
void path_add_variable(struct Path* path, const char* name);

// Function to add a variable segment with validation
void path_add_variable_with_validator(struct Path* path, const char* name, bool (*validator)(const char* value, char** error_message));

// Function to add a segment with regular expression
struct PathError* path_add_regex(struct Path* path, const char* name, const char* pattern);

// Basic functions
struct PathError* path_add_static(struct Path* path, const char* segment);
struct PathError* path_add_dynamic(struct Path* path, const char* name);

// Base types
struct PathError* path_add_int32(struct Path* path, const char* name);
struct PathError* path_add_uint32(struct Path* path, const char* name);
struct PathError* path_add_int64(struct Path* path, const char* name);
struct PathError* path_add_uint64(struct Path* path, const char* name);
struct PathError* path_add_bool(struct Path* path, const char* name);

// Special types
struct PathError* path_add_uuid(struct Path* path, const char* name);
struct PathError* path_add_email(struct Path* path, const char* name);
struct PathError* path_add_slug(struct Path* path, const char* name);

// Base64
struct PathError* path_add_base64(struct Path* path, const char* name);
struct PathError* path_add_base64_url(struct Path* path, const char* name);

// Hexadecimal
struct PathError* path_add_hex(struct Path* path, const char* name, int length);
struct PathError* path_add_hexl(struct Path* path, const char* name, int length);
struct PathError* path_add_hexu(struct Path* path, const char* name, int length);

// Regex and Wildcard
struct PathError* path_add_regex(struct Path* path, const char* name, const char* pattern);
struct PathError* path_add_wildcard(struct Path* path, const char* name, const char* pattern);

// Enum
struct PathError* path_add_enum(struct Path* path, const char* name, const char** values, int count);

// Date and Time
struct PathError* path_add_date(struct Path* path, const char* name, const char* format);
struct PathError* path_add_time(struct Path* path, const char* name, const char* format);

// Range
struct PathError* path_add_range(struct Path* path, const char* name, const struct RangeSegment* ranges, int count);

// Utility function to create a path from a string
struct Path* path_from_string(const char* prefix);

// Function to free the memory of a path
void path_free(struct Path* path);

// HTTP Methods
struct HttpMethods {
    const char **methods;
    int count;
};

// Macro for creating HttpMethods
#define METH(...) (struct HttpMethods){(const char*[]){__VA_ARGS__}, sizeof((const char*[]){__VA_ARGS__})/sizeof(const char*)}

// Function to create HttpMethods
struct HttpMethods* method_create();

// Function to create and pre-fill HttpMethods
struct HttpMethods* method_fill(int nb, ...);

// Function to add a method to HttpMethods
struct Error* method_add(struct HttpMethods* methods, const char* method);

// Content Types
struct ContentTypes {
    const char **types;
    int count;
};

struct ContentTypePattern;

// Structure to represent a list of Content-Type patterns
struct ContentTypePatternList {
    struct ContentTypePattern** patterns;
    int count;
};

// Creates a Content-Type pattern from a string
struct ContentTypePattern* content_type_pattern_create(const char* pattern);

// Creates a list of Content-Type patterns from a string
struct ContentTypePatternList* content_type_pattern_list_create(const char* patterns);

// Router structure
struct Router {
    // Implementation details
};

// Router error structure
struct RouterError {
    // Implementation details
};

// Router creation and destruction
struct Router* router_create();
void router_free(struct Router* router);

// Default configuration functions for a router
struct RouterError* router_set_default_method(struct Router* router, struct HttpMethods methods, HandlerFunc handler);
void router_set_error_handler(struct Router* router, int error_code, ErrorHandlerFunc handler);
void router_set_fallback_handler(struct Router* router, HandlerFunc handler);
void router_add_middleware(struct Router* router, MiddlewareFunc middleware);

// Adding routes
struct RouterError* router_add_route(struct Router* router, struct HttpMethods methods, const char* prefix, HandlerFunc handler);
struct RouterError* router_add_route_path(struct Router* router, struct HttpMethods methods, struct Path* path, HandlerFunc handler);

// Adding sub-routers
struct RouterError* router_add_subrouter(struct Router* parent, struct Router* child, const char* prefix);
struct RouterError* router_add_subrouter_path(struct Router* parent, struct Router* child, struct Path* path);

// Content management
struct RouterError* router_add_content_route(struct Router* router, struct HttpMethods methods, const char* prefix, const char* content_type_patterns, HandlerFunc handler);
struct RouterError* router_add_content_prefix(struct Router* router, struct HttpMethods methods, struct Path* path, const char* content_type_patterns, HandlerFunc handler);

// Matching and dispatching
struct MatchResult* router_match(struct Router* router, const char* method, const char* path, const char* content_type);
void router_dispatch(struct Router* router, struct MatchResult* result, struct Request* request, struct Response* response);
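
The segment grammar in the comment above can be illustrated with a small classifier. This is only a sketch under assumptions: `segment_kind_e`, `segment_s`, and `segment_parse` are hypothetical names that are not part of the proposed API, and the match expression (for example `uint64` or `hex(32)`) is returned as an unparsed string rather than being validated.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not part of the proposed API. */
typedef enum { SEGMENT_STATIC, SEGMENT_DYNAMIC, SEGMENT_INVALID } segment_kind_e;

typedef struct {
    segment_kind_e kind;
    const char *text;  /* static text, or the dynamic parameter name */
    size_t text_len;   /* length of `text` (the name is not NUL-terminated) */
    const char *match; /* unparsed match expression, NULL if absent */
} segment_s;

/* Classifies one path segment (the part between two '/'). */
segment_s segment_parse(const char *seg) {
    segment_s out = {SEGMENT_INVALID, NULL, 0, NULL};
    if (seg[0] != ':') {
        /* Plain static segment: must not contain ':' anywhere. */
        if (strchr(seg, ':') != NULL)
            return out;
        out.kind = SEGMENT_STATIC;
        out.text = seg;
        out.text_len = strlen(seg);
        return out;
    }
    if (seg[1] == ':') {
        /* "::" escapes a literal leading ':' in a static segment. */
        if (strchr(seg + 2, ':') != NULL)
            return out;
        out.kind = SEGMENT_STATIC;
        out.text = seg + 1; /* keep exactly one ':' */
        out.text_len = strlen(seg + 1);
        return out;
    }
    /* Dynamic segment: ":" name [ ":" match ] */
    const char *name = seg + 1;
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return out; /* name must start with a letter or '_' */
    size_t n = 1;
    while (isalnum((unsigned char)name[n]) || name[n] == '_')
        n++;
    if (name[n] != '\0' && name[n] != ':')
        return out; /* unexpected character after the name */
    if (name[n] == ':' && name[n + 1] == '\0')
        return out; /* ':' present but match expression is empty */
    out.kind = SEGMENT_DYNAMIC;
    out.text = name;
    out.text_len = n;
    out.match = (name[n] == ':') ? name + n + 1 : NULL;
    return out;
}
```

With this sketch, `segment_parse(":id:uint64")` yields a dynamic segment named `id` with the match string `uint64`, while `segment_parse("::action")` yields a static segment whose text is `:action`.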

An example usage (generated by AI):

#include <stdio.h>
#include "http_router.h"

// Example handler functions
void get_users(struct Request* req, struct Response* res) {
    // Implementation
}

void create_user(struct Request* req, struct Response* res) {
    // Implementation
}

void get_user_by_id(struct Request* req, struct Response* res) {
    // Implementation
}

void update_user(struct Request* req, struct Response* res) {
    // Implementation
}

void delete_user(struct Request* req, struct Response* res) {
    // Implementation
}

void get_article(struct Request* req, struct Response* res) {
    // Implementation
}

void api_fallback(struct Request* req, struct Response* res) {
    // Implementation for handling unmatched routes
}

int main() {
    struct Router* router = router_create();
    if (!router) {
        fprintf(stderr, "Failed to create router\n");
        return 1;
    }

    struct RouterError* err;

    // Example 1: Simple static route
    err = router_add_route(router, METH("GET"), "/hello", hello_world);
    if (err) {
        fprintf(stderr, "Failed to add route: %s\n", err->message);
        // Handle error
    }

    // Example 2: Route with dynamic segment using concise syntax
    err = router_add_route(router, METH("GET"), "/users/:id:uint64", get_user_by_id);
    if (err) {
        // Handle error
    }

    // Example 3: Multiple HTTP methods for the same route with content negotiation
    err = router_add_content_route(router, METH("GET", "POST"), "/users", "application/*", NULL);
    if (err) {
        // Handle error
    }

    err = router_add_content_route(router, METH("GET"), "/users", "application/json", get_users);
    if (err) {
        // Handle error
    }

    err = router_add_content_route(router, METH("POST"), "/users", "application/json", create_user);
    if (err) {
        // Handle error
    }

    // Example 4: Route with multiple dynamic segments and type constraints
    err = router_add_route(router, METH("GET"), "/blog/:year:int32/:month:range(1-12)/:slug:slug", get_article);
    if (err) {
        // Handle error
    }

    // Example 5: Route with hexadecimal constraint
    err = router_add_route(router, METH("GET"), "/resources/:id:hex(32)/read", get_resource);
    if (err) {
        // Handle error
    }

    // Example 6: Sub-router for API versioning
    struct Router* api_v1 = router_create();
    struct Router* api_v2 = router_create();

    // Add routes to api_v1
    router_add_route(api_v1, METH("GET"), "/users", get_users_v1);
    router_add_route(api_v1, METH("POST"), "/users", create_user_v1);

    // Add routes to api_v2
    router_add_route(api_v2, METH("GET"), "/users", get_users_v2);
    router_add_route(api_v2, METH("POST"), "/users", create_user_v2);

    // Add sub-routers to main router
    router_add_subrouter(router, api_v1, "/api/v1");
    router_add_subrouter(router, api_v2, "/api/v2");

    // Example 7: Setting a fallback handler for unmatched routes
    router_set_fallback_handler(router, api_fallback);

    // Example 8: Adding middleware to the router
    router_add_middleware(router, log_request);
    router_add_middleware(router, authenticate_user);

    // Example 9: Route with enum constraint
    err = router_add_route(router, METH("GET"), "/items/:category:enum(electronics,books,clothing)", get_items_by_category);
    if (err) {
        // Handle error
    }

    // Example 10: Route with wildcard
    err = router_add_route(router, METH("GET"), "/files/:path:%*", serve_file);
    if (err) {
        // Handle error
    }

    // Clean up
    router_free(api_v1);
    router_free(api_v2);
    router_free(router);

    return 0;
}
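
The `METH(...)` macro used in every call above relies on C99 compound literals. The following sketch repeats the struct and macro from the proposal and adds a hypothetical helper, `methods_contains`, only to show how the resulting value could be consumed.

```c
#include <string.h>

struct HttpMethods {
    const char **methods;
    int count;
};

/* Same macro as in the proposal: builds an array literal and counts it. */
#define METH(...) (struct HttpMethods){(const char*[]){__VA_ARGS__}, sizeof((const char*[]){__VA_ARGS__})/sizeof(const char*)}

/* Hypothetical helper: returns 1 if `method` is listed in `m`, else 0. */
static int methods_contains(struct HttpMethods m, const char *method) {
    for (int i = 0; i < m.count; i++)
        if (strcmp(m.methods[i], method) == 0)
            return 1;
    return 0;
}
```

One design consequence: when `METH(...)` is used inside a function, the array it creates has automatic storage and disappears when the enclosing block ends, so a router that keeps routes beyond the call would need to copy the method list rather than store the pointer.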

Are you interested in this work? Can I continue the brainstorm?

boazsegev commented 1 month ago

Hi,

Thank you so much for both your interest in contributing and detailed suggestion.

That's a lot of beautiful work that you did and a lot for me to go over.

My biggest things to note:

  1. Development Repo Updates
  2. Dependencies
  3. API design
  4. Naming Conventions
  5. Argument Types

Development Repo Updates

Please allow me to point out that edge development for the facil.io functionality, including the HTTP layer, is now performed in the repo: https://github.com/facil-io/cstl

The new repo represents the transition to version 0.8.x.

Dependencies

The project avoids external dependencies when possible, especially dependencies that might raise licensing issues.

For this reason RegEx cannot be accepted (unless we roll our own and the RegEx Specification License allows us to do so while licensing our code under MIT / ISC).

For now, I suggest that we limit our URL section parser to the glob matching that is already implemented, or perhaps roll our own slightly expanded approach.
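
For illustration only, a glob matcher supporting `*` and `?` can be written without any external dependency. This sketch is not the glob implementation that already exists in facil.io; `glob_match` is a hypothetical name and the supported syntax is an assumption.

```c
#include <stddef.h>

/* Returns 1 if `text` matches `pattern` entirely, else 0.
 * '*' matches any run of characters (including none),
 * '?' matches exactly one character. */
static int glob_match(const char *pattern, const char *text) {
    const char *star = NULL;  /* position of the last '*' seen in pattern */
    const char *retry = NULL; /* text position that '*' currently covers up to */
    while (*text) {
        if (*pattern == '*') {
            star = pattern++; /* remember the star, try matching zero chars */
            retry = text;
        } else if (*pattern == '?' || *pattern == *text) {
            pattern++;
            text++;
        } else if (star) {
            pattern = star + 1; /* backtrack: let the star absorb one more char */
            text = ++retry;
        } else {
            return 0;
        }
    }
    while (*pattern == '*') /* trailing stars match the empty string */
        pattern++;
    return *pattern == '\0';
}
```

The matcher backtracks only to the most recent `*`, which keeps it free of recursion and of heap allocation.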

By the way, does :year:int32 follow a convention of some type? Is it something already seen in the wild?

API

User experience is highly valued in the facil.io approach. Everything should be easy.

For this reason, I would like to minimize the number of possible functions and provide named arguments for functions with more than 3 or 4 arguments.

Basically I hope to have as little as a single function that controls the router:

typedef struct fio_http_router_s fio_http_router_s;
/**
* Creates a route to URL and returns a pointer to a router object starting at that URL (sub-router).
*
* The sub-router allows calls to `fio_http_router_map` to treat `url` as the root of any new routes.
*
* The sub-router may be discarded or ignored, but it MUST NOT be freed manually (as it belongs to the main router object).
*/
fio_http_router_s *  fio_http_router_map(fio_http_router_s *, const char * url, fio_http_router_options_s opt);

/** named arguments helper for the `fio_http_router_map` function. */
#define fio_http_router_map(router, url, ...) fio_http_router_map(router, url, (fio_http_router_options_s){__VA_ARGS__})

/* and, of course, init/destruct functions: */
fio_http_router_s * fio_http_router_new(void);
void fio_http_router_free(fio_http_router_s *);
void fio_http_router_destroy(fio_http_router_s *);
#define FIO_ROUTER_INIT {0}

The options passed should include callbacks for GET, POST, etc., as well as a static file callback and a catch-all callback (in case of non-standard methods).

The sub_router you mention should be detected automatically by the fio_http_router_map function as it breaks down the URL into sections and creates sub-routers for each section.
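
The named-arguments helper described above can be sketched with C99 designated initializers. Everything in this example is hypothetical (the `demo_` prefix, the option fields `on_get`, `on_post`, and `on_any`, and the integer return value); it only illustrates the macro technique, not the eventual facil.io API.

```c
#include <stddef.h>

typedef void (*demo_handler_fn)(void *request);

/* Hypothetical options struct; callbacks that are not named stay NULL. */
typedef struct {
    demo_handler_fn on_get;
    demo_handler_fn on_post;
    demo_handler_fn on_any; /* catch-all for non-standard methods */
} demo_router_options_s;

/* The function itself takes the options struct by value.
 * For demonstration it returns how many callbacks were set. */
static int demo_router_map(const char *url, demo_router_options_s opt) {
    (void)url;
    return (opt.on_get != NULL) + (opt.on_post != NULL) + (opt.on_any != NULL);
}

/* The macro reuses the function name and wraps the trailing arguments in a
 * compound literal, so callers can write `.on_get = handler`.
 * It must be defined after the function, and it needs at least one option. */
#define demo_router_map(url, ...) \
    demo_router_map(url, (demo_router_options_s){__VA_ARGS__})

/* Placeholder handler used only to have something to pass. */
static void demo_get_users(void *request) { (void)request; }
```

A call then reads `demo_router_map("/users", .on_get = demo_get_users)`, and any option that is not named is zero-initialized.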

As for the content-type property as a router controller – I think that's a super interesting idea, but in general I haven't seen much use for it in the wild. If this is a use-case you encountered, we can definitely add it while keeping the NULL content-type as a catch-all.

Naming Conventions

Please note that all names in facil.io use snake_case and types have a designated suffix that depends on what they refer to (i.e., _s for struct, _u for union, _e for enum).

We avoid the struct and enum keywords where possible by using typedef. The suffix should be enough to indicate whether this is a struct, a union, an enum, or anything else.

To maintain name space integrity and avoid the risk of name collisions (especially with possible user code), the names of types and functions should follow the <library>_<module>_<function/type> convention.

The library is always fio for facil.io.

So, in this case, I would expect struct Router to become a typedef with the name fio_http_router_s.

Argument Types

The moment we accept dynamic URL segments, we need to discuss the types we will use to store this data and how it would be made available.

I suggest that we adopt the FIOBJ type system, but if you have a better idea, I'm all ears.


Again, thank you so much and I hope we can make this work :)

thierry-f-78 commented 1 month ago

Please allow me to point out that edge development for the facil.io functionality, including the HTTP layer, is now performed in the repo: https://github.com/facil-io/cstl The new repo represents the transition to version 0.8.x.

Okay, no problem.

For this reason RegEx cannot be accepted (unless we roll our own and the RegEx Specification License allows us to do so while licensing our code under MIT / ISC).

I was thinking of PCRE, but I was not aware of the licensing issues. I don't think linking against a third-party library installed on the system (without shipping its source) causes license problems: the library is not embedded in the source code, it is just a build dependency. (See HAProxy, where enabling PCRE only changes the compile line: https://github.com/haproxy/haproxy/blob/8427c5b5421a93ee29170fb6ca3093478acd7ab7/Makefile#L770) Note that libc also provides regex functions (https://www.gnu.org/software/libc/manual/html_node/Regular-Expressions.html). These functions are slower than PCRE, but they exist, ship with libc, and are specified by POSIX. Which library to build against could be selected at compile time with a define.
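
The compile-time selection suggested here could look like the following sketch, which implements only the POSIX `<regex.h>` branch. The `ROUTER_USE_PCRE` define and the `segment_regex_match` function are hypothetical names, and the PCRE branch is deliberately left out.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#ifdef ROUTER_USE_PCRE
#error "PCRE branch intentionally omitted from this sketch"
#else
/* Returns 1 if `pattern` (POSIX extended syntax) matches all of `value`,
 * 0 if it does not, and -1 if the pattern fails to compile. */
static int segment_regex_match(const char *pattern, const char *value) {
    regex_t re;
    regmatch_t m;
    int rc;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, value, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    /* The grammar requires the expression to match the entire segment. */
    return m.rm_so == 0 && (size_t)m.rm_eo == strlen(value);
}
#endif
```

Compiling the expression on every call is wasteful; a real router would presumably compile each pattern once when the route is added and keep the `regex_t` alongside the route.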

By the way, does :year:int32 follow a convention of some type? Is it something already seen in the wild?

You can find the : notation in some frameworks:

We chose the : character because it is a reserved character and is forbidden in URLs. Note that : can appear in a URL if it is percent-encoded as %3A. This case is handled by using :: to designate a literal : at the start of a segment, which covers URIs like GET /%3Aaction. The :<type-description> extension is my own brainstorming.

There are other examples of routers:

User experience is highly valued in the facil.io approach. Everything should be easy.

I absolutely agree. Good documentation with many examples would also be welcome. The minimum code for using my proposal (disregarding naming, which will be changed later) is:

   router = router_create();
   err = router_add_route(router, METH("GET", "POST"), "/resources/:id:hex(32)/read", get_resource);

This contains the minimum information to design a simple project. Other functions are used for advanced usage. Look at the example at the end of my previous post.

Perhaps the functions exposed in the API should be separated into two parts. One part that presents simple and concise functions that allow handling most simple projects, and another part that presents advanced functions that also allow handling complex projects.

The sub_router you mention should be detected automatically by the fio_http_router_map function as it breaks down the URL to sections and creates sub-routers for each section

Sub-routers are strongly used to configure complex applications with inheritance, like this:

The main router splits the application into 4 parts, processed by 4 teams:

main.c:

int main(void) {
   struct Router *v1;
   struct Router *v2;
   struct Router *app;
   struct Router *static_files;  /* "static" is a C keyword */
   struct Router *root;          /* "main" would clash with the function name */

   v1 = router_create();
   v2 = router_create();
   app = router_create();
   static_files = router_create();

   root = router_create();
   router_add_route(root, METH("GET"), "/health-check", get_health_check);
   router_add_subrouter(root, v1, "/v1");
   router_add_subrouter(root, v2, "/v2");
   router_add_subrouter(root, app, "/app");
   router_add_subrouter(root, static_files, "/static");

   /* Each team fills in its own sub-router. */
   init_v1(v1);
   init_v2(v2);
   init_app(app);
   init_static(static_files);
   return 0;
}

v1.c:
void init_v1(struct Router *r) {
   router_add_route(r, METH("GET"), "/users", get_users);
}

v2.c:
void init_v2(struct Router *r) {
   router_add_route(r, METH("GET"), "/users", get_users);
}

app.c:
void init_app(struct Router *r) {
   router_add_route(r, METH("GET"), "/login", get_login);
   router_add_route(r, METH("POST"), "/login", post_login);
}

static.c:
void init_static(struct Router *r) {
   router_add_route(r, METH("GET"), "/", get_static_content);
}
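
One way a parent router could hand a request to a mounted sub-router is to strip the mount prefix and pass the remainder down. The sketch below assumes prefixes are written without a trailing slash (for example "/v1"); `mount_strip_prefix` is a hypothetical helper, not part of the proposed API.

```c
#include <stddef.h>
#include <string.h>

/* If `path` lies under the mount `prefix`, returns the remaining path as
 * seen by the sub-router; otherwise returns NULL.
 * The match must end on a segment boundary, so "/v1" does not capture
 * "/v10/users". */
static const char *mount_strip_prefix(const char *prefix, const char *path) {
    size_t n = strlen(prefix);
    if (strncmp(path, prefix, n) != 0)
        return NULL;
    if (path[n] == '\0')
        return "/"; /* exact hit on the mount point maps to the sub-router root */
    if (path[n] != '/')
        return NULL; /* prefix ended in the middle of a segment */
    return path + n;
}
```

With this helper, a request for "/v1/users" reaches the `v1` sub-router as "/users", which is why each team's init function can register its routes relative to its own root.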

Note: I have already seen this setup at a client (a very large publicly traded company). Each team handled one part of the application, and the routing was done by the application server (old C technology). It is also common practice today in frameworks for other languages.

As for the content-type property as a router controller – I think that's a super interesting idea, but in general I haven't seen much use for it in the wild.

Content-type based routing is necessary for compatible APIs. POST with JSON is processed by a specific handler, while POST with form-data is processed by another one.
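
A content-type pattern such as `application/*` could be matched against a request header as sketched below. `content_type_matches` and `ci_equal` are hypothetical helpers. The sketch treats a NULL pattern as a catch-all, ignores parameters such as `; charset=utf-8`, and compares case-insensitively; all of these are assumptions rather than decided behavior.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of the first n bytes of a and b. */
static int ci_equal(const char *a, const char *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Returns 1 if `content_type` satisfies `pattern`, else 0. */
static int content_type_matches(const char *pattern, const char *content_type) {
    if (pattern == NULL)
        return 1; /* route without a content-type constraint */
    if (content_type == NULL)
        return 0; /* constrained route, but the request has no header */
    if (strcmp(pattern, "*/*") == 0)
        return 1;
    /* Length of the media type itself, without parameters. */
    size_t type_len = strcspn(content_type, "; \t");
    size_t pat_len = strlen(pattern);
    if (pat_len >= 2 && pattern[pat_len - 2] == '/' && pattern[pat_len - 1] == '*') {
        /* Subtype wildcard: compare up to and including the '/'. */
        size_t prefix = pat_len - 1;
        return type_len > prefix && ci_equal(pattern, content_type, prefix);
    }
    return type_len == pat_len && ci_equal(pattern, content_type, pat_len);
}
```

Under these assumptions, a POST carrying `application/json; charset=utf-8` would be routed to the JSON handler, while a POST carrying `multipart/form-data` would fall through to a different route.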

If this is a use-case you encountered, we can definitely add it while keeping the NULL content-type as a catch-all.

NULL content-type is handled. Alternative functions without the content-type parameter could be exposed (but this would be one more function).

Naming Conventions … So, in this case, I would expect struct Router to become a typedef with the name fio_http_router_s.

Okay, no problem. It's a detail to rename all of this.

The moment we accept dynamic URL segments, we need to discuss the types we will use to store this data and how it would be made available.

Absolutely, for me, the most important thing is defining an API, and then types.

Obviously, types used by the API are different from runtime types, which must be optimized for speed.

All routers and sub-routers must be resolved into a single descriptor in a kind of compilation phase.

Quickly, I'm thinking of some TREEs with one root per METHOD. Each tree node contains:

boazsegev commented 1 month ago

Closing, discussion moved to: https://github.com/facil-io/cstl/issues/26