What Features do you want to see in fail.io?

boazsegev commented 6 years ago

I'm in the processes of rewriting many part of the library, including major updates to the HTTP and Pub/Sub design and API, with the intention of making everything as easy to use as can be.

You can have a look at the reHTTP branch to see the work so far.

But now is your chance to vote or suggest any new features you want to see in facil.io.

What features would you prefer (HTTP client? how big should the total HTTP headers length size should be? etc')

What default behaviors would you prefer? (Do you want forked processes to re spawn automatically after a crash or do you prefer a cild process crashing to bring down the while process cluster node?)

Leave comments below.

64 commented 6 years ago

Have you considered adding continuous integration testing?

boazsegev commented 6 years ago

@64 ,

I think that's a great idea 👍🏻

I was staying away from CI because I didn't find a good way to automate the testing (sure, I have test code in place for some things, but I didn't figure out how to write an automated test for the evented network layer and the many interactions possible).

On the other hand, just testing for build errors on different environments would be great (probably save me some of the vagrant juggling I perform when I test each release).

Yap, I think that's definitely something to add to the TODO list 👍🏻

64 commented 6 years ago

As for TLS support, you should consider using a library called s2n. It’s relatively new and not as feature rich as OpenSSL, but the API is far friendlier and there is a big focus on security.

boazsegev commented 6 years ago

I'll read more about s2n as soon as I can.

Right now I'm trying to design an API that will easily work with different underlying TLS libraries... but the differences in API and the variety in features are still a bit much for me bridge.

As soon as I manage a simple design for the API, I will start working on an implementation.

filly86 commented 6 years ago

Since everyone ist now having multicore machines I would love if you could elaborate on how to use your library best to create a multithreaded web socket chat Server. Which would be the best Option and why?

create one eventloop and spawn threads to handle the parsing of the websocket frames?
create one eventloop per thread

If Option 2: how to exchange data between the eventloop threads? Using message passing with pipes? Or using shared memory?

Would be really great if you could show an example regarding this topic with your library?

boazsegev commented 6 years ago

@filly86 ,

Yes, you are right, I should definitely add proper documentation for this.

Thank you for pointing this out.

The library handles multi-threading and clustering for you, using a single event-loop per process and supporting cluster mode (one master process + workers).

In order to provide more optimization approaches, facil.io offers both worker processes and thread pools (see facil_run and it's arguments).

Your option 2 (one eventloop per thread) is very similar to the design of a worker process and it raises the same issues. For this reason, I chose to have threads behave differently than processes (threads share the event loop, allowing the application to protect itself and mitigate any slow/blocking code).

There should be no need to dynamically spawn threads when using facil.io (the thread pool, evented design and existing API should cover everything you need).

As for an example, there should be one in the examples folder... but of course you are right, I should add all this to the documentation and improve the way the example is authored to make it more readable.

filly86 commented 6 years ago

Your lib seems to be really really interesting, now that I am seeing that its even handling multithreading. Keep up the good work. My suggestion would also be to not overblow the lib with thousands of features. Nice, clean and simple is I think important, but its your decision of corse.

Let me ask you a few more questions regarding multithreading. Did I understand it correctly that you are using one Eventloop per Process? If yes:

How many processes are you spawning? Are you spawning as many as CPU cores available one the machine?
Which mechanism are you using for inter process communication, how are you sending messages from the event loop in process 1 to a second event loop in process 2? Are you simply sending via TCP Sockets?

You are also saying that you are using threads which share one single event loop. What are these threads doing exactly? Are they parsing the incoming websocket frames? I am just thinking that in case of websocket contination frames you will need some kind of synchronisation? Lets assume following scenario:

Client sends a websocket frame splitted in two continuation frames.
First Frame is sent to the server. Your lib spawns a thread (?) recognizes that its the start of a continuation frame and waits for the second part of the frame to arrive before starting parsing.
But in the meantime the second frame was also sent to the server but it was handled by a different thread!

What will happen in this case? Will the first thread recognize that a second thread already received the second part of the frame? Or will the second thread recognize that the first thread handled the first part of the frame?

boazsegev commented 6 years ago

@filly86 ,

Thanks ☺️🙏🏻

My suggestion would also be to not overblow the lib with thousands of features.

I agree, but it's so hard...

The vision is closer to a framework library than just a platform. The library should take care of all the transport layer details, leaving the developer to focus on their application's logic.

How many processes are you spawning? Are you spawning as many as CPU cores available one the machine?

It's entirely up to the user to decide.

The automatic mode (if the user didn't specify anything) will assume the user didn't read the documentation and invokes a defensive strategy with an upper limit. This will usually result in cpu_cores X workers with cpu_cores X threads per worker (it's defensive against blocking IO in the event loop).

Otherwise, to each their own. Lately I've been running my applications with -4 threads (negative values compute as fractions, so it's cpu_cores/4 threads per process) and 3 processes (leaving room for the kernel).

Which mechanism are you using for inter process communication...?

I use Unix sockets for IPC. They use the facil_cluster_send in the facil.io cluster API.

The pub/sub extension uses this API to send messages across process boundaries.

You are also saying that you are using threads which share one single event loop. What are these threads doing exactly? Are they parsing the incoming websocket frames?

The event loop is protocol agnostic. There's only a number of simple rules:

the event loop prevents the same event (or event type) from running concurrently for the same connection.
the event loop prevents the connection's protocol_s object from being destroyed while it's still in use (best attempt, I can't control user code).

The event loop doesn't know or care what the protocol object does with the on_data event (or any event, for that matter).

I am just thinking that in case of websocket contination frames you will need some kind of synchronisation?

Contination frames are handled by the Websocket protocol object and the Websocket parser.

The Websocket protocol object manages an internal buffer for incomplete packet data (both incomplete Websocket frames and incomplete Websocket messages).

No new threads are spawned and the event loop protects the on_data callback from running concurrently, so it's very easy to handle.

First Frame is sent to the server. Your lib spawns a thread (?)...

I feel a need to make this very clear - the library would never spawn threads in response to client or network events.

The reason is simple, spawning threads in response to network events is a security issue.

When using a thread per connection, there's a risk that DoS attacks could overload the server's context switching algorithm and bring a system to it's knees.

There's a reason nginx (evented, no threads being dynamically spawned) can handle heavier loads than Apache (historically, a thread per client design, but I didn't check in a while and I may be wrong).

filly86 commented 6 years ago

I really appreciate the time you are spending to answer these questions. I know that spawning threads for each connection is not what your are doing. Thats why you are using epoll/kqueue.

But I am still not understanding the following sentence:

"For this reason, I chose to have threads behave differently than processes (threads share the event loop, allowing the application to protect itself and mitigate any slow/blocking code)."

Can you give an example in which situations are you using more then one thread to prevent slow/blocking code? What is so time consuming that you need extra thread for it?

the event loop prevents the same event (or event type) from running concurrently for the same connection.

How are you achieving this? Mutex?

Btw: I am the guy who reported this bug: https://github.com/boazsegev/facil.io/issues/6

As a backgroud: 2 years ago I was also working on a multithreaded websocket Chat Server in plain C using epoll. The goal was: Best performance, no dependencies, simple and clean, only support for text frames. First I tried with one epoll loop shared by multiple threads but got problems which are also described here. So I switched to one epoll loop per process which worked much better. But due to limited C knowledge I never got it working 100% stable. Under high load and with usage of continuation frames (from Chrome) it was crashing.

You lib is very interesting, it is growing fast (so its not so small like a few months before unfortunately ;)) and I will for sure do some stress tests during the coming days just to see how it behaves and I hope to learn something new.

boazsegev commented 6 years ago

@filly86 ,

Thank you for your interest. I'm happy to answer your questions.

However, this thread is occupied with other people as well. This isn't a questioning thread, so if you have further questions, please open an issue.

Can you give an example in which situations are you using more then one thread to prevent slow/blocking code? What is so time consuming that you need extra thread for it?

A common example of this issue is CGI style applications or the Ruby iodine server (which uses this library). In these instances the user's code might initiating a Database request and wait for the database to respond using a blocking IO call instead of the evented API (in CGI style applications, the evented API might be unavailable).

By adding another thread (and having the threads share the event queue / event loop), the next available thread will simply deal with the remaining events while the original thread is blocking.

Starting facil.io with a larger thread pool will mitigate a larger number of blocking threads.

How are you achieving this? Mutex?

This is actually a very good question. Many different solutions exist.

I found the best performance was achieved by using exponential micro-sleep patterns.

Btw: I am the guy who reported this bug: #6

Thanks! 🙏🏻

You lib is very interesting... I will for sure do some stress tests during the coming days just to see how it behaves and I hope to learn something new.

FYI:

on macOS, stress testing is limited due to the open-file limit.

It seems that the kqueue implementation breaks when trying to override the kernel's default limit.

nginx breaks at the same point, it's a kernel issue.

The breaking point seems to be around 10,124 open files (I don't know why that is and I'm not sure if it isn't something on my machine).
On Linux things seem to run a bit smoother.

Please keep me posted (in a new issue).

Kindly, B.

boazsegev / facil.io

What Features do you want to see in fail.io? #24