CESNET / ipfixcol2

High-performance NetFlow v5/v9 and IPFIX collector (RFC7011)

Set core affinity to processor threads #75

Open rasaoskuei opened 2 years ago

rasaoskuei commented 2 years ago

To reduce OS context-switch overhead, it would be good to set the core affinity of the processor thread, so that the core executing the processor function is fixed. The user can then isolate that core from the OS scheduler and dedicate it to the thread.

Lukas955 commented 2 years ago

Hi,

thank you for the suggestion. Do you have a preferred way of setting the application's affinity? It can be set after the application starts, e.g. using the taskset command. Alternatively, we could add a parameter that directly specifies the mask of cores on which the application threads can run, e.g.: ipfixcol2 -a mask ....
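As a rough sketch of what such a mask parameter could do internally (the `-a` option is only a proposal here, and both helper names below are hypothetical, not ipfixcol2 functions), a taskset-style hexadecimal mask can be converted to a `cpu_set_t` and applied with `sched_setaffinity`. The sketch assumes Linux with glibc and handles only the first 64 CPUs:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical helper: convert a hex mask such as "0x5" into a cpu_set_t.
 * Bit N of the mask selects CPU N. Returns 0 on success, EINVAL otherwise.
 * Assumption: only the first 64 CPUs are addressable in this sketch. */
int mask_to_cpuset(const char *text, cpu_set_t *set)
{
    char *end = NULL;
    errno = 0;
    unsigned long long mask = strtoull(text, &end, 16);
    if (errno != 0 || end == text || *end != '\0' || mask == 0) {
        return EINVAL;
    }

    CPU_ZERO(set);
    for (int cpu = 0; cpu < 64; cpu++) {
        if (mask & (1ULL << cpu)) {
            CPU_SET(cpu, set);
        }
    }
    return 0;
}

/* Hypothetical helper: apply the mask to the calling thread.
 * Returns 0 on success or an errno value on failure. */
int apply_affinity_mask(const char *text)
{
    cpu_set_t set;
    int rc = mask_to_cpuset(text, &set);
    if (rc != 0) {
        return rc;
    }
    return sched_setaffinity(0, sizeof(set), &set) == 0 ? 0 : errno;
}
```

If `apply_affinity_mask` were called at the start of `main`, before any other thread exists, every thread created later would inherit the same mask. That gives the same effect as taskset: all threads share one set of cores.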

Is this what you have in mind?

Lukas

rasaoskuei commented 2 years ago

Hi Lukas, thanks for your attention

The main point is about the main processor thread. Suppose you have one thread that processes the IPFIX messages and perhaps some other signaling threads. Only the affinity of the processor thread matters. If I do this with taskset, we give n CPU cores to all the threads, and every thread (the processor thread in particular) can still be moved between those n cores.

In my opinion, pthread_setaffinity_np (a non-portable extension to POSIX threads, available on Linux with glibc) is the proper API for setting core affinity, because it sets the affinity of one specific thread. We can also isolate some CPU cores from the OS scheduler and assign them to specific processing threads. What do you think about this?
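To make the idea concrete, here is a minimal sketch of pinning only the calling thread with `pthread_setaffinity_np`. It assumes Linux with glibc, and the helper name `pin_current_thread_to_core` is hypothetical, not an existing ipfixcol2 function:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: pin the calling thread to a single core.
 * Returns 0 on success or an errno-style code on failure. */
int pin_current_thread_to_core(int core)
{
    if (core < 0 || core >= CPU_SETSIZE) {
        return EINVAL;
    }

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* Only this thread is affected; the other threads of the process
     * keep whatever affinity mask they already have. */
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

The processor thread could call this once at the start of its loop. Build with `-pthread`; the call fails with `EINVAL` if the requested core is not available to the process.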

Lukas955 commented 2 years ago

Hi,

The collector does not have a single processing thread. In fact, it is quite parallel. If you look at the picture in the documentation, each block (i.e. plugin instance) represents a single thread. Moreover, there are other "hidden" processing threads: a packet parser thread after each input plugin instance and one output manager thread between the last intermediate plugin and the output plugins. In other words, there are at least 4 threads in a minimal configuration (one input + one output plugin instance).

What do you think is the best practice in this case?

Lukas

rasaoskuei commented 1 year ago

Hi, thanks for your description.

My purpose is to avoid OS context switches (which evict CPU cache lines and ...). My suggestion is to accept an optional core index in each part of the configuration and to set the core affinity of the corresponding thread from it. For example, if I use TCP input and JSON Kafka output, I could set <core>CORE_INDEX</core> inside both the input and the output configuration (this setting is optional).
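A possible shape for this, sketched under the assumption of Linux with glibc: the collector reads the optional core index of a plugin instance and passes it to a thread-start helper, which uses `pthread_attr_setaffinity_np` so the thread is pinned from its first instruction. The `<core>` element is only the proposal above, and `start_plugin_thread` and `report_cpu` are hypothetical names, not ipfixcol2 code:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: start a thread, optionally pinned to `core`.
 * A negative `core` means the optional setting was omitted, so the
 * thread inherits the affinity of its creator.
 * Returns 0 on success or an errno-style code on failure. */
int start_plugin_thread(pthread_t *thread, void *(*fn)(void *), void *arg, int core)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) {
        return rc;
    }

    if (core >= 0) {
        if (core >= CPU_SETSIZE) {
            pthread_attr_destroy(&attr);
            return EINVAL;
        }
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        rc = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    }

    if (rc == 0) {
        rc = pthread_create(thread, &attr, fn, arg);
    }
    pthread_attr_destroy(&attr);
    return rc;
}

/* Stand-in for a plugin's processing loop: records the CPU it runs on. */
void *report_cpu(void *arg)
{
    *(int *)arg = sched_getcpu();
    return NULL;
}
```

Because the setting is per thread, each plugin instance could be given its own core while the remaining threads stay unpinned.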