RafaGago / mini-async-log

Non bloated asynchronous logger

timestamping options #13

Closed funkyshoc closed 7 years ago

funkyshoc commented 7 years ago

Do you have a "user friendly" timestamp option?

Example:

2017-04-15 09:16:00.105972 Test my data

Vs:

00000000003.321687819 Test my data

Also, if you have one producer that wants the timestamp printed and another that does not, writing to two different files, do you currently have to start two threads, or is there an option for a single thread with multiple log files/multiple configurations?

RafaGago commented 7 years ago

The timestamps are based on the monotonic clock, not on the calendar clock, so they are always relative to boot time (like kernel messages/dmesg). This is to deal with DST and wall-clock changes.

Each file name contains the uint64 timestamps (in hex) of both the monotonic clock and the calendar clock at file creation time, so the logged timestamps can be converted to calendar time by a script.
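For illustration only (not part of the project), a minimal sketch of that conversion, assuming you have already parsed the two hex timestamps out of the file name and that all values share one unit (nanoseconds here):

```cpp
#include <cstdint>
#include <ctime>
#include <cstdio>

// Illustrative sketch: convert a log entry's monotonic timestamp to calendar
// time using the two timestamps encoded in the file name. Assumes all three
// values are in nanoseconds; check the real file names for the actual unit.
void print_calendar_time(
    uint64_t entry_mono_ns,    // timestamp printed in the log entry
    uint64_t file_mono_ns,     // monotonic clock at file creation (file name)
    uint64_t file_calendar_ns  // calendar clock at file creation (file name)
    )
{
    uint64_t calendar_ns = file_calendar_ns + (entry_mono_ns - file_mono_ns);
    std::time_t secs   = (std::time_t) (calendar_ns / 1000000000ull);
    unsigned    micros = (unsigned) ((calendar_ns % 1000000000ull) / 1000);

    char buf[32];
    std::strftime (buf, sizeof buf, "%Y-%m-%d %H:%M:%S", std::localtime (&secs));
    std::printf ("%s.%06u\n", buf, micros);
}
```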

As things are now, you can't have different producers logging to different files while using the same backend instance. Each instance also has its own consumer thread, although I think I will add a feature to run the consumer loop without "mal" owning the thread. Owning the thread was a bad design decision (one that everyone makes, though); there is no need for the logging framework to own the consumer (file writer) thread.

https://github.com/RafaGago/mini-async-log/issues/7

Note that if you prefix each logged string with a producer thread name/id, you can solve that problem by using grep/sed at log reading/archiving time to generate separate files. This is much better performance-wise than using multiple logger instances. If you care about performance, having more logger instances will make your program "touch" more distinct memory areas, increasing the chances of soft page faults.

funkyshoc commented 7 years ago

I realize there is an option to have the producer take the timestamp if you want more accurate timestamps. However, assuming you really care more about optimal performance and don't want to take that "hit": is the consumer at least fairly good, most of the time, at timestamping within a few microseconds of when the producer queued the log entry, or does it do a lot of heavy lifting before the ultimate log statement that could take many microseconds?

RafaGago commented 7 years ago

There are two sources of imprecision when timestamping on the consumer:

- When the queue is empty, the consumer backs off exponentially and may end up sleeping (by default up to around 8ms), so the first entry of a new burst can be timestamped that late.
- When there is a big burst, entries sit in the queue until the consumer reaches them, so the drift depends on how far behind the consumer is.

I'd say that the average system will spend most of its time in the idle state and will not have massive data bursts, so I'd expect up to an 8ms drift.

The exponential backoff parameters are configurable. For the idle case you can lower those 8ms by tuning them: https://github.com/RafaGago/mini-async-log/blob/master/include/mal_log/cfg.hpp#L150

"long_sleep_ns" is the parameter that controls that wait. https://github.com/RafaGago/mini-async-log/blob/master/include/mal_log/util/queue_backoff_cfg.hpp#L48

But this means that you will have a consumer polling a potentially empty queue, which has an effect on performance too. You definitely don't want to burn cycles by polling the queue at microsecond rates.
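As a rough sketch of what tightening the idle backoff could look like: only the field name "long_sleep_ns" is confirmed above, while the cfg type and the member path used here are assumptions, so check cfg.hpp and queue_backoff_cfg.hpp for the real layout.

```cpp
#include <mal_log/mal_log.hpp>

// Hypothetical sketch: shorten the consumer's longest idle sleep. Only
// "long_sleep_ns" is named in this thread; "mal::cfg" and the member path
// below are assumptions, see include/mal_log/cfg.hpp (around L150) and
// queue_backoff_cfg.hpp for the actual structure.
void tighten_idle_backoff (mal::cfg& c)
{
    // Cap the longest idle sleep at 1ms instead of the ~8ms discussed above.
    // Shorter sleeps give tighter consumer timestamps at the cost of more
    // wakeups on an empty queue.
    c.queue_backoff.long_sleep_ns = 1000000; // assumed member path
}
```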

If you need microsecond precision you have to timestamp at the calling point (producer). Unfortunately getting the timestamp is slower than the logger itself, but there is no better solution.

Remember that this isn't a general-purpose data logger, compromises need to be made to reach high performance. In what kind of application are you planning to use it?

funkyshoc commented 7 years ago

I'm using it for a high-performance networking application (it needs to process data as close to real time as possible and yet also log entries when network data coming in at 1 gigabit matches various filters).

I've read about a 70ns logger being possible (see link below), including timestamps and thread ids, but compromising is fine by me in general, and I like the idea of post-processing one main log file for different producers and converting timestamps. I'm just trying to get the best I can without spending months writing my own :) I will need to set your mal_cfg.misc.producer_timestamp = true for my purposes, since I need the timestamps within a few microseconds or so (or, as you mentioned, to change the wait).
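(For reference, a minimal sketch of enabling that flag. Only mal_cfg.misc.producer_timestamp is quoted above; the frontend calls and the output folder field are assumptions based on the project's README and may differ from the real API.)

```cpp
#include <mal_log/mal_log.hpp>

// Rough sketch of enabling producer-side timestamps. Everything except
// "misc.producer_timestamp" is an assumption and may not match the real API.
int main()
{
    mal::frontend fe;
    auto mal_cfg = fe.get_cfg();
    mal_cfg.file.out_folder         = "./log_out/"; // assumed field name
    mal_cfg.misc.producer_timestamp = true;         // timestamp at the call site
    fe.init_backend (mal_cfg); // real code should check the returned status
    return 0;
}
```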

The 70ns blog post has some interesting ideas; I wonder if there is anything there you could borrow:

https://mortoray.com/2014/05/29/wait-free-queueing-and-ultra-low-latency-logging/

RafaGago commented 7 years ago

Saying 70ns alone is a bit vague IMO. The performance will depend on the load.

This logger already does all the things described in the article and a bit more, e.g. removing trailing zeros from 64-32-16 bit integers.

Keep in mind that this logger's benchmark is testing brutal loads: a burst of 100k-1M messages from many threads at the same time is very unlikely to happen in, e.g., your application. The benchmark is also run on a 7-year-old AMD machine.

On the least stressful load of the benchmark (100k message burst, 1 thread) this logger is already reaching 35ns on a 7-year-old mid-range computer. I bet it will be less on a modern machine. I'd say that a system logging 100-1000 msgs/sec from 1-4 threads will reach that 35ns average latency.

But I highly doubt that 70ns can be reached with timestamps, at least on my machine and getting the timestamps through the C++ steady_clock. If I remember correctly I measured 40-70ns just to get them, which is slower than the current logger performance itself in the uncontended, warm-cache case.
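If you want a ballpark figure for that cost on your own hardware, a throwaway microbenchmark along these lines (standard C++, not part of the project) is enough:

```cpp
#include <chrono>
#include <cstdio>

// Illustrative microbenchmark: rough average cost of one steady_clock::now()
// call. Results vary with the CPU, kernel clocksource and load.
int main()
{
    using clock = std::chrono::steady_clock;
    const int iters = 1000000;

    auto start = clock::now();
    clock::time_point last;
    for (int i = 0; i < iters; ++i) {
        last = clock::now(); // the call being measured
    }
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>
        (last - start).count();
    std::printf ("avg steady_clock::now() cost: ~%lld ns\n",
                 (long long) (ns / iters));
    return 0;
}
```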

BTW the article has big mistakes: it says that a queue with spinlocks is wait-free, but a queue with spinlocks is a blocking queue. This is the approach that nanolog is using. The bounded queue of this logger is blocking too (but not in the average case, and it has better blocking behavior than a spinlock).
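To see why a spinlock-guarded queue is blocking rather than wait-free, consider a minimal sketch (illustrative only, not this logger's queue): any thread preempted while holding the flag stalls every other producer, so no producer has a bounded-step progress guarantee.

```cpp
#include <atomic>
#include <deque>

// Illustrative only: a queue guarded by a spinlock. push() can spin for an
// unbounded time if another thread is preempted while holding the lock, so it
// is blocking, not wait-free.
template <class T>
class spinlock_queue {
public:
    void push (const T& v)
    {
        while (lock_.test_and_set (std::memory_order_acquire)) {
            /* spin: progress depends entirely on the current lock holder */
        }
        q_.push_back (v);
        lock_.clear (std::memory_order_release);
    }
private:
    std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
    std::deque<T>    q_;
};
```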

The only optimization to this logger that I can think of now is to use thread-local storage and the heap algorithm. In this way the system becomes wait-free for the producers. But as the queue used for the heap is linked-list based, it may penalize the cache. If the proof of concept of that algorithm is faster than the current one I will rewrite the logger, but this time in C without templates (code bloat).

RafaGago commented 7 years ago

cd build/linux
make -f Makefile.example.benchmark
build/stage/x86_64-linux-gnu/bin/mal-benchmark [loggers]

This, for example, will run a 100k message test of mal-hybrid vs. reckless and print the tables in markdown:

build/stage/x86_64-linux-gnu/bin/mal-benchmark 100000 1 mal-hybrid reckless