Add epoch timestamps to responses in NDJSON output format (app start only, rx only, or rx+tx)

mzpqnxow commented 3 years ago

I have a workflow where I often have batches of massdns NDJSON files and batches of data from various other tools. The challenge I'm having right now is making sure the most recent DNS responses win when merging/joining these datasets and there are duplicate questions with different answers, because the zone file changed some time between the two sessions that generated the batches. In some cases, massdns NDJSON files have been concatenated, or don't have an accurate mtime/ctime, so I'm kind of helpless there

The simple solution to this is in line with what other similar tools do: stamp an epoch timestamp into the record

For my purposes, it would be a huge help even if it was just a fixed time that was retrieved once and stamped into every NDJSON row, grabbed with time() at startup
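To make the "fixed timestamp grabbed once at startup" idea concrete, here is a minimal sketch. It is not massdns code: `g_start_ts`, `capture_start_ts()`, `format_row()` and the `start_ts` field name are all hypothetical, and the row is reduced to two fields for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical global: epoch seconds captured once at process start. */
static time_t g_start_ts;

/* Call once, early in main(), before any rows are written. */
void capture_start_ts(void)
{
    g_start_ts = time(NULL);
}

/*
 * Format a reduced NDJSON row that carries the fixed start timestamp.
 * "start_ts" is an assumed field name. No JSON escaping is done, so
 * name and status must already be safe to embed in a JSON string.
 * Returns snprintf's result: the length the full row needs.
 */
int format_row(char *buf, size_t len, const char *name, const char *status)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "{\"name\":\"%s\",\"status\":\"%s\",\"start_ts\":%lld}\n",
                    name, status, (long long)g_start_ts);
}
```

Because `time()` is called only once, every row written by the same process gets the same value, which is enough to order whole batches against each other but not rows within a batch.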

The most correct and/or powerful/broadly useful implementation would probably be to have both a tx and rx stamp, as two logically separate fields. Something like:

{
  "name": "www.bah.com.",
  "type": "A",
  "class": "IN",
  "status": "NOERROR",
  "tx_ts": 1614783271,
  "rx_ts": 1614783273,
  "data": {
    "answers": [
      {
        "ttl": 300,
        "type": "A",
        "class": "IN",
        "name": "www.bah.com.",
        "data": "1.2.3.4"
      }
    ]
  },
  "resolver": "1.2.3.4:53"
}

I realize that capturing the transmit stamp may require thoughtfulness to avoid impacting the transmission rate, and I also need to re-familiarize myself with the massdns code before saying this will be simple. But I put a little bit of thought into it as I was writing this up.

A naive implementation would be continuously calling time(), which would result in a continuous flood of system calls. That's not really a good solution and may have a significant performance impact, so it's not what I would like to do.

I'm guessing that the best solution would be one that avoids system calls/interrupts entirely, either by directly making use of architecture-specific instructions (e.g. rdtsc for x86/x86_64) or a libc wrapper for the same. I think (need to check) that some or all of the "clock" types in glibc clock_gettime() utilize architecture-specific instructions instead of system calls. I think most (if not all) are accessible from userspace. FWIW, I have access to ARM and PPC64 machines to check the portability of whatever seems to be a good solution, to avoid breaking people on weird architectures

I don't expect any CPU instructions to return epoch time, but if epoch time is captured once at the start of the application, the elapsed ticks can just be scaled (from nanoseconds, milliseconds, or whatever) to seconds and then added to that initial epoch value without any measurable impact on performance
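A rough sketch of that "epoch base plus elapsed ticks" approach follows. As an assumption, it uses `clock_gettime(CLOCK_MONOTONIC, ...)` as the tick source rather than a raw instruction such as rdtsc, since it is portable across POSIX systems. The `epoch_base` struct and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical pairing of a wall-clock reading with a monotonic one. */
struct epoch_base {
    time_t epoch_at_start;         /* epoch seconds at startup */
    struct timespec mono_at_start; /* monotonic reading taken alongside it */
};

/* Capture both clocks once at startup. */
void epoch_base_init(struct epoch_base *b)
{
    assert(b != NULL);
    b->epoch_at_start = time(NULL);
    clock_gettime(CLOCK_MONOTONIC, &b->mono_at_start);
}

/* Convert a later monotonic reading into estimated epoch seconds. */
time_t epoch_from_mono(const struct epoch_base *b, const struct timespec *now)
{
    assert(b != NULL && now != NULL);
    time_t elapsed = now->tv_sec - b->mono_at_start.tv_sec;
    if (now->tv_nsec < b->mono_at_start.tv_nsec)
        elapsed -= 1; /* borrow: the last second has not fully elapsed */
    return b->epoch_at_start + elapsed;
}

/* Convenience wrapper: estimated epoch seconds right now. */
time_t epoch_now(const struct epoch_base *b)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return epoch_from_mono(b, &now);
}
```

One trade-off of this design: the result does not follow wall-clock adjustments (for example an NTP step) made after startup, so it can drift from what `time()` would report in a long-running process.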

I think the choices are (in order of least effort, least invasiveness to most):

1. This is a dumb idea. Do nothing
2. Add a single static timestamp to every NDJSON row, representing the time the process started
3. Add a dynamically retrieved timestamp to every NDJSON row, generated at the time the response is received
4. Add both a transmit and receive timestamp to every NDJSON row, generated before transmit and after receive

I think #4 is doable. #2 solves my problem "well enough". I'm happy to take a shot at any of them.

Thoughts on this?

As always, I appreciate the time you put into this, it's a very useful tool for quite a few workflows I have

(BTW- I'll feel very silly if this data is already captured and just not stamped into the row- I just didn't have time to look through the source before entering this)

mzpqnxow commented 3 years ago

I think I forgot to mention (oops) I'm happy to make a PR for this, assuming you think it's an acceptable addition

blechschmidt commented 2 years ago

5cacad77c48e77a2dd93821bc42d124f590b7f42 adds RX timestamps. Performance should not be much of an issue as the timestamp is provided by the vDSO and thus requires no syscall. There was a time(NULL) call anyway which has now been replaced by clock_gettime(CLOCK_REALTIME, ...). Comparing the instruction count, obtaining the time with clock_gettime now takes 94 vs. previously 24 instructions. I think this is quite acceptable.
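For readers unfamiliar with the call, here is a minimal illustration of taking a receive timestamp with `clock_gettime(CLOCK_REALTIME, ...)`. It is not the code from the commit above: the function names and the millisecond unit are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to whole milliseconds since the epoch. */
int64_t timespec_to_ms(const struct timespec *ts)
{
    assert(ts != NULL);
    return (int64_t)ts->tv_sec * 1000 + ts->tv_nsec / 1000000;
}

/*
 * Hypothetical helper: wall-clock time in milliseconds, intended to be
 * called right after a reply has been read from the socket.
 * Returns -1 if the clock could not be read.
 */
int64_t rx_timestamp_ms(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    return timespec_to_ms(&ts);
}
```

Unlike a monotonic clock, `CLOCK_REALTIME` yields epoch time directly, so no startup offset is needed.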

As transaction IDs are reused on purpose, one problem is that massdns cannot keep track of which transmission a reply belongs to (at least as soon as a follow-up query has been sent). So a TX timestamp would be of limited use anyway and I wouldn't include it unless there is a demand for it.

mzpqnxow commented 2 years ago

Great, thank you for this!

blechschmidt / massdns

Add epoch timestamps to responses in NDJSON output format (app start only, rx only, or rx+tx) #111