0xERR0R / blocky

Fast and lightweight DNS proxy as ad-blocker for local network with many features
https://0xERR0R.github.io/blocky/
Apache License 2.0

Feature: Realtime Query Logging without Database #1433

Open starsoccer opened 2 months ago

starsoccer commented 2 months ago

Currently it's possible to push metrics from blocky into Grafana via Prometheus. And while this works fine, it's currently not possible to also have the query log in Grafana without making use of another database such as MySQL or Postgres.

Edit: It would be nice if there was a way to instead push the query log to InfluxDB, or alternatively, expose it somehow so it can be grabbed by Telegraf and pushed into InfluxDB. It would also be nice if all metrics could be pushed into InfluxDB natively or via Telegraf, so there isn't a need to run Prometheus and/or Telegraf and/or Grafana.

kwitsch commented 2 months ago

Sounds reasonable.

I even started implementing it a long time ago but didn't finish it since I stopped using InfluxDB... 🤔

starsoccer commented 2 months ago

Awesome, happy to test a branch or help however I can.

I run InfluxDB (1.8) for a few other things, so I'd prefer to avoid having to run another DB just for query logging. It would be great to not need to run Prometheus either, but one step at a time 😉

ezekieldas commented 2 months ago

I can test with influxdb2.

kwitsch commented 2 months ago

Ok, I looked into it again and it seems that for each InfluxDB version a different client is required. O.o

I'm a bit taken aback by this.

The cost vs. benefit of implementing 3 clients for 1 database type is quite unreasonable...

starsoccer commented 2 months ago

> Ok, I looked into it again and it seems that for each InfluxDB version a different client is required. O.o
>
> I'm a bit taken aback by this.
>
> The cost vs. benefit of implementing 3 clients for 1 database type is quite unreasonable...

That is annoying (re: the different clients). I don't know a huge amount about Prometheus vs. Telegraf, but I know blocky currently supports Prometheus; is there no way to expose this information via Prometheus? And on the flip side, what about doing so via Telegraf?

0xERR0R commented 2 months ago

Maybe we should externalize the query log into a separate service via gRPC? We can define a streaming contract and provide implementations for Postgres/MariaDB/CSV. People can create their own implementations; since gRPC is platform agnostic, it could be done in a different language (for example Java).
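
For illustration, such a streaming contract could look roughly like the proto sketch below; the service and field names are hypothetical, not an actual blocky definition:

syntax = "proto3";

package querylog;

// Hypothetical server-streaming contract: a client subscribes once and
// receives each query log entry as it is written.
service QueryLog {
  rpc Stream(StreamRequest) returns (stream QueryLogEntry);
}

message StreamRequest {}

message QueryLogEntry {
  string client_ip = 1;
  repeated string client_names = 2;
  string question = 3;
  string response_reason = 4;
  int64 duration_ms = 5;
}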

ezekieldas commented 2 months ago

edit: totally overlooked that the request here is for query logging. Leaving this here regardless in case someone searches for metrics.


This would require use of Telegraf, but not necessarily on the same host as blocky. This basically amounts to a scraper (via Telegraf) with an output to InfluxDB.

https://docs.influxdata.com/influxdb/cloud/write-data/developer-tools/scrape-prometheus-metrics/

https://github.com/influxdata/telegraf/blob/release-1.30/plugins/inputs/prometheus/README.md

# cat prom_blocky.conf
[[inputs.prometheus]]
  urls = ["http://blocky.somewhere.io:9110"]
# sudo -u telegraf telegraf --test --config prom_blocky.conf

2024-04-19T19:29:43Z I! Loading config: prom_blocky.conf
2024-04-19T19:29:43Z I! Starting Telegraf 1.30.1 brought to you by InfluxData the makers of InfluxDB
2024-04-19T19:29:43Z I! Available plugins: 233 inputs, 9 aggregators, 31 processors, 24 parsers, 60 outputs, 6 secret-stores
2024-04-19T19:29:43Z I! Loaded inputs: prometheus
2024-04-19T19:29:43Z I! Loaded aggregators:
2024-04-19T19:29:43Z I! Loaded processors:
2024-04-19T19:29:43Z I! Loaded secretstores:
2024-04-19T19:29:43Z W! Outputs are not used in testing mode!
2024-04-19T19:29:43Z I! Tags enabled: host=sfo001
2024-04-19T19:29:43Z I! [inputs.prometheus] Using the label selector:  and field selector:
> blocky_blacklist_cache,group=ads,host=florax,url=http://blocky.somewhere.io:9110/metrics gauge=183177 1713554984000000000
> blocky_blocking_enabled,host=florax,url=http://blocky.somewhere.io:9110/metrics gauge=1 1713554984000000000
> blocky_build_info,build_time=20240106-205224,host=florax,url=http://blocky.somewhere.io:9110/metrics,version=v0.23 gauge=1 1713554984000000000
[ ... ]
starsoccer commented 2 months ago

> Maybe we should externalize the query log into a separate service via gRPC? We can define a streaming contract and provide implementations for Postgres/MariaDB/CSV. People can create their own implementations; since gRPC is platform agnostic, it could be done in a different language (for example Java).

I think doing something like that may be a bit overkill, honestly. From my POV I'd be happy with just a log file that contains all the requests in close to real time. The issue for me is really that I'd like to avoid running a database, but using CSV for the output means the data will be stale by up to 1 day.

starsoccer commented 2 months ago

> This would require use of Telegraf, but not necessarily on the same host as blocky. This basically amounts to a scraper (via Telegraf) with an output to InfluxDB.
>
> https://docs.influxdata.com/influxdb/cloud/write-data/developer-tools/scrape-prometheus-metrics/
>
> https://github.com/influxdata/telegraf/blob/release-1.30/plugins/inputs/prometheus/README.md
>
> # cat prom_blocky.conf
> [[inputs.prometheus]]
>   urls = ["http://blocky.somewhere.io:9110"]
> # sudo -u telegraf telegraf --test --config prom_blocky.conf
>
> 2024-04-19T19:29:43Z I! Loading config: prom_blocky.conf
> 2024-04-19T19:29:43Z I! Starting Telegraf 1.30.1 brought to you by InfluxData the makers of InfluxDB
> 2024-04-19T19:29:43Z I! Available plugins: 233 inputs, 9 aggregators, 31 processors, 24 parsers, 60 outputs, 6 secret-stores
> 2024-04-19T19:29:43Z I! Loaded inputs: prometheus
> 2024-04-19T19:29:43Z I! Loaded aggregators:
> 2024-04-19T19:29:43Z I! Loaded processors:
> 2024-04-19T19:29:43Z I! Loaded secretstores:
> 2024-04-19T19:29:43Z W! Outputs are not used in testing mode!
> 2024-04-19T19:29:43Z I! Tags enabled: host=sfo001
> 2024-04-19T19:29:43Z I! [inputs.prometheus] Using the label selector:  and field selector:
> > blocky_blacklist_cache,group=ads,host=florax,url=http://blocky.somewhere.io:9110/metrics gauge=183177 1713554984000000000
> > blocky_blocking_enabled,host=florax,url=http://blocky.somewhere.io:9110/metrics gauge=1 1713554984000000000
> > blocky_build_info,build_time=20240106-205224,host=florax,url=http://blocky.somewhere.io:9110/metrics,version=v0.23 gauge=1 1713554984000000000
> [ ... ]

I don't want to derail this issue too much, but yes, you're correct; this doesn't actually solve the issue of getting query logs into InfluxDB.

Initially I actually did the same thing as you mentioned, just to see what it looked like and if it would work (which it does), but I don't have enough Grafana knowledge to adjust the blocky dashboard to get it working with Telegraf instead of Prometheus.

kwitsch commented 2 months ago

> Maybe we should externalize the query log into a separate service via gRPC? We can define a streaming contract and provide implementations for Postgres/MariaDB/CSV. People can create their own implementations; since gRPC is platform agnostic, it could be done in a different language (for example Java).

I actually like the idea. Since the query log is one of the main external interfaces, it would enable alternative storage solutions like InfluxDB without polluting the blocky code itself.

ThinkChaos commented 2 months ago

Re InfluxDB having one client per version: I think we can just pick the latest that Grafana supports, since the point is to use them together anyway. Based on this InfluxDB blog post, that seems to be 3.0 since around Jan 22, 2024.

Also, I'm not against adding some kind of RPC/websocket for query logging, though I think implementing a new backend outside of blocky might be more work than inside, since you're starting from scratch.

kwitsch commented 2 months ago

Generating a client from proto files (gRPC) is fairly easy and can be done in most modern languages with the help of auto-generated wrappers. 😉

I would have been interested in InfluxDB if an implementation would have covered older backends as well. With only the latest version supported, it's not worth it in my opinion.

A similar behavior (entry TTL and Grafana support) could be achieved with Redis, where we already have a client (that only needs a minor extension to store query logs). 🤷‍♂️
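
A minimal sketch of what that extension might store, using the go-redis client; the key naming, TTL, and entry format here are assumptions for illustration:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	entry := map[string]any{
		"client_ip": "192.168.1.2",
		"question":  "A (example.com.)",
	}
	data, _ := json.Marshal(entry)

	// One key per log entry; Redis drops it automatically once the TTL
	// expires, giving the retention behavior described above.
	key := fmt.Sprintf("blocky:querylog:%d", time.Now().UnixNano())
	if err := rdb.Set(ctx, key, data, 24*time.Hour).Err(); err != nil {
		panic(err)
	}
}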

starsoccer commented 2 months ago

> Generating a client from proto files (gRPC) is fairly easy and can be done in most modern languages with the help of auto-generated wrappers. 😉
>
> I would have been interested in InfluxDB if an implementation would have covered older backends as well. With only the latest version supported, it's not worth it in my opinion.
>
> A similar behavior (entry TTL and Grafana support) could be achieved with Redis, where we already have a client (that only needs a minor extension to store query logs). 🤷‍♂️

+1, I don't think supporting just the latest InfluxDB would be ideal. To reiterate my last comment, I think just writing the query log to a file in near real time would be sufficient, and it is the most flexible option since anyone can parse a file. This could even still be a CSV, just written faster than once a day.

The issue currently is really just that there is no way to get a near-realtime query log without running a database.

ThinkChaos commented 2 months ago

Alright then I think we should:

  1. Rename/refocus this issue to be about adding a way to stream the query log from another process, and maybe over the network.
    I'll let you do that if you agree.
  2. Defer anything InfluxDB-related to when/if we get other requests about a specific version.

About the query logging:

I'm not sold on gRPC + protobuf: I think HTTP Server-Sent Events (SSE) + JSON would be easier to use for clients since they're even more ubiquitous, don't require a protocol build step, and most importantly IMO, we already use them and could make this "just" another service after #1427. With HTTP, that service could be merged/multiplexed onto a single listener, which I don't think is possible with gRPC based on the grpc.Server docs (though I didn't look hard TBH). So we could expose it as a new HTTP route that is hosted on the same HTTP server as the other HTTP endpoints and benefits from the same certs for HTTPS, or the same reverse proxy, etc., and support splitting it out once that's fully implemented. As proof of the low entry bar, you could even use shell for basic analysis/log tailing:
curl --no-buffer https://blocky/whatever | jq (or nc -U /run/blocky/querylog.sock | jq if we add Unix socket support)

The JSON format could be something very similar to what we use for structured logging in querylog.LoggerWriter, or maybe even the same if we change client_names there to be a native array instead of turning it into a string.
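
For illustration, the stream itself would just be SSE text frames, each a "data:" line carrying the JSON payload, followed by a blank line; the endpoint and field names here are hypothetical:

$ curl --no-buffer https://blocky/querylog/stream
data: {"client_ip":"192.168.1.2","client_names":["laptop"],"question":"A (example.com.)","response_reason":"CACHED","duration_ms":0}

data: {"client_ip":"192.168.1.3","client_names":["phone"],"question":"AAAA (github.com.)","response_reason":"RESOLVED","duration_ms":23}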

starsoccer commented 2 months ago

I've updated the original post to now be about having a realtime query log without a database. Apologies for originally tying the request too tightly to InfluxDB.

While having a way to do this over the network would be useful, I still think the lowest-hanging option here is really just a near-real-time CSV file.

Another alternative could be just writing to a remote syslog if we want something that can do it over the network

kwitsch commented 2 months ago

> Another alternative could be just writing to a remote syslog if we want something that can do it over the network

Wouldn't that already be possible if you enable querylog to console and pipe the binary's output to a remote syslog target? 🤔

ThinkChaos commented 2 months ago

No worries, I think it's better for everyone to bring up multiple possibilities :)

Yeah I think Unix socket would be nicer to avoid going through actual storage, especially if you're not trying to save the data.

I think the existing CSV option is actually already near real time: the "one per day" in the docs means the file name is suffixed with the date.
I didn't catch that earlier but I think it's already mostly what you want, though the "one per day" might require logic on the "client" side to detect when the file switches... That might at least be good enough to give it a try now.

kwitsch commented 2 months ago

> I'm not sold on gRPC + protobuf: I think HTTP Server-Sent Events (SSE) + JSON would be easier to use for clients since they're even more ubiquitous, don't require a protocol build step, and most importantly IMO, we already use them and could make this "just" another service after #1427.

I'm not familiar with SSE to be honest. 🫣 It's ok for me if it's easier to use and implement. 👍

starsoccer commented 2 months ago

> Another alternative could be just writing to a remote syslog if we want something that can do it over the network
>
> Wouldn't that already be possible if you enable querylog to console and pipe the binary's output to a remote syslog target? 🤔

Hmm not sure. How would I do that exactly in a docker container?

starsoccer commented 2 months ago

> No worries, I think it's better for everyone to bring up multiple possibilities :)
>
> Yeah I think Unix socket would be nicer to avoid going through actual storage, especially if you're not trying to save the data.
>
> I think the existing CSV option is actually already near real time: the "one per day" in the docs means the file name is suffixed with the date.
> I didn't catch that earlier but I think it's already mostly what you want, though the "one per day" might require logic on the "client" side to detect when the file switches... That might at least be good enough to give it a try now.

I have no issue with a Unix socket but I assume that whatever reads that socket would need to be in the same docker container?

Oh yeah, the CSV file confused me. Sounds like it could work then. The question is just how to handle the file name changing. I'll look into that with Telegraf; otherwise maybe we just need an option that simply appends to the same file rather than rotating.

Edit: It does seem that Telegraf has a way to read files with a glob, so it may be possible to just read any files in a folder. But I'll have to test and see what happens.
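
For what it's worth, Telegraf's tail input follows files as they grow and accepts globs, which might sidestep the rotation issue; a sketch, where the paths and column names are guesses rather than confirmed blocky output:

[[inputs.tail]]
  # glob picks up each day's rotated file automatically
  files = ["/blocky-logs/*.log"]
  from_beginning = false
  data_format = "csv"
  csv_column_names = ["time", "ip", "clientName", "duration", "responseReason", "question", "responseAnswer"]
  csv_timestamp_column = "time"
  csv_timestamp_format = "2006-01-02 15:04:05"
  csv_delimiter = "\t"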

ThinkChaos commented 2 months ago

> I'm not familiar with SSE to be honest. 🫣

It's basically just an HTTP request that the server never closes, writing data in chunks. It's an alternative to WebSockets for when you only need events going one way. The nice thing is that it's compatible with lots of clients out of the box because the protocol is basic HTTP. And implementing it on the server side is also pretty easy: write the event data with ResponseWriter.Write, call Flush, and repeat.
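
A minimal sketch of that server loop in Go, assuming a hypothetical QueryLogEntry type and endpoint path (neither is blocky's actual API):

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// QueryLogEntry is illustrative; the field names roughly follow the CSV
// columns discussed in this thread, with client_names as a native array.
type QueryLogEntry struct {
	ClientIP    string   `json:"client_ip"`
	ClientNames []string `json:"client_names"`
	Question    string   `json:"question"`
	Reason      string   `json:"response_reason"`
	DurationMs  int64    `json:"duration_ms"`
}

// sseHandler streams each entry as an SSE frame until the client disconnects.
func sseHandler(events <-chan QueryLogEntry) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case e := <-events:
				data, _ := json.Marshal(e)
				fmt.Fprintf(w, "data: %s\n\n", data) // SSE framing
				flusher.Flush()                      // push the frame out now
			}
		}
	}
}

func main() {
	events := make(chan QueryLogEntry)
	go func() { // stand-in producer so the example runs on its own
		for {
			events <- QueryLogEntry{ClientIP: "192.168.1.2", Question: "A (example.com.)"}
			time.Sleep(time.Second)
		}
	}()
	http.Handle("/querylog/stream", sseHandler(events))
	http.ListenAndServe(":8080", nil)
}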

> Hmm not sure. How would I do that exactly in a docker container?

Docker has options for how to collect logs, and I'm sure there's a way to send the container's output to something else.
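
For reference, Docker's built-in syslog logging driver can forward a container's stdout/stderr (and therefore a console query log) to a remote syslog target; the address below is a placeholder:

docker run -d --log-driver syslog \
  --log-opt syslog-address=udp://syslog.example.com:514 \
  spx01/blocky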

> I have no issue with a Unix socket but I assume that whatever reads that socket would need to be in the same docker container?

Since it's a file, you can put it in a volume that both the host and the container have access to! You can even mount that same host dir into another container. That can be used, for example, to expose a service via a reverse proxy that connects to the service via Unix socket. I like that kind of pattern because file system permissions are easier than firewalls.
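
A sketch of that pattern with compose, reusing the hypothetical /run/blocky/querylog.sock path from earlier in the thread; the service names and paths are illustrative:

services:
  blocky:
    image: spx01/blocky
    volumes:
      - querylog-sock:/run/blocky
  consumer:
    image: alpine
    # reads the query log stream from the shared socket
    command: sh -c "apk add --no-cache netcat-openbsd && nc -U /run/blocky/querylog.sock"
    volumes:
      - querylog-sock:/run/blocky
volumes:
  querylog-sock: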

0xERR0R commented 2 months ago

Imho, SSE is good for web browser to server communication and gRPC is more universal. Also, from the performance point of view, gRPC should be better (HTTP/2, multiplexing, binary message format).

ThinkChaos commented 2 months ago

I don't know if the performance really matters for a query log stream, but SSE works fine with HTTP/2 and even HTTP/3. You can also multiplex other requests over the same connection if your client supports it.
I think the only potential performance difference would be the binary messages, but even then JSON compresses pretty well. Anyway, the main reason I think SSE would be nice for such an API is the simplicity for both client and server.

Regarding what you were proposing, a gRPC API for blocky to basically have plugins: I'm not necessarily against that, and I think it makes more sense there since you'll likely need calls both ways.

kwitsch commented 2 months ago

In theory we could even let the user configure it by switching between JSON & MessagePack (for example). Both can be serialized from structs pretty easily, and this would enable optional binary compression at the cost of readability. 🤔
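
A sketch of such a switch, assuming the vmihailenco/msgpack package; purely illustrative, not an existing blocky option:

package main

import (
	"encoding/json"
	"fmt"

	"github.com/vmihailenco/msgpack/v5"
)

type Entry struct {
	Question string `json:"question" msgpack:"question"`
}

// encode picks the wire format based on configuration.
func encode(e Entry, binary bool) ([]byte, error) {
	if binary {
		return msgpack.Marshal(e) // compact, not human-readable
	}
	return json.Marshal(e) // readable, e.g. for curl/jq debugging
}

func main() {
	j, _ := encode(Entry{Question: "A (example.com.)"}, false)
	m, _ := encode(Entry{Question: "A (example.com.)"}, true)
	fmt.Printf("json: %d bytes, msgpack: %d bytes\n", len(j), len(m))
}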

kwitsch commented 2 months ago

To summarize my point of view: I like gRPC because of its clear structure and two-way communication, but I find the idea of letting those logs stream through curl tempting (especially for debugging purposes). 😅

ruben-sprengel commented 2 months ago

Sending the logs & query logs to a remote syslog target would be awesome, so I could ship them e.g. to my Grafana Loki instance and have the logs in a Grafana dashboard as well. Another option could be an argument in the config file to store the logs in a .log file and then expose this path to the local host. Then Promtail could maybe scrape this path and ship the logs to Loki.

I am running blocky on an ARM host, so the Grafana Loki docker driver plugin (currently linux/amd64 only) is still not supported to ship logs directly from a container to Grafana Loki :(
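
If such a .log file existed on a shared path, a Promtail config along these lines could ship it to Loki; the URL and paths are placeholders:

positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki.example.com:3100/loki/api/v1/push
scrape_configs:
  - job_name: blocky
    static_configs:
      - targets: [localhost]
        labels:
          job: blocky
          __path__: /blocky-logs/*.log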

starsoccer commented 1 month ago

So I tried playing a bit with Telegraf and the CSV in order to get Telegraf to send the CSV data into InfluxDB. And while I can get it to push the data in, I'm struggling with actually being able to query the data and get useful metrics out of it. Hopefully someone here who is a bit more familiar with Telegraf can point me in the right direction, so others can also make use of it.

Currently this is what I have:

[[inputs.file]]
  files = ["/blocky-logs/*.log"]
  data_format = "csv"
  # Not sure if the order of the columns is correct or not
  csv_column_names = ["time", "ip", "clientName", "duration", "responseReason", "question", "responseAnswer"]
  # My influx knowledge isn't great either, so I'm not clear on what I'd want as a tag vs. a field
  csv_tag_columns = ["ip", "clientName", "question"]
  csv_timestamp_column = "time"
  csv_timestamp_format = "2006-01-02 15:04:05"
  csv_delimiter = "\t"

[[outputs.influxdb]]
  urls = ["INFLUX URL HERE"]
  database = "DATABASE HERE"
  username = "USERNAME HERE"
  password = "PASSWORD HERE"