AdguardTeam / AdGuardHome

Network-wide ads & trackers blocking DNS server
https://adguard.com/adguard-home/overview.html
GNU General Public License v3.0

Remove encoding/compression for Answer value in querylog.json #7199

Open tunloop opened 3 months ago

tunloop commented 3 months ago

The problem

Currently, querylog.json stores the DNS query response (domain name, returned IP address, or other data depending on record type) as a partially compressed, base64-encoded string. Base64-decoding that string does not produce readable text:

ó•••␀␁␀ ␀␀␀␀␁i␅ytimg␃com␀␀␁␀␁À
␀␁␀␁␀␀␁,␀␄•û!vÀ
␀␁␀␁␀␀␁,␀␄¬Ù␎ÖÀ
␀␁␀␁␀␀␁,␀␄•úÙVÀ
␀␁␀␁␀␀␁,␀␄•úÙvÀ
␀␁␀␁␀␀␁,␀␄•ûÓöÀ
␀␁␀␁␀␀␁,␀␄•û!VÀ
␀␁␀␁␀␀␁,␀␄•úEÖÀ
␀␁␀␁␀␀␁,␀␄•û×öÀ
␀␁␀␁␀␀␁,␀␄¬Ù␎ö

Using this log in any SIEM setup reduces visibility into the addresses returned for DNS queries, which makes it much harder to triage other network logs and information (such as correlating firewall logs with the DNS answer logs, or triaging other layer 3 log entries against the DNS activity of a certain host). Also, it is currently not possible to search the AdGuard web log for a query response IP address.

Currently, in order to triage this information, I have to leave the unified single-pane SIEM web interface, go into AdGuard's web interface, and correlate between the two pages.

As posted over in https://github.com/AdguardTeam/AdGuardHome/discussions/4246 , the script required to parse this field is not simply a one-liner. This processing requirement makes it difficult or nearly impossible to ingest the log into any automated system without many external scripts, plugins, or interactions. And although a partial solution was presented in #4246, using the AdGuard API to obtain DNS query response information is not practical when AdGuard is generating upwards of 100,000 query log entries per day and a SIEM system would have to make HTTP connections for all of those queries. A more straightforward solution is to eliminate the need to translate data formats in the first place.
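For reference, here is a minimal sketch of what decoding this field involves today. It assumes the Answer value is a base64-encoded DNS message in standard RFC 1035 wire format, and it uses only the Python standard library. The helper names `read_name` and `decode_answer` are hypothetical (they are not part of AdGuard Home), and only A and AAAA records are handled; other record types are skipped.

```python
import base64
import ipaddress
import struct

# Sample "Answer" value copied from the log line quoted in this issue.
SAMPLE = (
    "85uBgAABAAkAAAAAAWkFeXRpbWcDY29tAAABAAHADAABAAEAAAEsAASO+yF2wAwAAQABAAAB"
    "LAAErNkO1sAMAAEAAQAAASwABI762VbADAABAAEAAAEsAASO+tl2wAwAAQABAAABLAAEjvvT"
    "9sAMAAEAAQAAASwABI77IVbADAABAAEAAAEsAASO+kXWwAwAAQABAAABLAAEjvvX9sAMAAEA"
    "AQAAASwABKzZDvY="
)


def read_name(msg, offset):
    """Read a (possibly compressed) domain name; return (name, next_offset)."""
    labels = []
    resume_at = None  # where parsing continues after the first pointer
    hops = 0
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:  # compression pointer (RFC 1035, 4.1.4)
            if resume_at is None:
                resume_at = offset + 2
            offset = ((length & 0x3F) << 8) | msg[offset + 1]
            hops += 1
            if hops > 32:  # guard against pointer loops
                raise ValueError("too many compression pointers")
            continue
        offset += 1
        if length == 0:  # root label terminates the name
            break
        labels.append(msg[offset:offset + length].decode("ascii", "replace"))
        offset += length
    return ".".join(labels), (resume_at if resume_at is not None else offset)


def decode_answer(answer_b64):
    """Return A/AAAA records from a base64-encoded DNS response."""
    msg = base64.b64decode(answer_b64)
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!6H", msg, 0)
    offset = 12  # fixed header size
    for _ in range(qdcount):  # skip the question section
        _, offset = read_name(msg, offset)
        offset += 4  # QTYPE + QCLASS
    records = []
    for _ in range(ancount):
        name, offset = read_name(msg, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
        offset += 10
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype in (1, 28):  # A or AAAA only; other types are skipped
            records.append(
                {"Name": name, "Address": str(ipaddress.ip_address(rdata))}
            )
    return records


if __name__ == "__main__":
    for record in decode_answer(SAMPLE):
        print(record)
```

Even this reduced version needs name-compression handling and section walking, which is the kind of logic a log shipper or SIEM parser normally cannot express in a simple field mapping.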

Proposed solution

Remove obfuscation of the Answer data in querylog.json

For example, the current query log format:

{"T":"2024-08-17T12:13:00.288878908-07:00","QH":"i.ytimg.com","QT":"A","QC":"IN","CP":"","Upstream":"tls://dns11.quad9.net:853","Answer":"85uBgAABAAkAAAAAAWkFeXRpbWcDY29tAAABAAHADAABAAEAAAEsAASO+yF2wAwAAQABAAABLAAErNkO1sAMAAEAAQAAASwABI762VbADAABAAEAAAEsAASO+tl2wAwAAQABAAABLAAEjvvT9sAMAAEAAQAAASwABI77IVbADAABAAEAAAEsAASO+kXWwAwAAQABAAABLAAEjvvX9sAMAAEAAQAAASwABKzZDvY=","IP":"192.168.10.3","Result":{},"Elapsed":322644,"Cached":true}

New query log format:

{"T":"2024-08-17T12:13:00.288878908-07:00","QH":"i.ytimg.com","QT":"A","QC":"IN","CP":"","Upstream":"tls://dns11.quad9.net:853","Answer":[{"Name":"i.ytimg.com","Address":"142.251.33.118"},{"Name":"i.ytimg.com","Address":"172.217.14.214"},{"Name":"i.ytimg.com","Address":"142.250.217.86"},{"Name":"i.ytimg.com","Address":"142.250.217.118"},{"Name":"i.ytimg.com","Address":"142.251.211.246"},{"Name":"i.ytimg.com","Address":"142.251.33.86"},{"Name":"i.ytimg.com","Address":"142.250.69.214"},{"Name":"i.ytimg.com","Address":"142.251.215.246"},{"Name":"i.ytimg.com","Address":"172.217.14.246"}],"IP":"192.168.10.3","Result":{},"Elapsed":322644,"Cached":true}

This would make using third-party log analysis and collection tools much simpler and more straightforward.

Alternatives considered and additional information

In reference to https://github.com/AdguardTeam/AdGuardHome/issues/6974 : I realize that AdGuard's purpose is not to integrate with third-party log collection tools, and I am not asking you to bend over backwards to be compatible with these systems. I am only asking that the query log, which is already mostly plain-text JSON, be written entirely as plain-text JSON, if for no other reason than consistency within the log file.

Side note: I am not sure what the original intention was behind encoding this one section of the query log. It seems strange for DNS queries to be recorded in plain text while DNS responses are recorded compressed and encoded. It would also appear that this encoding and compressing, however small, incurs a performance penalty?

saint-lascivious commented 3 months ago

At the risk of butting in here, the answer is not obfuscated. It's just the raw data of the reply, straight from the tap.

Edit: It just became apparent to me that the issue you referenced also supplies this same answer.
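As an illustration of the "raw data" point, the first 12 bytes of the decoded value can be read as a standard DNS header. This is a sketch that assumes RFC 1035 wire format; `parse_dns_header` is a hypothetical helper name, and `HEADER_B64` is simply the first 16 base64 characters of the sample Answer value in this issue.

```python
import base64
import struct

# First 16 base64 characters (12 bytes) of the sample Answer value in this issue.
HEADER_B64 = "85uBgAABAAkAAAAA"


def parse_dns_header(raw):
    """Unpack the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack_from(
        "!6H", raw, 0
    )
    return {
        "id": msg_id,
        "is_response": bool(flags & 0x8000),  # QR bit
        "rcode": flags & 0x000F,  # 0 means NOERROR
        "questions": qdcount,
        "answers": ancount,
        "authority": nscount,
        "additional": arcount,
    }


if __name__ == "__main__":
    print(parse_dns_header(base64.b64decode(HEADER_B64)))
```

For the sample, this reports a response with one question and nine answers, which matches the nine addresses listed in the proposed format.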

tunloop commented 3 months ago

> At the risk of butting in here, the answer is not obfuscated. It's just the raw data of the reply, straight from the tap.
>
> Edit: It just became apparent to me that the issue you referenced also supplies this same answer.

That's fair. I meant more in the sense that, for a log file, it's obfuscated. And no, I don't count cat'ing .journal log files in this case : )

romanlex commented 1 month ago

What about this feature request?