Closed: romainw closed this issue 4 months ago
Thank you for reporting this issue.
Could you please provide more details, such as:
Many thanks for the quick reply.
################################################
# Collector configuration
################################################
global:
  #pid-file: "/var/dnscollector/collector.pid"
  trace:
    verbose: true
  server-identity: "pDNSSOC_C"
  text-format: "timestamp-rfc3339ns qr identity operation rcode queryip queryport protocol qname qtype name"
  # default text field delimiter
  text-format-delimiter: " "
  # default text field boundary
  text-format-boundary: "\""

pipelines:
  - name: dnstap-server1
    file-ingestor:
      watch-dir: /var/pdnssoc_input/server1
      watch-mode: dnstap
      delete-after: true
    transforms:
      filtering:
        log-queries: true
        log-replies: true
        drop-queryip-file: /root/dns_servers.txt
    routing-policy:
      forward: [ filelogdomains, filelogips, fileall ]

  - name: filelogdomains
    logfile:
      file-path: /var/dnscollector/matches/matches_domains.json
      mode: json
    transforms:
      filtering:
        keep-fqdn-file: '/var/dnscollector/misp_domains.txt'

  - name: filelogips
    logfile:
      file-path: /var/dnscollector/matches/matches_ips.json
      mode: json
    transforms:
      filtering:
        keep-rdata-file: '/var/dnscollector/misp_ips.txt'

  - name: fileall
    logfile:
      file-path: /var/dnscollector/queries/queries.json
      mode: json
      flush-interval: 1
      max-size: 100
      max-files: 10
      chan-buffer-size: 65535
      #postrotate-command: "/var/dnscollector/postrotate_query.sh"
      postrotate-delete-success: true
(this is the configuration file upgraded for 0.46b)
The issue appeared with 0.45.0. In other words, 0.44.0 is the last working version for us.
Thanks again!
Thanks for sharing the config. A refactoring of the transformers was done in v0.45.0, and I suspect something is wrong there. Could you test without the filtering transform?
Can you share your fstrm file? It would make the issue easier to reproduce on my side.
I found the regression: with the file ingestor, the dnstap packet processing is never started. I will add a test for this use case to avoid this regression in the future.
wow, that was quick! Good work @dmachard!
I cannot easily share the dnstap file, which is coming from our production systems. However, I can/will try any test or beta release to confirm the issue is fixed.
Thanks again!
Fix pushed to the master branch and released in v0.46.0-beta2.
This is just to confirm that the issue has been fixed successfully starting with 0.46.0-beta2. Thank you!
After upgrading from 0.41 to 0.45/0.46b, go-dnscollector now hangs when processing files. The flow works well for tiny fstrm files, but as soon as the file reaches a certain size, it apparently creates a condition between the different threads that causes an endless wait loop.
If we take the first 50 000 lines of the file below (dnstap-2024-06-03-19:56:54.fstrm), the file is successfully processed and then deleted. If we take the first 100 000 lines of the very same file (for a total of 30MB), go-dnscollector will hang whilst processing the file, and never complete this work or subsequently delete it:
Running strace at this point will show this loop (note the "Connection timed out"):
Trying to exit go-dnscollector at this point with CTRL+C also hangs, requiring a full kill -9 of the process:
Just replacing the go-dnscollector 0.45/0.46b binary with version 0.41 "solves" this issue and the fstrm file is processed instantly and then successfully deleted.