PenelopeFudd opened 3 months ago
I've taken a stab at adding authentication, but now I'm getting 400 (bad request) and 413 (payload too large) errors.
```
curl -sS -u username:password 'https://elasticsearch.infra/_cluster/settings?include_defaults=true&filter_path=defaults.http.max_content_length'
{"defaults":{"http":{"max_content_length":"100mb"}}}
```
The `config.yml` file says `bulk-size: 1048576 # 1MB`, so size shouldn't be an issue.
I've added debugging statements to view what's being sent to elasticsearch:
{ "create" : {}}
{"dns.flags.aa":false,"dns.flags.ad":false,"dns.flags.cd":false,"dns.flags.qr":false,"dns.flags.ra":false,"d.....
From this StackOverflow post, it looks like it should start with `{ "index":{} }`, but I don't know what's normal yet.
Any suggestions?
Could you share your elastic and dnscollector config files please?
Here's my dnscollector config.yml file, after I modified the source to support Basic authentication:
```yaml
################################################
# global configuration
# more details: https://github.com/dmachard/go-dnscollector/blob/main/docs/configuration.md#global
################################################
global:
  trace:
    verbose: true
  server-identity: "dns-collector"
  pid-file: ""
  # text-format: "timestamp-rfc3339ns identity operation rcode queryip queryport family protocol length-unit qname qtype latency"
  text-format: "timestamp-rfc3339ns identity operation rcode queryip queryport family protocol length-unit qname qtype edns-csubnet latency answer"
  text-format-delimiter: " "
  text-format-boundary: "\""
  text-jinja: ""
  worker:
    interval-monitor: 10
    buffer-size: 4096
  telemetry:
    enabled: true
    web-path: "/metrics"
    web-listen: ":9165"
    prometheus-prefix: "dnscollector_exporter"
    tls-support: false
    tls-cert-file: ""
    tls-key-file: ""
    client-ca-file: ""
    basic-auth-enable: false
    basic-auth-login: admin
    basic-auth-pwd: omitted

################################################
# Pipelining configuration
# more details: https://github.com/dmachard/go-dnscollector/blob/main/docs/running_mode.md#pipelining
# workers: https://github.com/dmachard/go-dnscollector/blob/main/docs/workers.md
# transformers: https://github.com/dmachard/go-dnscollector/blob/main/docs/transformers.md
################################################
pipelines:
  - name: powerdns
    powerdns:
      listen-ip: 127.0.0.1
      listen-port: 6001
      tls-support: false
      tls-min-version: 1.2
      cert-file: ""
      key-file: ""
      reset-conn: true
      chan-buffer-size: 0
      add-dns-payload: true
    routing-policy:
      forward: [ console ]
      dropped: [ ]

  - name: tap
    dnstap:
      listen-ip: 127.0.0.1
      listen-port: 6000
    transforms:
      normalize:
        qname-lowercase: true
        qname-replace-nonprintable: true
    routing-policy:
      forward: [ elastic ]
      dropped: [ ]

  - name: console
    stdout:
      mode: json

  - name: elastic
    elasticsearch:
      server: "https://elasticsearch.example.org/"
      index: ""
      chan-buffer-size: 0
      bulk-size: 1048576 # 1MB
      flush-interval: 10 # in seconds
      compression: none
      bulk-channel-size: 10
      basic-auth-enable: true
      basic-auth-login: elastic
      basic-auth-pwd: omitted
```
Unfortunately, I don't have access to the Elasticsearch config file; it's run by another team.
I did add debugging statements that printed the body of an elasticsearch request, which looks somewhat like this:
```javascript { "create" : {}} {"dns.flags.aa":false,"dns.flags.ad":false,"dns.flags.cd":false,"dns.flags.qr":false,"dns.flags.ra":false,"dns.flags.rd":true,"dns.flags.tc":false,"dns.id":0,"dns.length":128,"dns.malformed-packet":false,"dns.opcode":0,"dns.qclass":"IN","dns.qname":"v1.pv-txt.pool.dns.example.com","dns.qtype":"TXT","dns.questions-count":1,"dns.rcode":"NOERROR","dns.resource-records.an":"-","dns.resource-records.ar":"-","dns.resource-records.ns":"-","dnstap.extra":"-","dnstap.identity":"dnsdist_server","dnstap.latency":0,"dnstap.operation":"CLIENT_QUERY","dnstap.peer-name":"localhost","dnstap.policy-action":"NXDOMAIN","dnstap.policy-match":"QNAME","dnstap.policy-rule":"-","dnstap.policy-type":"-","dnstap.policy-value":"-","dnstap.query-zone":"-","dnstap.timestamp-rfc3339ns":"2024-07-25T00:57:51.2575189Z","dnstap.version":"dnsdist 1.9.6","edns.dnssec-ok":0,"edns.options.0.code":8,"edns.options.0.data":"159.250.13.0/24","edns.options.0.name":"CSUBNET","edns.options.1.code":12,"edns.options.1.data":"-","edns.options.1.name":"PADDING","edns.rcode":0,"edns.udp-size":4096,"edns.version":0,"network.family":"IPv4","network.ip-defragmented":false,"network.protocol":"DOH","network.query-ip":"10.167.0.248","network.query-port":"42927","network.response-ip":"10.0.22.133","network.response-port":"443","network.tcp-reassembled":false} ```
Wondering if the "create"
should really be "index"
, and whether the index:""
should have a value.
Had a bit of a breakthrough!
Added this module to the code, got a real curl command, and its output was way more informative!
This input record failed:
{"dns.flags.aa":true,"dns.flags.ad":false,"dns.flags.cd":false,"dns.flags.qr":true,"dns.flags.ra":false,"dns.flags.rd":true,"dns.flags.tc":false,"dns.id":0,"dns.length":394,"dns.malformed-packet":false,"dns.opcode":0,"dns.qclass":"IN","dns.qname":"v1.pv-txt.pool.dns.example.com","dns.qtype":"TXT","dns.questions-count":1,"dns.rcode":"NOERROR","dns.resource-records.an.0.class":"IN","dns.resource-records.an.0.name":"v1.pv-txt.pool.dns.example.com","dns.resource-records.an.0.rdata":"{\"version\": \"v1.0\", \"selection\": [{\"popId\": \"xyzzy\"}, ","dns.resource-records.an.0.rdatatype":"TXT","dns.resource-records.an.0.ttl":12,"dns.resource-records.ar":"-","dns.resource-records.ns":"-","dnstap.extra":"cached","dnstap.identity":"dnsdist_server","dnstap.latency":0,"dnstap.operation":"CLIENT_RESPONSE","dnstap.peer-name":"localhost","dnstap.policy-action":"NXDOMAIN","dnstap.policy-match":"QNAME","dnstap.policy-rule":"-","dnstap.policy-type":"-","dnstap.policy-value":"-","dnstap.query-zone":"-","dnstap.timestamp-rfc3339ns":"2024-07-25T20:50:31.485687332Z","dnstap.version":"dnsdist 1.9.6","edns.dnssec-ok":0,"edns.options.0.code":8,"edns.options.0.data":"19.50.13.0/24","edns.options.0.name":"CSUBNET","edns.rcode":0,"edns.udp-size":1232,"edns.version":0,"network.family":"IPv4","network.ip-defragmented":false,"network.protocol":"DOH","network.query-ip":"10.67.0.248","network.query-port":"48421","network.response-ip":"10.20.22.133","network.response-port":"443","network.tcp-reassembled":false}
Ran a test-case minimization program and came up with this:
```
$ curl -sS --fail-with-body \
    -X POST https://elasticsearch.infra/dnscollector/_bulk \
    -H 'Authorization: Basic xxxxxx:yyyyyy' \
    -H 'Content-Type: application/x-ndjson' \
    -d '{ "create" : {}}'$'\n''{"dns.resource-records.an.0.class":"IN"}'$'\n' \
    | jq .
{
  "errors": true,
  "took": 6,
  "items": [
    {
      "create": {
        "_index": "dnscollector",
        "_id": "8qR87JABur2Qow5SzDbK",
        "status": 400,
        "error": {
          "type": "document_parsing_exception",
          "reason": "[1:36] failed to parse field [dns.resource-records.an] of type [text] in document with id '8qR87JABur2Qow5SzDbK'. Preview of field's value: '{0={class=IN}}'",
          "caused_by": {
            "type": "illegal_state_exception",
            "reason": "Can't get text on a START_OBJECT at 1:2"
          }
        }
      }
    }
  ]
}
```
Any idea what's wrong with it? Do we need to tweak the Elasticsearch configuration?
Thanks
It turns out that if there are dots in the field names, Elasticsearch (ES) 8.13.4 interprets them as subobjects.
If you try to give the same name to an object and a string, ES complains.
The ES naming conventions say this (among other things):
If a field name matches the namespace used for nested fields, add `.value` to the field name. For example, instead of:

```
workers
workers.busy
workers.idle
```

Use:

```
workers.value
workers.busy
workers.idle
```
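If renaming the fields isn't an option, one workaround that I believe works on ES 8.3 and later is to create the index with the `subobjects` mapping parameter disabled, so dotted names stay as flat leaf fields instead of being expanded into nested objects (the index name and credentials below are placeholders, and an existing index would have to be recreated or reindexed):

```bash
# Assumption: Elasticsearch >= 8.3, which added the `subobjects` parameter.
# With subobjects disabled, "dns.resource-records.an" (a string) and
# "dns.resource-records.an.0.class" (another string) no longer collide.
curl -sS -u username:password \
  -X PUT 'https://elasticsearch.infra/dnscollector' \
  -H 'Content-Type: application/json' \
  -d '{ "mappings": { "subobjects": false } }'
```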
Another breakthrough:
It appears that the other team is using Nginx in front of ES, and its default body-size limit (1 MB) is in place, so the bulk payload has to stay a little under that to leave room for overhead. By setting `bulk-size: 1000000 # 1MB` in `config.yml`, I stopped getting those 413 Payload Too Large errors.
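In case anyone else hits the same proxy limit, this is roughly how I'd expect to confirm it (hostname, index name, and credentials are placeholders):

```bash
# Build an NDJSON body a little over 1 MB and post it; the 413 response
# (with an nginx Server header) comes back from the proxy, not from ES,
# whose own http.max_content_length is 100mb here.
for i in $(seq 1 25000); do
  printf '{ "create" : {}}\n{"dns.qname":"example.com"}\n'
done > /tmp/big.ndjson
curl -sS -i -u username:password \
  -X POST 'https://elasticsearch.infra/dnscollector/_bulk' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary @/tmp/big.ndjson | head -n 20
```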
Now the only error I'm getting is:

```
ERROR: 2024/07/25 22:23:06.092128 worker - [elastic] elasticsearch - Send buffer is full, bulk dropped
ERROR: 2024/07/25 22:23:06.110239 worker - [elastic] elasticsearch - Send buffer is full, bulk dropped
ERROR: 2024/07/25 22:23:06.128498 worker - [elastic] elasticsearch - Send buffer is full, bulk dropped
```
But that's probably because I'm using my standard load testing script to exercise this on a puny 2-cpu virtual machine. Chaos engineering in practice. 😄
Regarding authentication, could you submit a pull request to add support? It could be useful for others.
**Is your feature request related to a problem? Please describe.**
The elasticsearch logger doesn't let me specify a username+password. Our pipeline is all ready to send its data into Elasticsearch but it can't authenticate. ☹️

**Describe the solution you'd like**
Could you modify:

**Describe alternatives you've considered**
I tried setting `server` to `https://user:password@host.name.here/`, but it didn't work.

**Additional context**
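A manual bulk request with the same credentials is a quick way to sanity-check the account outside the collector (the index name is a placeholder and the password is redacted):

```bash
# Same login the elastic logger is configured with; if this succeeds but the
# collector can't authenticate, the gap is in the logger, not the account.
curl -sS -u elastic:omitted \
  -X POST 'https://elasticsearch.example.org/dnscollector/_bulk' \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary $'{ "create" : {}}\n{"dns.qname":"example.com"}\n'
```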