elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats
Other
12.17k stars 4.91k forks source link

Duplicate packets when using af_packet #621

Closed devfacet closed 4 years ago

devfacet commented 8 years ago

Hi All,

I have configuration like;

interfaces:
  device: any
  type: af_packet
  buffer_size_mb: 124
protocols:
  dns:
    ports: [53]
    include_authorities: true
    include_additionals: true
output:
  file:
    filename: output.log
    path: /var/log/packetbeat
    rotate_every_kb: 100000
    number_of_files: 2
  elasticsearch:
    hosts: ["localhot:9200"]

Once I ran nslookup google.com

;; Got SERVFAIL reply from 127.0.0.1, trying next server
Server:     172.16.0.23
Address:    172.16.0.23#53

Non-authoritative answer:
Name:   google.com
Address: 74.125.22.113
Name:   google.com
Address: 74.125.22.138
Name:   google.com
Address: 74.125.22.139
Name:   google.com
Address: 74.125.22.100
Name:   google.com
Address: 74.125.22.101
Name:   google.com
Address: 74.125.22.102

packetbeat goes infinity loop;

2016/01/05 02:38:13.432665 preprocess.go:94: DBG  Forward preprocessed events
2016/01/05 02:38:13.432673 async.go:44: DBG  async forward to outputers (2)
2016/01/05 02:38:13.432693 output.go:50: DBG  output worker: publish single event
2016/01/05 02:38:13.904256 output.go:103: DBG  output worker: publish 2 events
2016/01/05 02:38:14.433497 client.go:55: DBG  send event
2016/01/05 02:38:14.433536 async.go:71: DBG  send async event
2016/01/05 02:38:14.433755 preprocess.go:37: DBG  Start Preprocessing
2016/01/05 02:38:14.433891 publish.go:100: DBG  Publish: {
  "@timestamp": "2016-01-05T02:38:00.915Z",
  "beat": {
    "hostname": "...",
    "name": "..."
  },
  "bytes_out": 28,
  "client_ip": "127.0.0.1",
  "client_port": 59705,
  "client_proc": "",
  "client_server": "...",
  "count": 1,
  "direction": "out",
  "dns": {
    "additionals_count": 0,
    "answers_count": 0,
    "authorities_count": 0,
    "flags": {
      "authoritative": false,
      "recursion_allowed": false,
      "recursion_desired": true,
      "truncated_response": false
    },
    "id": 31763,
    "op_code": "QUERY",
    "question": {
      "class": "IN",
      "name": "google.com",
      "type": "A"
    },
    "response_code": "SERVFAIL"
  },
  "ip": "127.0.0.1",
  "method": "QUERY",
  "notes": "Response was received without an associated query.",
  "port": 53,
  "proc": "",
  "query": "class IN, type A, google.com",
  "resource": "google.com",
  "server": "...",
  "status": "Error",
  "transport": "udp",
  "type": "dns"
}
2016/01/05 02:38:14.433924 preprocess.go:94: DBG  Forward preprocessed events
2016/01/05 02:38:14.433932 async.go:44: DBG  async forward to outputers (2)
2016/01/05 02:38:14.433952 output.go:50: DBG  output worker: publish single event
2016/01/05 02:38:14.904229 output.go:103: DBG  output worker: publish 1 events
2016/01/05 02:38:15.434827 client.go:55: DBG  send event
2016/01/05 02:38:15.434852 async.go:71: DBG  send async event
2016/01/05 02:38:15.435105 preprocess.go:37: DBG  Start Preprocessing
2016/01/05 02:38:15.435247 publish.go:100: DBG  Publish: {
  "@timestamp": "2016-01-05T02:38:00.915Z",
  ...

Seems like it doesn't cleanup error events...

screen shot 2016-01-04 at 9 35 44 pm

I'm running it on CentOS 6 / Kernel 3.10

andrewkroh commented 8 years ago

@cmfatih For reference, what version of Packetbeat are you running? And I thought CentOS 6 used a 2.6 kernel?

I was able to reproduce the issue from master on Centos 7 (3.10.0-327.3.1.el7.x86_64). The problem does not occur if I use pcap instead of af_packet. The sniffer is getting new data when it calls ReadPacketData() so I don't think this issue is being caused by the DNS plugin. I don't see any obvious mistakes when looking at the sniffer code. It would be good to get another set of eyes on that code.

devfacet commented 8 years ago

@andrewkroh : I use 1.0.1

./packetbeat -version
packetbeat version 1.0.1 (amd64)

Do you have any numbers about performance of pcap ? I used af_packet just because performance concerns. But I may use pcap until the issue is solved IF it's not so slow.

UPDATE:

CentOS 6 with updated Kernel

uname -a
Linux ... 3.10.93-1.el6.elrepo.x86_64
tsg commented 8 years ago

@andrewkroh can you try 1.0.1 as well to check that we're talking about the same issue?

andrewkroh commented 8 years ago

@tsg I tried the packetbeat-1.0.1 RPM on Centos 7 and I see the same issue. It looks like the same UDP packets are continuously being received from af_packet.

andrewkroh commented 8 years ago

My kernel info for testing is 3.10.0-327.3.1.el7.x86_64.

andrewkroh commented 8 years ago

If I configure af_packet to use TPACKET_V2 instead of TPACKET_V3 by adding the configuration option afpacket.OptTPacketVersion(afpacket.TPacketVersion2) here, then the data appears to be received correctly. But shutting down didn't work correctly after this change because it was stuck in the pollForFirstPacket loop; it stopped after a packet was received to break the loop.

Looking more closely at the system calls I see that poll is basically always returning with revents == 1 after the first packet is received. After changing to TPACKET_V2 I see revents == 0 unless a new packet is received and then it's 1.

andrewkroh commented 8 years ago

I just found this issue google/gopacket#80 (but we aren't doing any packet writing).

faec commented 4 years ago

It looks like this problem was last seen in version 1, so I'm closing this. We can reopen it if there are more recent reports.