CESNET / ipfixcol2

High-performance NetFlow v5/v9 and IPFIX collector (RFC7011)
Other
122 stars 36 forks source link

Failed to start by command 'ipfixcol2 -c /root/ipfixudp.xml' #56

Open aobataliang opened 2 years ago

aobataliang commented 2 years ago

Hi Expert, it always show :The instance holds information about 0 active session(s).

root@logger-vm-3:~# ipfixcol2 -c /root/ipfixudp.xml -vvv INFO: Configurator: Information Elements have been successfully loaded from '/etc/libfds/'. INFO: Configurator (plugin manager): 10 plugins found DEBUG: Configurator (plugin manager): Plugin 'json' has been successfully loaded from '/usr/local/lib/ipfixcol2/libjson-output.so'. DEBUG: Configurator (plugin manager): Input plugin 'udp' does not support requests to close a Transport Session. DEBUG: Configurator (plugin manager): Plugin 'udp' has been successfully loaded from '/usr/local/lib/ipfixcol2/libudp-input.so'. DEBUG: Configurator: All plugins have been successfully loaded. DEBUG: JSON output: Calling instance constructor of the plugin 'json' DEBUG: Output manager: Calling instance constructor of the plugin 'Output manager' DEBUG: UDP collector (parser): Calling instance constructor of the plugin 'IPFIX Parser' DEBUG: UDP collector: Calling instance constructor of the plugin 'udp' DEBUG: JSON output: (File output) Thread started... INFO: UDP collector: The socket receive buffer size of a new socket (local IP 11.100.5.222) enlarged (from 106496 to 16777216 bytes). INFO: UDP collector: Bind succeed on 11.100.5.222 (port 4739) DEBUG: Configurator: All instances have been successfully initialized. DEBUG: JSON output: Instance thread of the output plugin 'json' has started! DEBUG: Configurator: All threads of instances has been successfully started. DEBUG: UDP collector (parser): Instance thread of the intermediate plugin 'IPFIX Parser' has started! DEBUG: Output manager: Instance thread of the intermediate plugin 'Output manager' has started! DEBUG: UDP collector: Instance thread of the input plugin 'udp' has started! DEBUG: UDP collector: The instance holds information about 0 active session(s). DEBUG: UDP collector: The instance holds information about 0 active session(s). DEBUG: UDP collector: The instance holds information about 0 active session(s). DEBUG: UDP collector: The instance holds information about 0 active session(s). DEBUG: UDP collector: The instance holds information about 0 active session(s). DEBUG: UDP collector: The instance holds information about 0 active session(s). DEBUG: UDP collector: The instance holds information about 0 active session(s). DEBUG: UDP collector: The instance holds information about 0 active session(s).

Lukas955 commented 2 years ago

Hi,

this debug message just reports that there is no flow exporter connected over UDP. Do you have an issue with flows record reception?

Lukas

aobataliang commented 2 years ago

Hi,

this debug message just reports that there is no flow exporter connected over UDP. Do you have an issue with flows record reception?

Lukas

yes, xml configure as follow: we could get the udp ipfix traffic by capture pcap. but according to the debug message, it failed establish the connection. not sure whether related with vlan? any suggestion?

UDP collector udp 4739 11.100.5.222 vlan444@ens9: mtu 2140 qdisc noqueue state UP group default qlen 1000 link/ether xxxxxx brd ff:ff:ff:ff:ff:ff inet 11.100.5.222/24 scope global vlan444 valid_lft forever preferred_lft forever inet6 fxxx scope link valid_lft forever preferred_lft forever
Lukas955 commented 2 years ago

Based on my experience, I guess that there is a firewall on your system that blocks UDP port 4739. Make sure that the port is open.

Lukas

aobataliang commented 2 years ago

Based on my experience, I guess that there is a firewall on your system that blocks UDP port 4739. Make sure that the port is open.

Lukas

Hi Lukas, thanks for your replay. we could get the udp ipfix traffic by capture pcap, it means the ipfix initiator send the log to the collector.

root@logger-vm-3:/etc/systemd/system# netstat -na | grep 4739 udp 0 0 11.100.5.222:4739 0.0.0.0:*

On my experience, the connection will established once the log received. after check my iptables, there is no rules in it.

root@logger-vm-3:/etc/systemd/system# iptables -nL Chain INPUT (policy ACCEPT) target prot opt source destination

Chain FORWARD (policy ACCEPT) target prot opt source destination

Chain OUTPUT (policy ACCEPT) target prot opt source destination

Lukas955 commented 2 years ago

Hi,

are you using RHEL/Rocky/CentOS 8? If so, it is possible that iptables have been replaced with firewalld. Try to use firewall-cmd --list-all to list the current firewall configuration. Then you can use firewall-cmd --permanent --zone=public --add-port=4739/udp to allow all incoming IPFIX traffic.

Lukas

Lukas955 commented 2 years ago

Hello,

did you manage to receive any traffic?

Lukas

cosmostaylor commented 3 months ago

Hi, we ran into the same issue. We have the firewalld process disabled. We also have ipfixcol running on the same server. ipfixcol listening l4 port 4444 is working properly reading the data received from the interface. We set up ipfixcol2 to listen port 4445, we see the packets in the interface but see the error "The instance holds information about 0 active session(s)." continuously. When we changed port to 4444 for ipfixcol2, the same continues. Any ideas or any suggestions?

sedmicha commented 3 months ago

Hi, we ran into the same issue. We have the firewalld process disabled. We also have ipfixcol running on the same server. ipfixcol listening l4 port 4444 is working properly reading the data received from the interface. We set up ipfixcol2 to listen port 4445, we see the packets in the interface but see the error "The instance holds information about 0 active session(s)." continuously. When we changed port to 4444 for ipfixcol2, the same continues. Any ideas or any suggestions?

Have you tried confirming that the port is indeed accessible with something like netcat? For instance by running nc -u -l 4445 on the machine instead of ipfixcol2 and seeing if you can receive any data.

cosmostaylor commented 3 months ago

Yes, we did confirm by bringing up ipfixcol with port 4445. We saw that it is receiving and writing data. So, I can confirm that the port is accessible.

sedmicha commented 3 months ago

Yes, we did confirm by bringing up ipfixcol with port 4445. We saw that it is receiving and writing data. So, I can confirm that the port is accessible.

Sorry, I'm a bit confused. I thought your issue was that ipfixcol2 isn't receiving data, and now you're saying it is?

sedmicha commented 3 months ago

Is your issue just seeing the message "The instance holds information about 0 active session(s)." itself? That does not necessarily indicate any error, it's just an information that there are currently no active sessions for that particular input plugin, meaning no flow exporter is currently connected to it over that particular connection method.

cosmostaylor commented 3 months ago

Let me rephrase our problem.

We have both ipfixcol and ipfixcol2 running on the same machine. ipfixcol has startup.xml file listening to port 4444 and ipfixcol2 configured to listen 4445. The udp ipfix packets are sent to udp port 4444 and udp port 4445.

ipfixcol (https://github.com/CESNET/ipfixcol) is reading the data (from udp port 4444) and decoding ipfix packets and generating output. On the other hand, ipfixcol2 (https://github.com/CESNET/ipfixcol2) is showing "The instance holds information about 0 active session(s)." message (for udp port 4445).

What we tried is, changing the configuration of ipfixcol to listen udp port 4445. Everything is working fine. ipfixcol reads the data from udp port 4445 and generates output.

We also tried changing the configuration of ipfixcol2 to listen udp port 4444. We were seeing "The instance holds information about 0 active session(s)." message again.

sedmicha commented 3 months ago

Let me rephrase our problem.

We have both ipfixcol and ipfixcol2 running on the same machine. ipfixcol has startup.xml file listening to port 4444 and ipfixcol2 configured to listen 4445. The udp ipfix packets are sent to udp port 4444 and udp port 4445.

ipfixcol (https://github.com/CESNET/ipfixcol) is reading the data (from udp port 4444) and decoding ipfix packets and generating output. On the other hand, ipfixcol2 (https://github.com/CESNET/ipfixcol2) is showing "The instance holds information about 0 active session(s)." message (for udp port 4445).

What we tried is, changing the configuration of ipfixcol to listen udp port 4445. Everything is working fine. ipfixcol reads the data from udp port 4445 and generates output.

We also tried changing the configuration of ipfixcol2 to listen udp port 4444. We were seeing "The instance holds information about 0 active session(s)." message again.

Ahh, thank you, now I understand. Could you provide the configuration files you're using for both ipfixcol and ipfixcol2? Also, what system are you using?

cosmostaylor commented 3 months ago

Thank you. Could you please specify which files you request? Only the startup.xml files?

sedmicha commented 3 months ago

Thank you. Could you please specify which files you request? Only the startup.xml files?

Yes, the default configuration file path is /etc/ipfixcol2/startup.xml and I think /etc/ipfixcol/startup.xml in case of ipfixcol. Unless you're specifying a custom config path with the -c option, in that case it might be named differently.

cosmostaylor commented 3 months ago

I attached the files. We are running on Red Hat Enterprise Linux Server release 7.9 (Maipo). startup_ipfixcol.txt startup_ipfixcol2.txt

Note: We did try localIPAddress empty and with the NIC IP as well.

Thanks.

cosmostaylor commented 3 months ago

Hi again,

Any ideas?

Thanks in advance.

sedmicha commented 3 months ago

Hi again,

Any ideas?

Thanks in advance.

Hello,

sorry for the delayed response. I got a bit caught up with work and forgot to reply, so thanks for reminding yourself.

I don't see any apparent issue in the provided config files. Can you also provide all the logs that you get from ipfixcol2?

Another thing I would try is using the tool ipfixsend2, that is included with ipfixcol2, to try and send data to the collector. You can try doing so both locally and from another machine to see if there's any difference. This repo also includes example ipfix data that you can use.

For example: ipfixsend2 -i ipfixcol2/doc/data/ipfix/example_flows.ipfix -d 192.168.1.100 -p 4445 -t udp

cosmostaylor commented 3 months ago

Hi Michal,

Thank you.

What we did is we tried ipfixsend2 with tcp (collector running tcp:4445) and udp(collector running udp:4445). The results are:

tcp input -> success (The instance holds information about 1 active session(s)) ipfix input -> success udp input -> failed (The instance holds information about 0 active session(s))

ipfixsend2 -d 0.0.0.0 -p 4445 -t UDP -i example_flows.ipfix -n 1 -R 1.0 ipfixsend2 -d 0.0.0.0 -p 4445 -t TCP -i example_flows.ipfix -n 1 -R 1.0

We checked the udp libraries but we couldn't find any issue.

sedmicha commented 3 months ago

Hi Michal,

Thank you.

What we did is we tried ipfixsend2 with tcp (collector running tcp:4445) and udp(collector running udp:4445). The results are:

tcp input -> success (The instance holds information about 1 active session(s)) ipfix input -> success udp input -> failed (The instance holds information about 0 active session(s))

ipfixsend2 -d 0.0.0.0 -p 4445 -t UDP -i example_flows.ipfix -n 1 -R 1.0 ipfixsend2 -d 0.0.0.0 -p 4445 -t TCP -i example_flows.ipfix -n 1 -R 1.0

We checked the udp libraries but we couldn't find any issue.

That means that you couldn't receive any data over UDP even when running ipfixsend2 locally? Huh, that is very weird. Does your system have multiple network interfaces? Could you try again with -d 127.0.0.1 and if that doesn't work either then perhaps a with a minimal config such as:

<ipfixcol2>
  <inputPlugins>
    <input>
      <name>UDP collector</name>
      <plugin>udp</plugin>
      <params>
        <localPort>4739</localPort>
        <localIPAddress></localIPAddress>
      </params>
    </input>
  </inputPlugins>
  <outputPlugins>
    <output>
        <name>Dummy output</name>
        <plugin>dummy</plugin>
        <params>
           <delay>0</delay>
           <stats>true</stats>
        </params>
    </output>
  </outputPlugins>
</ipfixcol2>
cosmostaylor commented 3 months ago

We tried the suggestion but unfortunately the result is the same.

Yes, we also think that the problem is with the UDP. That is why we checked the libraries etc. We were stuck and couldn't find any clues :(

cosmostaylor commented 3 months ago

Maybe a dumb question but, is it possible that the udp library coming with ipfixcol2 is problematic so that it binds to the socket but can not listen/poll packets from the NIC?

sedmicha commented 3 months ago

Maybe a dumb question but, is it possible that the udp library coming with ipfixcol2 is problematic so that it binds to the socket but can not listen/poll packets from the NIC?

We have had ipfixcol2 deployed in production for years and never ran into this issue, so I'd say it's unlikely that the UDP input library is the core of the problem. I even spun up a RHEL 7.9 VM just now to try myself, but wasn't successful in replicating the issue.

My best idea at this point is that you have some security hardening measures enabled on your system that interferes with some of the syscalls that the UDP plugin uses. The main difference I could spot between the way UDP is implemented in ipfixcol and ipfixcol2 is that ipfixcol only uses the basic socket API, meanwhile ipfixcol2 relies on epoll.

cosmostaylor commented 3 months ago

Thanks for the comments Michal we will keep digging. In order for us to understand deeper and troubleshoot, under what conditions we receive "The instance holds information about 0 active session(s)."?

I understand that if the ipfixcol2 does not receive any data from the NIC, we receive this log. Might there be any other reason? Can this be due to timers etc?

sedmicha commented 3 months ago

Thanks for the comments Michal we will keep digging. In order for us to understand deeper and troubleshoot, under what conditions we receive "The instance holds information about 0 active session(s)."?

I understand that if the ipfixcol2 does not receive any data from the NIC, we receive this log. Might there be any other reason? Can this be due to timers etc?

If all you're seeing is the message "The instance holds information about 0 active session(s)" with no errors being logged, I'd say that the only possibility is that the call that's waiting for data to be available for reading on any of the sockets never gets anything: https://github.com/CESNET/ipfixcol2/blob/dd0b57b59ae1530fd2bec7c44cddd1d6985532df/src/plugins/input/udp/udp.c#L1108

If it received any data but there was any error while processing it, or if there was any error while setting up the socket, the error would get logged. So either no data are getting to the socket, or they are but their presence isn't being picked up by epoll for some reason.

I would check /var/log/messages and /var/log/audit/audit.log for any SELinux denials and try setting SELinux to permissive with setenforce 0 and see if that changes anything.

It would also be helpful if you can provide strace log of your run with minimal config:

cosmostaylor commented 3 months ago

Hi Michal,

For some reason (which we don't know but you probably know), net.core.rmem_max parameter was causing the issue. Originally we had net.core.rmem_max to be 1342177280. After reducing this value to 516777216 ipfixcol2 started reading the socket. We had this value set to a higher value due to the packet drops we had. Now, we have to reduce it to have ipfixcol2 working... Can you please enlighten us a bit why ipfixcol2 is working with a smaller rmem? How shall we calculate this value?

sedmicha commented 3 months ago

Huh... That is certainly unexpected. I haven't looked into it too much so far, but apparently setsockopt/getsockopt doubles the value

SO_RCVBUF Sets or gets the maximum socket receive buffer in bytes. The kernel doubles this value (to allow space for bookkeeping overhead) when it is set using setsockopt(2), and this doubled value is returned by getsockopt(2). The default value is set by the /proc/sys/net/core/rmem_default file, and the maximum allowed value is set by the /proc/sys/net/core/rmem_max file. The minimum (doubled) value for this option is 256.

and 1342177280 * 2 exceeds INT_MAX, and the value is stored as an int, so my initial guess is that this is the problem, which is probably a bug on our side with something silently overflowing.