Closed stv0g closed 1 year ago
In GitLab by @ghost on Aug 30, 2017, 15:32
Before you run VILLASnode, run the following:
ulimit -c unlimited
There can be two reasons for this:
I already checked the firewall on acs-villas for this, and it looks okay. Sporadically failing name resolution is hard to debug.
@mstevic Does this problem only occur during startup of VILLASnode or after running it for a while?
Steffen
In GitLab by @ghost on Aug 30, 2017, 21:11
It happens after running it for a while, e.g. in this case the last timestamp was printed at 1033.417 seconds.
But this is not deterministic across runs.
Okay, then I can rule out name resolution as the cause, since it is only done during startup.
Okay, I think we found the problem: it's because we don't properly implement Transmit Pacing (#122).
This means that Linux will discard packets if we start to send data at unreasonable rates. I assume that the network gets congested, Linux has trouble delivering packets and reports this to us by returning this error.
The easiest way to solve this problem would be to ignore those errors. Some packets will be lost then. In the long term, we should implement some mechanism to adjust the sending rate depending on the conditions.
More details are here: https://groups.google.com/forum/#!topic/comp.protocols.tcp-ip/Qou9Sfgr77E
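To illustrate the two ideas above (ignoring send errors in the short term, adjusting the sending rate in the long term), here is a minimal Python sketch. It is not the VILLASnode implementation: `TolerantSender`, its parameters, and the choice of which errno values count as transient are all assumptions made for illustration. EINVAL and EPERM are the errors reported in this thread; ENOBUFS and EAGAIN are added as typical "queue full" signals.

```python
import errno

# Assumption: errno values treated as transient (packet is dropped, not fatal).
# EINVAL (22) and EPERM appear in this thread; ENOBUFS/EAGAIN are added here.
TRANSIENT_ERRNOS = {errno.EINVAL, errno.EPERM, errno.ENOBUFS, errno.EAGAIN}


class TolerantSender:
    """Hypothetical helper: drops packets on transient send errors and
    adapts the target rate (additive increase, multiplicative decrease)."""

    def __init__(self, sock, rate=1000.0, min_rate=10.0, max_rate=10000.0):
        self.sock = sock          # any object with a sendto(payload, addr) method
        self.rate = rate          # target packets per second
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.dropped = 0          # number of packets lost to transient errors

    def send(self, payload, addr):
        """Return True if the packet was handed to the kernel, False if dropped."""
        try:
            self.sock.sendto(payload, addr)
        except OSError as exc:
            if exc.errno not in TRANSIENT_ERRNOS:
                raise  # unknown failure: still surface it to the caller
            self.dropped += 1
            self.rate = max(self.min_rate, self.rate / 2)  # back off
            return False
        self.rate = min(self.max_rate, self.rate + 1.0)    # probe upwards
        return True

    def interval(self):
        """Seconds the caller should wait before the next send."""
        return 1.0 / self.rate
```

A caller would wrap a UDP socket, e.g. `TolerantSender(socket.socket(socket.AF_INET, socket.SOCK_DGRAM))`, call `send()` for each sample, and sleep for `interval()` between sends. Halving the rate on every error is a deliberately crude policy; a real pacing mechanism would likely need smoothing and tuning.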
mentioned in commit ce29e87769421e29258c17d40e0644d47e18795a
In GitLab by @ghost on Sep 23, 2017, 17:23
Hi @stvogel
Today during testing with POLITO and INL, acs-villas crashed multiple times with the error:
2899.493 Error t Failed send to node inl-villas(socket): Invalid argument (22)
Aborted (core dumped)
In GitLab by @ghost on Sep 24, 2017, 18:07
Hi @stvogel
Which version do we need to avoid this error?
I had the error again, but I am not sure whether I have updated to a version with this fix.
mentioned in commit b03748ac2c264f30c455878f28bf9a71b4cff893
The problem should have been fixed with version 0.4.3
On which machine do you get the error?
mentioned in commit 3c27971cc7c382046f67dc10625d0a8c8a8255a2
Interestingly, we see the same error with another networking tool called Nmap on acs-villas:
Starting Nmap 7.60 ( https://nmap.org ) at 2017-10-23 10:56 CEST
sendto in send_ip_packet_sd: sendto(9, packet, 44, 0, 134.130.169.102, 16) => Operation not permitted
Offending packet: TCP 134.130.169.32:59076 > 134.130.169.102:554 S ttl=42 id=45019 iplen=44 seq=2914653643 win=1024 <mss 1460>
sendto in send_ip_packet_sd: sendto(9, packet, 44, 0, 134.130.169.117, 16) => Operation not permitted
This makes me believe that VILLASnode is not the root cause of this problem.
After disabling the firewall (systemctl stop firewalld), the errors were gone.
So it is likely a firewall issue :(
I will close this issue, as we have not been able to reproduce it in more recent versions.
In GitLab by @ghost on Aug 28, 2017, 23:31
VILLASnode crashes while running with the following error:
It happens very rarely on other hosts, but it happened several times on acs-villas within 1 hour of testing. VILLASnode version on acs-villas: v0.3.4-02e03a8-release (built on Jul 28 2017 16:39:50), but other hosts had v0.3.3.
It is difficult to reproduce this issue. I will try to update the issue with as much information as I can get.