VILLASframework / node

Connecting real-time power grid simulation equipment
https://fein-aachen.org/projects/villas-node/
Apache License 2.0

VILLASnode crashes during running #115

Closed stv0g closed 1 year ago

stv0g commented 7 years ago

In GitLab by @ghost on Aug 28, 2017, 23:31

VILLASnode crashes during running with the following error:

1033.417 Error t Failed send to node inl-villas(socket): Operation not permitted (1)
Aborted (core dumped)

It happens very rarely on other hosts, but it happened several times on acs-villas within 1 hour of testing. The VILLASnode version on acs-villas is v0.3.4-02e03a8-release (built on Jul 28 2017 16:39:50), while the other hosts ran v0.3.3.

This issue is difficult to reproduce. I will update the issue with as much information as I can get.

stv0g commented 7 years ago

In GitLab by @ghost on Aug 30, 2017, 15:32

Before you run VILLASnode, run the following:

ulimit -c unlimited

stv0g commented 7 years ago

There can be two reasons for this:

I already checked the firewall on acs-villas, and it looks okay. Sporadic name resolution failures are hard to debug.

@mstevic Does this problem only occur during startup of VILLASnode or after running it for a while?

Steffen

stv0g commented 7 years ago

In GitLab by @ghost on Aug 30, 2017, 21:11

It happens after running it for a while. In this case, for example, the last timestamp printed was 1033.417 seconds.
However, the time of the crash is not deterministic across runs.

stv0g commented 7 years ago

Okay, then I can rule out name resolution as the cause, since it is only done during startup.

stv0g commented 7 years ago

Okay, I think we found the problem: it's because we don't properly implement Transmit Pacing (#122).

This means that Linux will discard packets if we start to send data at unreasonable rates. I assume that the network gets congested, Linux has trouble delivering the packets, and it reports this to us by returning this error.

The easiest way to solve this problem would be to ignore those errors. Some packets will then be lost. In the long term, we should implement some mechanism to adjust the sending rate depending on the conditions.
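To make the idea of pacing concrete, here is a minimal token-bucket sketch in Python. It is not VILLASnode's implementation and does not reflect what #122 actually proposes; the class name, the rate and burst parameters, and the injectable clock are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical packet pacer: allows at most `rate` packets per second
    on average, with short bursts of up to `burst` packets."""

    def __init__(self, rate, burst, now=None):
        self.rate = float(rate)      # tokens (packets) refilled per second
        self.burst = float(burst)    # maximum number of stored tokens
        self.tokens = float(burst)   # start with a full bucket
        self.last = time.monotonic() if now is None else now

    def try_consume(self, now=None):
        """Return True if one packet may be sent now, False otherwise.

        `now` can be passed explicitly (in seconds) to make the logic testable.
        """
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill proportionally to the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A sender would call try_consume() before each send and either wait or drop the sample when it returns False. Whether waiting or dropping is preferable depends on the real-time requirements of the simulation.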
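As a rough illustration of the "ignore those errors" approach, the following Python sketch counts a failed send as a dropped packet instead of aborting. It is not the code from the referenced commits; the function name, the SendStats helper, and the choice of which errno values count as transient are assumptions.

```python
import errno

# Assumption: these error codes are treated as transient and non-fatal.
# EPERM (1) and EINVAL (22) are the two codes reported in this issue.
TRANSIENT_ERRNOS = {errno.EPERM, errno.EINVAL, errno.ENOBUFS, errno.EAGAIN}


class SendStats:
    """Hypothetical counters for sent and dropped packets."""

    def __init__(self):
        self.sent = 0
        self.dropped = 0


def send_tolerant(sock, payload, addr, stats):
    """Send one datagram; count transient failures instead of aborting.

    Returns True if the packet was handed to the kernel, False if it was
    dropped because of a transient error. Other errors are re-raised.
    """
    try:
        sock.sendto(payload, addr)
    except OSError as exc:
        if exc.errno in TRANSIENT_ERRNOS:
            stats.dropped += 1
            return False
        raise
    stats.sent += 1
    return True
```

Logging the dropped counter periodically would keep the packet loss visible even though the process no longer aborts.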

More details are here: https://groups.google.com/forum/#!topic/comp.protocols.tcp-ip/Qou9Sfgr77E

stv0g commented 7 years ago

mentioned in commit ce29e87769421e29258c17d40e0644d47e18795a

stv0g commented 7 years ago

In GitLab by @ghost on Sep 23, 2017, 17:23

Hi @stvogel
Today, during testing with POLITO and INL, acs-villas crashed multiple times with the error:

2899.493 Error t Failed send to node inl-villas(socket): Invalid argument (22)
Aborted (core dumped)

stv0g commented 7 years ago

In GitLab by @ghost on Sep 24, 2017, 18:07

Hi @stvogel
Which version do we need to avoid this error?
I had the error again, but I am not sure whether I have updated to a version that includes this fix.

stv0g commented 7 years ago

mentioned in commit b03748ac2c264f30c455878f28bf9a71b4cff893

stv0g commented 7 years ago

The problem should have been fixed in version 0.4.3.

On which machine do you get the error?

stv0g commented 7 years ago

mentioned in commit 3c27971cc7c382046f67dc10625d0a8c8a8255a2

stv0g commented 7 years ago

Interestingly, we see the same error with another networking tool, Nmap, on acs-villas:

Starting Nmap 7.60 ( https://nmap.org ) at 2017-10-23 10:56 CEST
sendto in send_ip_packet_sd: sendto(9, packet, 44, 0, 134.130.169.102, 16) => Operation not permitted
Offending packet: TCP 134.130.169.32:59076 > 134.130.169.102:554 S ttl=42 id=45019 iplen=44  seq=2914653643 win=1024 <mss 1460>
sendto in send_ip_packet_sd: sendto(9, packet, 44, 0, 134.130.169.117, 16) => Operation not permitted

This makes me believe that VILLASnode is not the root cause of this problem.

After disabling the firewall with systemctl stop firewalld, the errors were gone. So it is likely a firewall issue :(

stv0g commented 5 years ago

I will close this issue, as we have not been able to reproduce it in more recent versions.

stv0g commented 5 years ago

closed