genodelabs / genode

Genode OS Framework
https://genode.org/

nic_bridge/nic_router: allocated MAC addresses not freed from Mac_allocator #2470

Closed BorisMulder-CSL closed 5 years ago

BorisMulder-CSL commented 7 years ago

Whenever a component that uses a NIC session from a nic_bridge is started and stopped, a new MAC address is assigned each time the component creates its session, even when the component has a policy in nic_bridge for a static IP. The allocated MAC addresses are never released, so after 2^8 allocations the nic_bridge fails and does not serve any further sessions.

This can be verified by running this script that starts and stops fetchurl for a while: init_fetchurl.run.txt
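
The exhaustion described above can be illustrated with a minimal sketch. It assumes a toy allocator with a pool of 2^8 entries; `Toy_mac_allocator` and `sessions_served` are hypothetical names for illustration and are not Genode's actual `Mac_allocator` or session code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical toy allocator (not Genode's Mac_allocator): it hands out
// the last byte of a MAC address from a pool of 2^8 entries.
class Toy_mac_allocator
{
	private:

		std::array<bool, 256> _used { };

	public:

		// Return the first free index, or nothing once the pool is exhausted
		std::optional<std::uint8_t> alloc()
		{
			for (unsigned i = 0; i < _used.size(); i++) {
				if (!_used[i]) {
					_used[i] = true;
					return static_cast<std::uint8_t>(i);
				}
			}
			return std::nullopt;
		}

		// Give an index back to the pool
		void free(std::uint8_t index)
		{
			assert(_used[index]);
			_used[index] = false;
		}
};

// Open and close one session at a time, up to 'attempts' times, and
// return how many sessions obtained a MAC address.
unsigned sessions_served(unsigned attempts, bool free_on_close)
{
	Toy_mac_allocator allocator;
	unsigned served = 0;

	for (unsigned i = 0; i < attempts; i++) {
		std::optional<std::uint8_t> mac = allocator.alloc();
		if (!mac)
			break; // pool exhausted, no further session possible

		served++;

		// Without this step, every closed session leaks its address
		if (free_on_close)
			allocator.free(*mac);
	}
	return served;
}
```

In this sketch, `sessions_served(1000, false)` stops after 256 sessions, whereas `sessions_served(1000, true)` serves all 1000 because each address returns to the pool when its session closes.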

chelmuth commented 5 years ago

I adapted the issue title as nic_router also does not call Mac_allocator::free().

m-stein commented 5 years ago

@chelmuth Thanks for pointing out this significant issue in the NIC router!

m-stein commented 5 years ago

I'll try to fix both components within the next few days.

m-stein commented 5 years ago

Fixed: ceab90e3d6 nic_router/nic_bridge: free MAC addresses

The commit also adds a new test component named "nic_stress". In contrast to "net_flood" (which I'd like to rename to "net_stress" in the future), the "nic_stress" component targets low-level NIC interactions without considering network protocols, whereas the "net_stress" component targets corner cases of common network protocols. Currently, the "nic_stress" component only tests the creation and destruction of large numbers of NIC sessions.

There are two new tests "nic_bridge_stress" and "nic_router_stress" that are added to the autopilot list.
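
The idea of stressing session creation and destruction can be sketched as follows. This is only an illustration under stated assumptions: `Mac_pool`, `Session`, and `stress` are hypothetical names, the session is modelled as an RAII object that frees its MAC index in its destructor, and none of this is the actual code of the "nic_stress" component.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical pool of MAC indices with a fixed size
class Mac_pool
{
	private:

		std::set<unsigned> _used { };
		unsigned const     _size;

	public:

		explicit Mac_pool(unsigned size) : _size(size) { }

		// Return the lowest free index, throw if none is left
		unsigned alloc()
		{
			for (unsigned i = 0; i < _size; i++)
				if (_used.insert(i).second)
					return i;

			throw std::runtime_error("MAC pool exhausted");
		}

		void free(unsigned index) { _used.erase(index); }

		unsigned in_use() const { return static_cast<unsigned>(_used.size()); }
};

// Hypothetical session: allocates its MAC index on construction and
// returns it to the pool on destruction.
class Session
{
	private:

		Mac_pool &_pool;
		unsigned  _mac;

	public:

		explicit Session(Mac_pool &pool) : _pool(pool), _mac(pool.alloc()) { }

		~Session() { _pool.free(_mac); }

		Session(Session const &) = delete;
		Session &operator = (Session const &) = delete;

		unsigned mac() const { return _mac; }
};

// Create 'sessions_per_round' sessions, destroy them all, and repeat.
// Returns false if the pool ran out of MAC indices during a round.
bool stress(Mac_pool &pool, unsigned rounds, unsigned sessions_per_round)
{
	try {
		for (unsigned round = 0; round < rounds; round++) {
			std::vector<std::unique_ptr<Session>> sessions;
			for (unsigned nic = 0; nic < sessions_per_round; nic++)
				sessions.push_back(std::make_unique<Session>(pool));

			// all sessions of this round are destroyed at the end of the scope
		}
	} catch (std::runtime_error const &) {
		return false;
	}
	return true;
}
```

With a pool of 256 indices, any number of rounds succeeds as long as a single round stays within the pool size, and `in_use()` is back at zero afterwards. A leak in the destructor path would instead make `stress` fail once the accumulated sessions exceed the pool.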

m-stein commented 5 years ago

Here's a fix-up that fixes the test author and moves the run scripts to os: 8a63ec49a6 Fixup "nic_router/nic_bridge: free MAC addresses" (author, test repo)

m-stein commented 5 years ago

When the freeing of MAC addresses is removed from the NIC router, the new test also reveals a fault in the NIC router as soon as the limit of MAC addresses is reached. But this should be handled in a dedicated issue.

m-stein commented 5 years ago

Had to rebase:

60935f9a5f Fixup "nic_router/nic_bridge: free MAC addresses" (author, test repo)
e3a6eec00a nic_router/nic_bridge: free MAC addresses

chelmuth commented 5 years ago

With the two new *_stress autopilot tests, we also got 15 new test timeouts last night. @m-stein what is your suggestion to reduce the noise? Please keep in mind that the total number of failing nic tests was 33 last night.

m-stein commented 5 years ago

I've spent some hours debugging the nic_router_stress test on several platforms. One problem is the destruction of an undissolved signal context during the constructor of the Packet_stream_source while creating a new NIC session. But I haven't found a fix for this so far.

m-stein commented 5 years ago

@nfeske These two should fix the nic_*_stress issues:

28770f1eac Fixup "nic_router/nic_bridge: free MAC addresses" (nic_router_stress: fix sel4/foc/fiasco)
5e375ec803 Fixup "nic_router/nic_bridge: free MAC addresses" (nic_stress: handle exception)

m-stein commented 5 years ago

@nfeske I forgot this one: 0fdb30d181 Fixup "nic_router/nic_bridge: free MAC addresses" (nic_bridge_stress: fix sel4/foc/fiasco)

nfeske commented 5 years ago

Thanks a lot @m-stein! I merged the 3 fixups to staging.

m-stein commented 5 years ago

Only on fiasco+x86_32+hardware nic_bridge_stress is still failing:

[2019-03-27 05:22:28] [init -> nic_stress_1] round 22/22 nic 10/11 mac 02:02:02:02:42:08
[2019-03-27 05:22:28] [init -> nic_stress_1] round 22/22 nic 11/11 mac 02:02:02:02:42:09
[2019-03-27 05:22:28] [init -> nic_stress_2] round 16/16 nic 1/16 mac 02:02:02:02:42:00
[2019-03-27 05:22:28] [init -> nic_stress_1] --- finished NIC stress test ---
[2019-03-27 05:22:28] [init] child "nic_stress_1" exited with exit value 0
[2019-03-27 05:22:28] [init] Error: ipc_reply_and_wait error 0x10
[2019-03-27 05:22:28] [init -> nic_stress_2] round 16/16 nic 2/16 mac 02:02:02:02:42:01
[2019-03-27 05:27:21] Error: Test execution timed out

I guess it's something with parent.exit(). If so, I plan to simply circumvent the call of parent.exit() on fiasco.

m-stein commented 5 years ago

@nfeske This fix should solve the above-mentioned problem: c3f0f522de Fixup "nic_router/nic_bridge: free MAC addresses" (nic_bridge_stress: fix fiasco ipc error)

nfeske commented 5 years ago

Thanks for the fixup, which I merged to staging just now. Quirks like this are unfortunate but now this special case is documented in the run script, which is nice. The code de-duplication is all the better.

m-stein commented 5 years ago

:-)