jsonn opened this issue 1 year ago.
How many macvlan containers are we talking about? Do you know how long your DHCP lease time is?
16 containers at the moment, and the lease time is 10 minutes.
OK, I think that explains why it leaks so fast. I think we spawn a new thread for each lease, but somehow the code does not clean up the old one, so we leak the old thread. I'll take a look.
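To illustrate the suspected failure mode: if every new lease calls thread::spawn and nothing ever stops the previous renewal thread, each lease cycle adds one thread that never exits. Below is a minimal sketch of one way to keep the count bounded, using plain std threads. It is not netavark's code, which may be structured differently (for example with async tasks). LeaseManager, RenewalWorker and renew_loop are hypothetical names. The idea is to keep one worker per container and to stop and join the old worker before spawning its replacement.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// One background renewal thread plus the channel used to stop it (hypothetical).
struct RenewalWorker {
    stop_tx: Sender<()>,
    handle: JoinHandle<()>,
}

/// Hypothetical manager that keeps at most one renewal thread per container.
#[derive(Default)]
struct LeaseManager {
    workers: HashMap<String, RenewalWorker>,
}

impl LeaseManager {
    /// Called whenever `container` obtains a new lease. Any previous worker is
    /// stopped and joined first, so threads never pile up across renewals.
    fn on_new_lease(&mut self, container: &str, lease_secs: u64) {
        if let Some(old) = self.workers.remove(container) {
            let _ = old.stop_tx.send(());
            let _ = old.handle.join();
        }
        let (stop_tx, stop_rx) = mpsc::channel();
        let handle = thread::spawn(move || renew_loop(stop_rx, lease_secs));
        self.workers
            .insert(container.to_string(), RenewalWorker { stop_tx, handle });
    }

    fn live_workers(&self) -> usize {
        self.workers.len()
    }

    /// Stop and join every remaining worker.
    fn shutdown(&mut self) {
        for (_, w) in self.workers.drain() {
            let _ = w.stop_tx.send(());
            let _ = w.handle.join();
        }
    }
}

/// Wake at half the lease time (the usual DHCP T1 renewal point) until stopped.
fn renew_loop(stop_rx: Receiver<()>, lease_secs: u64) {
    let interval = Duration::from_secs(lease_secs.max(2) / 2);
    loop {
        match stop_rx.recv_timeout(interval) {
            // Explicit stop, or the manager dropped the sender: exit the thread.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return,
            Err(RecvTimeoutError::Timeout) => { /* a real client would renew here */ }
        }
    }
}

fn main() {
    let mut manager = LeaseManager::default();
    // 16 containers, each receiving 5 successive leases.
    for _round in 0..5 {
        for id in 0..16 {
            manager.on_new_lease(&format!("container-{id}"), 600);
        }
    }
    println!("live renewal threads: {}", manager.live_workers());
    manager.shutdown();
}
```

Running main prints "live renewal threads: 16". In the leaky variant, without the remove/send/join step, it would be 80 threads after five rounds. A channel is used because Rust threads cannot be killed from outside. They have to be asked to exit.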
Any news?
No, I haven't found the time to reproduce this issue.
I can take a look at this issue. Can someone point me in the right direction to reproduce this?
Use macvlan and a DHCP server with as short a lease as reasonable, e.g. a minute. Observe the number of threads?
Yes, checking ls /proc/$pidOfProxy/task/ over time should show the leak, I guess.
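A small sketch for watching the count over time on Linux. It counts the entries in /proc/<pid>/task (one per thread) and also reads the "Threads:" line from /proc/<pid>/status, which reports the same number. The program name and its argument handling (PID first, then number of samples, 60 s apart) are assumptions for illustration only.

```rust
use std::fs;
use std::io;
use std::thread;
use std::time::Duration;

/// Count threads of `pid` by listing /proc/<pid>/task (Linux only).
fn task_count(pid: u32) -> io::Result<usize> {
    Ok(fs::read_dir(format!("/proc/{pid}/task"))?.count())
}

/// Parse the "Threads:" field from /proc/<pid>/status (Linux only).
fn status_threads(pid: u32) -> io::Result<Option<usize>> {
    let status = fs::read_to_string(format!("/proc/{pid}/status"))?;
    Ok(status
        .lines()
        .find_map(|line| line.strip_prefix("Threads:"))
        .and_then(|v| v.trim().parse().ok()))
}

fn main() -> io::Result<()> {
    let mut args = std::env::args().skip(1);
    // PID to watch (e.g. the dhcp-proxy); defaults to this process.
    let pid: u32 = args
        .next()
        .and_then(|s| s.parse().ok())
        .unwrap_or_else(std::process::id);
    // Number of samples, taken 60 seconds apart; defaults to one.
    let samples: u32 = args.next().and_then(|s| s.parse().ok()).unwrap_or(1);
    for i in 0..samples {
        if i > 0 {
            thread::sleep(Duration::from_secs(60));
        }
        println!(
            "pid {pid}: {} tasks, status Threads: {:?}",
            task_count(pid)?,
            status_threads(pid)?
        );
    }
    Ok(())
}
```

With a short DHCP lease, a count that keeps rising across samples while the number of containers stays fixed is the leak signature described in this thread.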
I am now able to replicate this. I started 10 containers on a network where the lease is only 60 seconds. In my case, the netavark dhcp-proxy PID is 6808.
And after a short while:
Threads: 552
Ah, I just noticed this issue. Could this be related? My DHCP lease time is 30 minutes.
https://github.com/containers/netavark/issues/1024
Thanks!
I definitely have this thread leak: there were 13708 threads for ~15 containers after 3 days of running. I was also seeing #618 as a symptom (of thread starvation, I assume). I have the underlying pattern (IPv6 multicast on an IPv4 network).
I updated past the fix for that specific symptom, and I'm watching how many threads it creates long-term.
My thread leak seems "better, but not totally fixed". I have 1497 threads after 6 days (post #1022) versus the 13708 after 3 days.
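Put as rates, the two reports above come out as follows. This assumes roughly linear growth over each period, which the thread does not confirm:

```latex
r_{\text{before}} = \frac{13708}{3\ \text{days}} \approx 4569\ \text{threads/day},
\qquad
r_{\text{after}} = \frac{1497}{6\ \text{days}} \approx 250\ \text{threads/day},
\qquad
\frac{r_{\text{before}}}{r_{\text{after}}} \approx \frac{4569}{249.5} \approx 18
```

Under that assumption, the leak is about 18 times slower after #1022 but still nonzero.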
Importantly, the dhcp-proxy is not spinning the CPU right now, and my core symptom (restarting containers sometimes hit DHCP task aborts) is gone.
Using SUSE MicroOS with a bunch of macvlan containers, I see netavark-dhcp-proxy hang every few days. From journalctl:
Even with RUST_BACKTRACE=1 set, it doesn't print a backtrace. The last time this happened, ps reported over 4000 threads for the PID.