OpenSmalltalk / opensmalltalk-vm

Cross-platform virtual machine for Squeak, Pharo, Cuis, and Newspeak.
http://opensmalltalk.org/

Socket handle leak on Linux VMs #684

Open dtlewis290 opened 1 month ago

dtlewis290 commented 1 month ago

Open socket handles accumulate in /proc/<pid>/fd for an image running an active SqueakSource server. The handles build up gradually and eventually lock up the image when the Linux per-process limit of 1024 open handles is reached. /usr/bin/ss shows an accumulation of sockets in CLOSE_WAIT state, fewer than the handles listed in /proc/<pid>/fd but presumably associated with TCP sessions for sockets not properly closed by the VM.
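For anyone who wants to watch for the same symptom, here is a minimal sketch of how the handle count could be sampled. It assumes a Linux /proc filesystem; the helper name `count_fds` is hypothetical and not part of the VM or of any Squeak tooling.

```python
import os
import sys


def count_fds(pid="self"):
    """Return (total, sockets) for the open descriptors of a process.

    Each entry in /proc/<pid>/fd is a symlink; a socket descriptor
    points at a target of the form 'socket:[inode]'.
    """
    fd_dir = f"/proc/{pid}/fd"
    total = sockets = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        total += 1
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


if __name__ == "__main__":
    # Usage: python count_fds.py <pid>   (defaults to this script's own process)
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    total, sockets = count_fds(pid)
    print(f"pid {pid}: {total} open descriptors, {sockets} of them sockets")
```

Run periodically against the VM's process ID (as the same user or as root), a steadily rising socket count that never drops back would match the leak described above. Comparing it with the CLOSE_WAIT count reported by ss shows how many of the leaked handles still have a kernel TCP session attached.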

Issue observed in a 5.0-202312181441 VM, and is not present in a 5.0-202004301740 VM. Other Linux VMs later than 5.0-202312181441 are likely affected, although this has not been confirmed. See also discussions on the box-admins Slack channel.

dtlewis290 commented 1 month ago

If anyone has experience with this issue on Linux VMs, or any insight into possible causes, I would appreciate the feedback. I am able to do some limited validation of VMs on the squeaksource.com server, but I need to be very careful to avoid impacting users of that service, so suggestions and advice are welcome here.

dtlewis290 commented 1 week ago

I have been building VMs from different points in the commit history, and testing them on squeaksource.com for the socket descriptor leak.

I can now confirm that the problem is associated with (not necessarily caused by) the introduction of Linux EPOLL support in aio.c in October 2020:

commit 171c235451dd16fe6bb29329c3562b2c741f4b1d
Author: Levente Uzonyi <leves@caesar.elte.hu>
Date: Mon Oct 19 01:44:37 2020 +0200

VMs built at this commit and later (merged at 5fea0e35d24cbda5d31e3b0faaf2c6223c030a26), including current VMs, have the socket handle leak problem.

VMs built from commits up through the immediately preceding commit (da7954d2e48d1471401ad85865b5f9a4af95cd12) do not have the socket leak.

I was also able to build and test a current VM with the EPOLL logic disabled (#define HAVE_EPOLL 0, #define HAVE_EPOLL_PWAIT 0). This VM does not have the handle leak problem.