clearcontainers / proxy

Hypervisor based containers proxy
Apache License 2.0
32 stars 15 forks source link

Unit test fail after update to go 1.9.2 #198

Closed jcvenegas closed 6 years ago

jcvenegas commented 6 years ago

The tests are failing because a file descriptor leak detected by https://github.com/clearcontainers/proxy/blob/4cd8d9320f1a8372060a20fe0b1399e91123af39/proxy_test.go#L133.

jcvenegas commented 6 years ago

Found in https://github.com/clearcontainers/proxy/pull/197

amshinde commented 6 years ago

@jcvenegas The file descriptor snapshots show a bunch of fds related to ksm. I disabled ksm_test.go and did not see any failures after running the tests in a loop.

amshinde commented 6 years ago

Here is a snaphot of the file descriptors when a test fails in 1.9.1:

Snapshot at the start of the test:
{Fd:0 Flags:32768 CloseOnExec:false Text:/dev/null}
{Fd:1 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:2 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:3 Flags:32770 CloseOnExec:true Text:/tmp/cc-ksm-meminfo676586189}
{Fd:4 Flags:2 CloseOnExec:true Text:anon_inode:[eventpoll]}
{Fd:5 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:6 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:7 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:8 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:9 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:10 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:11 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:12 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:13 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:14 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:15 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:16 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:17 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:18 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:19 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:20 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:21 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:22 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:23 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:24 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:25 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:26 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:27 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:28 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:29 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:30 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:31 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:35 Flags:2050 CloseOnExec:true Text:socket:[1853393]}
{Fd:36 Flags:2050 CloseOnExec:true Text:socket:[1854618]}
{Fd:37 Flags:2050 CloseOnExec:true Text:socket:[1853598]}

Snapshot at the end of the test:
{Fd:0 Flags:32768 CloseOnExec:false Text:/dev/null}
{Fd:1 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:2 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:4 Flags:2 CloseOnExec:true Text:anon_inode:[eventpoll]}
{Fd:14 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:15 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:19 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:20 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:21 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:25 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:26 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:27 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/run}
{Fd:28 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:29 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}
{Fd:30 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/sleep_millisecs}
{Fd:31 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test719958438/pages_to_scan}

As seen, there are less number of file descriptors at the end of the test. Our fd leak detector currently compares the snapshots line by line, which I think we need to modify to check if the file descriptors left at the end of the test are indeed greater than the file descriptors at the beginning. All these are related to ksm, I havent looked much into the ksm code yet to make complete sense of why these file descriptors are closed.

I have run the same tests with go 1.8.1, and have not seen any failures. See the same file descriptors at the start and end of each test. Here is the snapshot of fds during running test TestHyperstartResponse in 1.8.1 (which seems to be failing the most in 1.9.1)

Before:
{Fd:0 Flags:32768 CloseOnExec:false Text:/dev/null}
{Fd:1 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:2 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:3 Flags:32770 CloseOnExec:true Text:/tmp/cc-ksm-meminfo932665466}
{Fd:4 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:5 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:6 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:7 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:8 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:9 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:10 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:11 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:12 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:13 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:14 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:15 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:16 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:17 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:18 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:19 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:20 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:21 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:22 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:23 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:24 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:25 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:26 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:27 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:28 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:29 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:30 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:34 Flags:2 CloseOnExec:true Text:anon_inode:[eventpoll]}
{Fd:35 Flags:2050 CloseOnExec:true Text:socket:[1935385]}
{Fd:36 Flags:2050 CloseOnExec:true Text:socket:[1935387]}
{Fd:37 Flags:2050 CloseOnExec:true Text:socket:[1934459]}

After:
{Fd:0 Flags:32768 CloseOnExec:false Text:/dev/null}
{Fd:1 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:2 Flags:33794 CloseOnExec:false Text:/dev/pts/24}
{Fd:3 Flags:32770 CloseOnExec:true Text:/tmp/cc-ksm-meminfo932665466}
{Fd:4 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:5 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:6 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:7 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:8 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:9 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:10 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:11 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:12 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:13 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:14 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:15 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:16 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:17 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:18 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:19 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:20 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:21 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:22 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:23 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:24 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:25 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:26 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/run}
{Fd:27 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:28 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:29 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/sleep_millisecs}
{Fd:30 Flags:34818 CloseOnExec:true Text:/tmp/cc-ksm-test786189255/pages_to_scan}
{Fd:34 Flags:2 CloseOnExec:true Text:anon_inode:[eventpoll]}
{Fd:35 Flags:2050 CloseOnExec:true Text:socket:[1935385]}
{Fd:36 Flags:2050 CloseOnExec:true Text:socket:[1935387]}
{Fd:37 Flags:2050 CloseOnExec:true Text:socket:[1934459]}
amshinde commented 6 years ago

@sboeuf @jodh-intel @sameo

amshinde commented 6 years ago

Looks like we dont close all file descriptors while running the KSM tests. We dont check for fd leaks for the KSM tests, just for the tests in proxy_test.go. So, the open file descriptors should remain the same, but I am seeing them getting closed.