Open justinc1 opened 7 years ago
Running the same shared object with osv::run(std::vector
The problem with all these dead threads is that their stack is still alive. Our default pthread stack size is 1MB (for non-pthread threads, we have a lower default, 64 KB). So two thousand of these dead threads with 1MB stacks will indeed fill the memory.
I'll see why these threads are not being marked detached as they should be. Additionally, I'll open a low-priority issue (just so we remember) noting that dead threads could have their stacks freed.
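The stack estimate above is easy to check with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function name `dead_stack_bytes` is made up for illustration, and the sizes are the defaults quoted in the comment (1 MB for pthread stacks, 64 KB for other threads).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of OSv): memory pinned by the stacks of
 * threads that have terminated but were never joined or detached.
 */
unsigned long long dead_stack_bytes(unsigned long long dead_threads,
                                    unsigned long long stack_size)
{
    /* Each dead thread keeps its whole stack allocation alive. */
    return dead_threads * stack_size;
}
```

With 2000 dead threads, 1 MB stacks pin 2000 MiB (roughly 2 GB), while 64 KB stacks would pin only 125 MiB.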
@justinc1 I'm starting to suspect that it wasn't httpserver's fault for starting (and not detaching or joining) all these threads, but rather it is your own application which starts a thread and then neither marks it detached nor joins it. Is that possible?
One of the consequences of OSv's lack of isolated processes is that OSv cannot offer "cleanup" services when an application's main() exits... If this main() left behind un-free()ed memory, un-close()ed file descriptors, or in this case un-join()ed threads, they will be left behind, and if the same application is run repeatedly, memory will leak until we run out of it. I don't think this is something we can (or want to) fix in OSv.
So please check if your application creates threads and forgets to detach or join them. If not please let me know and I'll continue to investigate.
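For reference, the two ways an application can avoid leaving an un-joined thread behind look roughly like this with plain POSIX threads. This is a minimal sketch, not code from OSv or from the reporter's application; `worker`, `run_and_join`, and `run_detached` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker; stands in for whatever the application runs. */
static void *worker(void *arg)
{
    int *counter = arg;
    if (counter)
        (*counter)++;
    return NULL;
}

/* Option 1: join the thread, which also releases its resources.
 * Returns the counter value set by the worker, or -1 on error. */
int run_and_join(void)
{
    pthread_t t;
    int counter = 0;

    if (pthread_create(&t, NULL, worker, &counter) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return counter;
}

/* Option 2: create the thread detached, so its resources are
 * released automatically when it terminates. Returns 0 on success. */
int run_detached(void)
{
    pthread_t t;
    pthread_attr_t attr;
    int rc;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&t, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc == 0 ? 0 : -1;
}
```

A thread that gets neither treatment stays around in the terminated state, stack included, which is the pattern to look for.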
The test app doesn't create any new threads. Here is a simpler test case (I already had the file open).
sleep.c, compiled to /sleep.so:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    printf("PP In sleep.c main.\n");
    int repeat=5, ii;
    switch(argc) {
    case 2:
        repeat = atoi(argv[1]);
        break;
    case 1:
        break;
    default:
        printf("PP Usage: %s [repeat=5]\n", argv[0]);
        return 0;
    }
    printf("PP cmd: %s repeat=%d.\n", argv[0], repeat);
    for(ii=0; ii<repeat; ii++) {
        sleep(1);
        fprintf(stderr,"PP %d/%d ...\n", ii, repeat);
    }
    fprintf(stderr,"PP MAIN DONE\n");
    return 0;
}
Then, five times:
curl -X PUT http://192.168.122.90:8000/app/?command=%2Fsleep.so%203
And gdb:
199 (0xffff800003c1f040) /cli/cli.so cpu0 waiting console::LineDiscipline::read(uio*, int) at drivers/line-discipline.cc:22 vruntime 134896
200 (0xffff800003ee7040) /sleep.so cpu0 terminated ?? at arch/x64/entry.S:113 vruntime 1.20252e-07
201 (0xffff800003fed040) /sleep.so cpu0 terminated ?? at arch/x64/entry.S:113 vruntime 0.000363523
202 (0xffff8000040f3040) /sleep.so cpu0 terminated ?? at arch/x64/entry.S:113 vruntime 0.00359571
203 (0xffff8000041fe040) /sleep.so cpu0 terminated ?? at arch/x64/entry.S:113 vruntime 0.0381669
204 (0xffff800004304040) /sleep.so cpu0 terminated ?? at arch/x64/entry.S:113 vruntime 0.231765
Turns out that in addition to the osv::run implementation in run.cc, we have a second (!?) implementation of osv::run with slightly different parameters in app.cc! Unlike the former, which runs the command and waits, the second runs it in the background - and never waits for nor detaches it. Moreover, it calls application::run(), which stores this application in an apps array, and we're supposed to call application::join() to remove it (which we never do). So the thread never gets joined, and the library never gets unloaded either.
Strangely httpserver calls that second osv::run variant. I have to get rid of it and clean up this mes...
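The ownership problem described above can be modelled in a few lines of C. This is a toy model only, not OSv's actual implementation: `toy_app`, `toy_run_background`, `toy_join`, and `toy_live_apps` are hypothetical names, and a simple counter stands in for the apps array.

```c
#include <pthread.h>
#include <stdlib.h>

/* Toy stand-in for an application handle; NOT OSv's real class. */
struct toy_app {
    pthread_t thread;
};

/* Handles started but not yet joined (models the size of the apps array). */
static int live_apps = 0;

static void *toy_main(void *arg)
{
    (void)arg;
    return NULL; /* the "application" exits immediately */
}

/* Start the app in the background and register it.
 * The caller owns the handle and must eventually call toy_join(). */
struct toy_app *toy_run_background(void)
{
    struct toy_app *app = malloc(sizeof *app);

    if (!app)
        return NULL;
    if (pthread_create(&app->thread, NULL, toy_main, NULL) != 0) {
        free(app);
        return NULL;
    }
    live_apps++;
    return app;
}

/* Join the app's thread and drop it from the registry. */
int toy_join(struct toy_app *app)
{
    if (pthread_join(app->thread, NULL) != 0)
        return -1;
    live_apps--;
    free(app);
    return 0;
}

int toy_live_apps(void)
{
    return live_apps;
}
```

If the caller never invokes `toy_join()`, every request leaves one more registered handle and one more terminated thread behind, even though each application finished long ago.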
never waits for nor detaches it
For a single-threaded httpserver, app::run() should not block. So am I right to guess that the correct fix would be for httpserver to start the app's main() thread as detached?
In this thread - https://groups.google.com/d/msg/osv-dev/6UggBSYElJw/CVdNLlmABgAJ - I explained what causes this problem, and proposed a patch (titled "httpserver / osv::run_background: Don't keep apps alive forever"). The patch isn't quite correct, but it's a good start.
While trying to debug #824, I noticed this minor problem with apps started via httpserver. The app is started by osv::run(), and the app object is never joined. So after about 500 cycles (the test code used to reproduce #824 - it does GET /os/version, POST /env/, PUT /app/?command=myapp.so), the server VM stopped responding (hitting the enter key didn't show a command prompt in the console). I guess the VM used most of its memory for mmapping myapp.so.
This is not really a problem for me (I need to start fewer than 10 apps), but it is at least worth mentioning.
gdb: