kohillyang opened this issue 4 years ago
Hi @kohillyang, I think it is not related to MXNet. When there is a new connection, Flask will create a new Python process to handle it, which creates a new copy of the MXNet predictor instance `predictor`.
To validate it, you can print the id of the predictor with ~~`print(id(r))`~~ `print(id(predictor))` inside the function `def net_forward():`.
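For example, a minimal sketch of that check (it assumes the `app` and `predictor` objects from the server script; the route name is only illustrative):

```python
import os

@app.route("/predict", methods=["POST"])
def net_forward():
    # If the same pid/id appears on every request, a single predictor instance is reused.
    print("pid=%d predictor_id=%d" % (os.getpid(), id(predictor)))
    result = predictor()  # run inference as usual
    return "ok"
```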
@wkcn But even if Flask had created a new process, the GPU memory should be freed once that process ends. Besides, the predictor is created in the main function, which is only called once, so there is only one predictor instance. Also, if the main process has already initialized a CUDA context, MXNet in the subprocess would fail at inference time, because CUDA file descriptors cannot be shared between the main process and the subprocess.
BTW, the pid of the process and the id of the predictor remain unchanged. I print them using the following code:
print(id(self))
print(os.getpid())
PS: `ctx.empty_cache()` is also not thread-safe. If you call it from two threads, the program crashes in some cases.
Thread safety is important because sometimes you need to implement a Block that calls `asnumpy`, and it is too hard to implement every block as a HybridBlock in a fully asynchronous way. In PyTorch this is not a problem because we have DataParallel, which starts a thread for each device and gathers the results; but that pattern is not officially supported by MXNet, because of issues like https://github.com/apache/incubator-mxnet/issues/13199 that need workarounds.
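For reference, a minimal sketch of the PyTorch pattern being contrasted here (module and shapes are only illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)
if torch.cuda.device_count() > 1:
    # DataParallel splits each batch across the visible GPUs, running one
    # worker thread per device, and gathers the outputs on the first device.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

x = torch.randn(32, 128, device=device)
y = model(x)  # scatter / parallel forward / gather happen inside DataParallel
print(y.shape)
```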
@wkcn The predictor is created by `predictor = Predictor()`, not by `r = predictor()`; the latter only calls it, since its `__call__` method is overridden. And the memory usage grows slowly; it seems the memory allocated by the line `mx.nd.zeros(shape=(1, 3, max_h, max_w), ctx=ctx)` is never freed.
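One way to confirm that device memory really does grow across requests (a sketch; it assumes MXNet 1.4+, where `mx.context.gpu_memory_info` is available):

```python
import mxnet as mx

def log_gpu_memory(device_id=0):
    # gpu_memory_info returns (free, total) in bytes for the given GPU.
    free, total = mx.context.gpu_memory_info(device_id)
    print("GPU %d: %.1f MiB used" % (device_id, (total - free) / 1024 ** 2))
```

Calling this at the end of every request handler should show the used figure climbing on 1.6.0.post0 / 1.7.0 while staying flat on 1.5.1, per the behaviour described above.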
@leezu could this be related to the issue fixed by #18328 / #18363?
@kohillyang so you are creating a new predictor in every HTTP call? If so, then yes: a new Block is created in every HTTP call, and due to https://github.com/apache/incubator-mxnet/pull/18328 the parameters of the Block won't be deallocated.
https://github.com/apache/incubator-mxnet/pull/18328/files only contains Python changes. Would you like to try applying the changes to your MXNet installation and see if the memory leak goes away? Thank you.
Why do you think I'm creating a new predictor in each call? There is only one instance of Predictor.
Never mind, I didn't read your `def net_forward()` carefully enough.
Thus this is unrelated to #18328
Description
Hello, I'm using Flask with MXNet to write a server. Since it is a web app, we want the GPU memory to be fully statically allocated. However, as the title says, the GPU memory usage keeps increasing and eventually raises an OOM error on MXNet 1.6.0.post0 and 1.7.0; on MXNet 1.5.1 everything is fine. Since Flask's debug mode uses multi-threading, I think it may be caused by some calls that are not thread-safe.
To Reproduce
This is a naive Flask server:
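The original script is not reproduced here; the following is a minimal sketch assembled from the details mentioned in this thread (a single `Predictor` with an overridden `__call__`, the `mx.nd.zeros` input allocation, and a `net_forward` route). The model, route name, and input shape are assumptions.

```python
import flask
import mxnet as mx

app = flask.Flask(__name__)
ctx = mx.gpu(0)


class Predictor(object):
    def __init__(self):
        # Placeholder network; the real server loads its own model.
        self.net = mx.gluon.model_zoo.vision.resnet18_v1(pretrained=True, ctx=ctx)

    def __call__(self, max_h=800, max_w=1333):
        # The allocation suspected of leaking on MXNet 1.6/1.7.
        data = mx.nd.zeros(shape=(1, 3, max_h, max_w), ctx=ctx)
        out = self.net(data)
        out.wait_to_read()
        return out


predictor = Predictor()  # created exactly once


@app.route("/predict", methods=["POST"])
def net_forward():
    result = predictor()
    return str(result.shape)


if __name__ == "__main__":
    # Debug mode is what the report above says triggers multi-threaded handling.
    app.run(host="0.0.0.0", port=5000, debug=True)
```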
And just run the following code to request the server:
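The client script is likewise not included; a hedged example of the kind of request loop that would exercise the server sketched above (endpoint and iteration count are assumptions):

```python
import requests

for i in range(10000):
    resp = requests.post("http://127.0.0.1:5000/predict")
    if i % 100 == 0:
        print(i, resp.status_code)
```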
Environment
I'm using flask 1.0.2 and tornado 5.1, but I think the issue is independent of the versions of flask and tornado. We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
paste outputs here