RakipInitiative / ModelRepository

Joint project of EFSA, the Federal Institute for Risk Assessment, DTU and ANSES to create an online model repository.
GNU General Public License v3.0

Runner Bug: Rserve doesn't end after Runner completes execution. #299

Closed (schuelet closed 3 years ago)

schuelet commented 3 years ago

The process that executes R code does not terminate once execution is done. Rserve is only cleaned up on reset. It may even be that a finished job on the server leaves behind a permanent Rserve instance, clogging up memory.

I am assigning myself, @ahmadswaid and @miguelalba to investigate and find a way to force Rserve to end after model execution. @llavall and @mfilter, you are assigned to watch this ticket, but you can take yourselves off it if you like.


ahmadswaid commented 3 years ago

@schuelet I have worked on this issue before and created a POC project. The solution is based on getting the system ID assigned to the process that runs R when we start running an R script; that ID is then used later, once the R script has finished running, to end the process. That worked fine on Windows but ran into some problems on Linux. I think we can build on top of this idea.
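
For illustration, a minimal sketch of the PID-based idea, assuming the R process is launched from Java via `ProcessBuilder` and Java 11+ is available. The class `ProcessReaper` and its method names are hypothetical and are not taken from the POC project or from FSK-Lab. Descendant processes are signalled as well, in case the launched process spawns children of its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: remember the PID at start, end the process by PID later.
final class ProcessReaper {

    /** Starts the given command and returns the OS process ID of the child. */
    static long startAndGetPid(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            return process.pid();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Asks the process with the given PID (and its descendants) to terminate,
     * waits up to graceMillis, then kills it forcibly if it is still running.
     * Returns true if the process is no longer alive when the method returns.
     */
    static boolean terminate(long pid, long graceMillis) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        if (handle.isEmpty()) {
            return true; // no such process: nothing to do
        }
        ProcessHandle ph = handle.get();
        ph.descendants().forEach(ProcessHandle::destroy); // children first
        ph.destroy();                                      // polite request
        try {
            ph.onExit().get(graceMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            ph.destroyForcibly();                          // grace period expired
            ph.onExit().join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            ph.destroyForcibly();
        } catch (ExecutionException e) {
            ph.destroyForcibly();
        }
        return !ph.isAlive();
    }
}
```

The Runner would call `startAndGetPid(...)` when a script starts and `terminate(pid, graceMillis)` once it has finished. The actual command line used to start R or Rserve is deliberately left out here.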

schuelet commented 3 years ago

@ahmadswaid if that turns out to be too difficult, we could modify the Runner to reuse the RHandler and PythonHandler, i.e. if an RHandler already exists, don't create a new one; clean up the workspace and just use that one.
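
A rough sketch of that reuse pattern, under stated assumptions: `ScriptHandler`, `FakeRHandler` and `HandlerCache` are hypothetical stand-ins and do not reflect the real RHandler/PythonHandler API in FSK-Lab. The point is only the control flow: look up an existing handler, clean its workspace, and create a new one only if none exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache that keeps one handler per script language.
final class HandlerCache {

    /** Hypothetical stand-in for a script handler such as RHandler. */
    public interface ScriptHandler {
        void cleanWorkspace();
    }

    /** Fake handler used only to demonstrate the reuse logic. */
    public static final class FakeRHandler implements ScriptHandler {
        public final Map<String, Object> workspace = new HashMap<>();
        public int cleanups = 0;

        @Override
        public void cleanWorkspace() {
            workspace.clear(); // drop variables left over from the last run
            cleanups++;
        }
    }

    private final Map<String, ScriptHandler> handlers = new HashMap<>();

    /**
     * Returns the existing handler for the language after cleaning its
     * workspace, or creates and stores a new one if none exists yet.
     */
    public synchronized ScriptHandler acquire(String language,
                                              Supplier<ScriptHandler> factory) {
        ScriptHandler existing = handlers.get(language);
        if (existing != null) {
            existing.cleanWorkspace();
            return existing;
        }
        ScriptHandler created = factory.get();
        handlers.put(language, created);
        return created;
    }
}
```

With this shape, repeated Runner executions would share a single backing R session instead of spawning a new one each time, at the cost of having to trust the workspace cleanup to remove all state from the previous run.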

schuelet commented 3 years ago

Note: To find out whether a workflow is running on the KNIME server, use this function (thanks @miguelalba): org.knime.core.util.EclipseUtil.determineServerUsage()

ahmadswaid commented 3 years ago

I found out that all the processes spawned during the Runner execution are gracefully terminated 60 seconds after they are no longer in use, i.e. after the scripts have finished executing; this time is given to them to clean up before being terminated. This behaviour is the same on Windows, macOS, Linux and the KNIME server. I don't think the observation mentioned in this ticket is the origin of the problem that happened on the server.

schuelet commented 3 years ago

This has been fixed in https://github.com/SiLeBAT/FSK-Lab/pull/928.