RakipInitiative / ModelRepository

Joint project of EFSA, Federal Institute For Risk Assessment, DTU and ANSES to create a online model repository.
GNU General Public License v3.0
2 stars 0 forks source link

R library management ideas #291

Open schuelet opened 3 years ago

schuelet commented 3 years ago

Currently, FSK-Lab adds the .fsk folder to the .libPaths of the user's system, which is often the reason, users experience issues with their own R instances. I think, a few changes could solve this problem:

  1. I think we can avoid problems with conflicting R versions by forcing our FSK-Lab provided R to only use the .fsk folder for their source of libraries (and the system library) but not use the user libraries at all.
  2. Restoring .Rprofile: In de.bund.bfr.knime.fsklab.r.server.RConnectionFactory the function restoreProfile sets back .Rprofile to its original state, but only after the VM is shut down. I propose to restore it sooner, after the last R model has been executed (if several are running at the same time). I worry, that otherwise users who open RStudio with their own R will experience package problems with packages that were built with an old R version (ours) AND mess up .fsk if they update them for their RStudio session.

What do you think? Maybe this solves our ongoing issues with conflicting R versions for good. And I don't think it would be too difficult to implement. Since we are in a testing phase, maybe now is the right time.

mfilter commented 3 years ago

I like the idea. Let's assign it to @ahmadswaid

llavall commented 3 years ago

Please always consider conda users. I don't know if that requires different approaches.

miguelalba commented 3 years ago

This is why we have this approach in place. I am pasting what I wrote Thomas:

.Rprofile is still needed when using multithreaded (or multi process) models. When you run a single model you can configure first .libPaths(), but when a new process is spawned by foreach, this process is independent and not configured to use .libPaths() to the .fsk folder. Still, every R process looks for the default config files, so .Rprofile points these child processes to the .fsk folder with the packages. In other words, if you remove .Rprofile we break Tasja's model

This makes clear the need of .Rprofile (unless we find a way to configure R child processes without changing model scripts). The approach of modifying and restoring .Rprofile before and after each model execution has the upside of reducing the conflict with RStudio and other R clients but it introduces a racing condition. This is a common challenge in concurrent implementations where several actors compete for a shared resource. In this case multiple runners may try to modify .Rprofile at the same time.

schuelet commented 3 years ago

@miguelalba thats why I said after the LAST model has been executed. I figure, we can use an observer/listener to check if an R model is currently executed. Would that not work?

miguelalba commented 3 years ago

That would not be enough. It is also necessary to control when the .Rprofile is created or modified if already existing (when starting runners). I am not sure how I would implement it with an observer pattern or if it is the best approach. For example in an observer pattern an observable (weather station) asynchronously updates a numbers of observers (clients, phone apps) etc with updated weather conditions. We could do with an observer implementation without sharing state (this is why I say it might not be the best approach). The observable would be some wrapper or handler class for the .Rprofile file (RprofileManager) that is observed by a number of runner nodes. Every time a runner starts it would subscribe itself to RprofileManager and unsubscribe itself when finished. The subscribe method would perform an event or check so that when the list of observers is not empty for the first time it would modify .Rprofile while the unsubscribe method would restore it the first time the list is empty.

And this is pretty much what Ahmad would need to do :)

schuelet commented 3 years ago

I will put this on ice, we don't have the resources to do this right now