bedatadriven / renjin

JVM-based interpreter for the R language for the statistical analysis.
https://www.renjin.org
GNU General Public License v2.0
514 stars 82 forks source link

The 'optim' function with 'L-BFGS' method fails in multithreading #448

Open tlucz opened 5 years ago

tlucz commented 5 years ago

Hello,

We faced a problem with multithreading in Renjin. The 'optim' function with 'L-BFGS' method calculates different results when is invoked from many threads.

A simple Java project was prepared to reproduce the problem. Also the code is attached at the bottom of this mail.

https://github.com/tlucz/parallel-optim

When we run the code with THREADS_NUMBER=1 we get following correct results (every time the same):

list(par = c(0,99980003783379, 0,99960011579784), value = 3.9984867919192357E-8, counts = c(49L, 49L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)
list(par = c(0,99980003783379, 0,99960011579784), value = 3.9984867919192357E-8, counts = c(49L, 49L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)
list(par = c(0,99980003783379, 0,99960011579784), value = 3.9984867919192357E-8, counts = c(49L, 49L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)

But when we run it with more threads e.g. THREADS_NUMBER = 2 we get different results for the calls and sometimes even an errors:

list(par = c(-1,02632082910288, 1,06079411634746), value = 24.199999999999996, counts = c(2L, 2L), convergence = c(52L), message = ERROR: ABNORMAL_TERMINATION_IN_LNSRCH)
list(par = c(-1,2, 1), value = 24.199999999999996, counts = c(2L, 2L), convergence = c(52L), message = ERROR: ABNORMAL_TERMINATION_IN_LNSRCH)
list(par = c(-1,2, 1), value = 0.0, counts = c(0L, 0L), convergence = c(52L), message = ERROR: ABNORMAL_TERMINATION_IN_LNSRCH)
list(par = c(0,37771903024192, 1,64396551742973), value = 225.77555649737445, counts = c(0L, 0L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)
list(par = c(0,99987487694407, 0,99974046721066), value = 2.4309119528957442E-8, counts = c(52L, 52L), convergence = c(0L), message = CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH)

Separate ScriptEngine is created for each thread as it is described in Renjin documentation.

code:

package com.example;

import javax.script.ScriptEngine;
import org.renjin.script.RenjinScriptEngineFactory;
import org.renjin.sexp.SEXP;

public class Example {

 private static final ThreadLocal<ScriptEngine> engines = new ThreadLocal<>();
 private static final int RUNS = 100;
 private static final int THREADS_NUMBER = 2;

 public static void main(String[] args) {
   new Example().execute();
 }

 private void execute() {
   for (int i = 0; i < THREADS_NUMBER; i++) {
     new Thread(this::task).start();
   }
 }

 private void task() {
   ScriptEngine engine = getEngine();
   try {
     engine.eval("fr <- function(x) { ## Rosenbrock Banana function\\n\" +\n"
     + " x1 <- x[1]\n"
     + " x2 <- x[2]\n"
     + " 100 * (x2 - x1 * x1) ^ 2 + (1 - x1) ^ 2\n"
     + "}");
   } catch (Exception e) {
     System.err.println(e.getMessage());
   }

   for (int i = 0; i < Example.RUNS; i++) {
     try {
       SEXP result = (SEXP) engine.eval("optim(c(-1.2,1), fr, method = \"L-BFGS\")");
       System.out.println(result);
     } catch (Exception e) {
       System.err.println(e.getMessage());
     }
   }
 }

 private ScriptEngine getEngine() {
   ScriptEngine engine = engines.get();
   if (engine == null) {
     RenjinScriptEngineFactory factory = new RenjinScriptEngineFactory();
     engine = factory.getScriptEngine();
     engines.set(engine);
   }
   return engine;
 }
}
akbertram commented 5 years ago

it is most likely due to the use of global variables in the C code we have imported from GNU R. I thought we had rewritten offending code, but will need to check on this.

akbertram commented 5 years ago

Indeed, one of the support routines declares a large number of static local variables that we missed: https://github.com/bedatadriven/renjin/blob/fadaeddd657d3afb7ac6686f289ce0b37e7ee8ec/packages/stats/src/main/c/lbfgsb.c#L320

I will see if it's easier to refactor the many many global variables into a working data structure, or if we can use the global variable transformer that has worked so well with the graphics package.

tlucz commented 5 years ago

Hi, Thanks for a quick reaction. When do you plan to release a new version with this fix?

tlucz commented 4 years ago

Hello, Is there a plan to merge the fix?