databricks / spark-tfocs

A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)
Apache License 2.0

How efficient should it be at solving linear programming problems? #33

Open BloomSkyTree opened 5 years ago

BloomSkyTree commented 5 years ago

Hello! First of all, I should apologize for my poor English.
I've tried to use SolverSLP to solve a large-scale linear programming problem on Spark (2.4.0, with Scala 2.11.12). I made a few minor changes to the logging code, because the logger moved to org.apache.spark.internal.Logging as of Spark 2.0. After that, the program compiled and I ran it. It worked well and gave me the right answer. Curiously, though, it takes minutes to solve a problem with about 586 variables and 200+ equality constraints in a local test, while a non-distributed Python LP solver takes only seconds. Am I misunderstanding something? How can I get the real performance out of TFOCS? P.S.: BLAS and LAPACK are enabled.

staple commented 5 years ago

Hi BloomSkyTree, since Spark has a general-purpose distributed architecture, there will definitely be overhead compared to a special-purpose application running in memory on a single system. In addition, a specialized LP solver may provide better performance than a general-purpose optimizer. TFOCS for Spark is primarily useful when the data cannot fit in memory on a single system, when the data is already in Spark, or when the benefit of CPU parallelism for a distributed task outweighs the task's communication overhead. For a task with many communication steps, such as LP solving in TFOCS for Spark, that last condition suggests a large data size.
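To make the overhead-versus-parallelism trade-off in the reply concrete, here is a back-of-envelope cost model. It is a sketch under stated assumptions, not a measurement of spark-tfocs. It assumes that each solver iteration needs one distributed round and that every round pays a fixed overhead (job scheduling, task launch, collecting results). It also assumes that per-iteration compute splits evenly across workers. The function names and all numbers (5000 iterations, 8 workers, 50 ms overhead per round) are hypothetical.

```python
"""Back-of-envelope model of the cost of a distributed iterative solver.

Assumptions (hypothetical, not measured from spark-tfocs):
- every iteration requires one distributed round with a fixed overhead,
- per-iteration compute splits evenly across `workers` cores.
"""


def distributed_time(iterations: int, compute_s: float, workers: int, overhead_s: float) -> float:
    """Total seconds when each iteration runs in parallel and pays a fixed overhead."""
    return iterations * (compute_s / workers + overhead_s)


def local_time(iterations: int, compute_s: float) -> float:
    """Total seconds for the same iterations on one core with no coordination overhead."""
    return iterations * compute_s


def break_even_compute(workers: int, overhead_s: float) -> float:
    """Single-core compute per iteration above which distributing the work pays off.

    Solves compute / workers + overhead = compute for compute.
    """
    if workers < 2:
        raise ValueError("distribution can only pay off with at least 2 workers")
    return overhead_s * workers / (workers - 1)


if __name__ == "__main__":
    ITERATIONS = 5000  # hypothetical iteration count
    WORKERS = 8        # hypothetical number of cores
    OVERHEAD_S = 0.05  # hypothetical fixed cost per distributed round (50 ms)

    # Small problem: about 1 ms of math per iteration, so overhead dominates.
    print("small, distributed:", distributed_time(ITERATIONS, 0.001, WORKERS, OVERHEAD_S))
    print("small, local:      ", local_time(ITERATIONS, 0.001))

    # Large problem: about 2 s of math per iteration, so parallelism dominates.
    print("large, distributed:", distributed_time(ITERATIONS, 2.0, WORKERS, OVERHEAD_S))
    print("large, local:      ", local_time(ITERATIONS, 2.0))

    print("break-even compute per iteration (s):", break_even_compute(WORKERS, OVERHEAD_S))
```

With these assumed numbers, the small problem takes about 250 s distributed versus about 5 s locally, while the large one takes about 1500 s versus 10000 s. The break-even point is roughly 57 ms of single-core work per iteration. The shape of the result matters more than the specific numbers: when a solver runs many iterations and each one pays a fixed coordination cost, distribution only pays off once the per-iteration workload is large.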