databricks / spark-tfocs

A Spark port of TFOCS: Templates for First-Order Conic Solvers (cvxr.com/tfocs)
Apache License 2.0
89 stars 37 forks source link

TFOCS for Spark: a Community Port of TFOCS for Apache Spark

This package is an implementation of the TFOCS convex solver for Apache Spark.

The original Matlab TFOCS library provides building blocks to construct efficient solvers for convex problems. TFOCS for Spark implements a useful subset of this functionality, in Scala, and is designed to operate on distributed data using the Spark cluster computing framework. TFOCS for Spark includes support for:

The name "TFOCS" is being used with permission from the original TFOCS developers, who are not involved in the development of this package and hence not responsible for the support. To report issues or request features about TFOCS for Spark, please use our GitHub issues page.

LASSO Example

Solve the l1 regularized least squares problem 0.5 * ||A * x' - b||_2^2 + lambda * ||x||_1 (lasso linear regression):

import org.apache.spark.mllib.linalg.{ DenseVector, Vectors }
import org.apache.spark.mllib.optimization.tfocs.SolverL1RLS

// Design matrix
val A = sc.parallelize(Array(
  Vectors.dense(0.61, 0.98, 0.32),
  Vectors.dense(0.10, 0.22, 0.92),
  Vectors.dense(0.79, 0.02, 0.20)), 2)

// Observations
val b = sc.parallelize(Array(3.69, 3.36, 1.59), 2).glom.map(new DenseVector(_))

// Regularization term
val lambda = 0.1

SolverL1RLS.run(A, b, lambda)

Alternatively, the above optimization may be performed using the TFOCS optimizer directly rather than via the SolverL1RLS helper:

import org.apache.spark.mllib.optimization.tfocs.fs.dvector.double._
import org.apache.spark.mllib.optimization.tfocs.fs.vector.double._
import org.apache.spark.mllib.optimization.tfocs.fs.vector.dvector._
import org.apache.spark.mllib.optimization.tfocs.TFOCS
import org.apache.spark.mllib.optimization.tfocs.vs.dvector._
import org.apache.spark.mllib.optimization.tfocs.vs.vector._

// Initial x vector
val x0 = Vectors.zeros(3).toDense

TFOCS.optimize(new SmoothQuad(b), new LinopMatrix(A), new ProxL1(lambda), x0)

Linear Program Example

To solve the smoothed standard form linear program:

minimize c' * x + 0.5 * mu * ||x - x0||_2^2
s.t.     A' * x == b' and x >= 0
import org.apache.spark.mllib.linalg.{ DenseVector, Vectors }
import org.apache.spark.mllib.optimization.tfocs.SolverSLP

// Constraint matrix
val A = sc.parallelize(Array(
  Vectors.sparse(3, Seq((0, 0.88))),
  Vectors.sparse(3, Seq((1, 0.63))),
  Vectors.sparse(3, Seq((0, 0.29), (2, 0.18)))), 2)

// Constraint vector
var b = new DenseVector(Array(9.50, 6.84, 5.09))

// Objective vector
val c = sc.parallelize(Array(1.0, 2.0, 3.0), 2).glom.map(new DenseVector(_))

// Smoothing parameter
val mu = 1e-2

SolverSLP.run(c, A, b, mu)

Solvers

Software Architecture Overview

The primary types used in the TFOCS for Spark library are as follows:

The primary abstractions of the TFOCS for Spark library are as follows:

The following naming conventions are used in this library:

TODOs