Shark-ML / Shark

The Shark Machine Leaning Library. See more:
http://shark-ml.github.io/Shark/
GNU Lesser General Public License v3.0
493 stars 130 forks source link

Shark is a fast, modular, general open-source C++ machine learning library.

Shark is licensed under the GNU Lesser General Public License, please see the files COPYING and COPYING.LESSER, or visit http://www.gnu.org/licenses .

Any application of the SHARK code toward military research and use is expressly against the wishes of the SHARK development team.

INSTALLATION / DOCUMENTATION

The entry point to the Shark library documentation is located at doc/index.html . For installation instructions, please click on "Getting started" on that page. Short version of installation guide: issue "ccmake ." in the main directory to select your build options, and afterwards issue "make" in the main directory -- you should be done (assuming Boost and CMake were installed). See the documentation for detailed instructions.

BUILDING THE DOCUMENTATION: To build the documentation yourself (e.g., if you need to read it locally in order to install it, i.e., because you don't have internet), see doc/README.txt

FILE STRUCTURE

README.txt This file (residing in the root directory of the Shark library).

CMakeLists.txt Definitions for the CMake build system.

include/ This directory and its sub-directories hold all include files of the library. Note that some functionality is implemented in lower- level Impl/ folders and inline .inl files.

lib/ The Shark library is placed in this directory. In the source code distribution this directory is initially empty, and the library is placed into the directory as the results of compilation. Binary distributions already contain the library, pre-built in release mode.

doc/ All documentation files are found in this sub-directory. In packaged versions of Shark the html documentation is pre-built; the repository provides the corresponding sources. The documentation contains technical reference documents for all classes and functions as well as a collection of introductory and advanced tutorials.

doc/index.html Entry point to the Shark documentation.

examples/ The examples directory contains example use-cases of the most important algorithms implemented in Shark. Besides exemplifying powerful learning algorithms, these programs are intended as starting points for experimentation with the library. The executables corresponding to the C++ example programs are found in examples/bin/.

Test/ Shark comes with a large collection of unit tests, all of which reside inside the Test directory.

bin/ The binaries of the Shark unit tests are placed here. Once the CMake build system is set up (with the "ccmake" command or equivalent) the whole test suite can be executed with the command "make test", issued in the Shark root directory.

src/ Source files of the Shark library. Note that from Shark version 3 onwards large parts of the library are templated and therefore header-only.

gpl-3.0.txt GNU general public license, version 3.

Note: Depending of the type of Shark distribution (binary or source package, or current repository snapshot) not all of theses files and directories are present.

PACKAGE STRUCTURE

The organization of the include/ directory reflects the structure of the Shark library. It consists of the following modules:

GENERAL INFRASTRUCTURE:

LinAlg Data structures and algorithms for typical linear algebra computations. For (dense and sparse) vector and matrix classes Shark relies on Boost uBLAS. Many higher level algorithms (such as singular value decomposition) are still implemented by the library itself.

Statistics This component is new in Shark 3. It wraps the capabilities of Boost accumulators, and it provides tools that appear regularly in machine learning, such as the Mann-Whitney U-test (also known as the Wilcoxon rank-sum test).

LEARNING INFRASTRUCTURE:

Core The core module is the central place for all top-level interfaces. In addition it holds a few infrastructure classes, such as exceptions.

Data The data module hosts data containers that have been specifically designed for the needs of machine learning code. Also, data can be imported and exported from and to different standard machine learning data file formats.

MACHINE LEARNING:

Models Models are adaptive systems, the architectures on top of which (machine) learning happens. Shark features a rich set of models, from simple linear maps to (feed-forward and recurrent) neural networks, support vector machines, and different types of trees. Models can also be concatenated with data format converters and other models.

ObjectiveFunctions This module collects different types of cost, fitness, or objective functions for learning. The bandwidth includes data-dependent error functions based on simple loss functions, cross-validation, area under the ROC curve, and different objectives used for model selection.

Algorithms All actual learning algorithms reside in this module. There are two main groups of learning algorithms, namely iterative optimizers and more specialized model trainers. General optimizers are organized into direct search and gradient-based optimization. Specialized algorithms for linear programming (a part of GLPK, the GNU linear programming kit) and quadratic programming for training of non-linear support vector machines are included. Shark also ships with algorithms for efficient nearest neighbor search.

Unsupervised This module contains the Shark implementation of restricted Bolzmann machines (RBMs), a recent experimental feature of Shark.

We use external libraries, images and code: