Atlas is a flexible Machine Learning platform that consists of a Python SDK, CLI, GUI & Scheduler. It helps Machine Learning Engineering teams dramatically reduce model development time and the effort spent managing infrastructure.
Atlas has evolved very rapidly and has gone through many iterations in Dessa's history.
The latest version is in BETA.
Here are a few of the high-level features:
Official documentation for Atlas can be found at https://www.docs.atlas.dessa.com/
All docs are hosted on Read the Docs, which tracks the docs folder. Please open a pull request here to make changes.
If you have questions that are not addressed in the documentation, there are several ways to ask:
foundations-atlas
tag. We will do our best to help!
We ❤️ contributors and would love to work with you.
Atlas is currently open to external contributors.
Follow this guide:
bug
feature-request
good first issue
label and get help from the community Slack if you need it. When you are ready, just follow the steps below in order to set up a development environment.
You will need to have docker, yarn, and the envsubst command line tool on your machine in order to spin up a local development environment.
OSX:
brew install docker
brew install yarn
brew install gettext
Ubuntu:
apt install docker
apt install docker-compose
apt install yarn
apt install gettext
For other Linux machines, replace apt install with the equivalent command for your distribution's package manager.
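Before moving on, it can help to confirm that the prerequisites are actually on your PATH. The following is a minimal sketch, not part of Atlas: the helper name `find_missing_tools` is made up for illustration, and the tool list is taken from the requirements above.

```python
import shutil

# Tools named in the prerequisites above.
REQUIRED_TOOLS = ["docker", "yarn", "envsubst"]


def find_missing_tools(tools, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; by default it uses `shutil.which` from the standard library.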
Clone this repository and enter the new directory.
git clone git@github.com:DeepLearnI/atlas.git && cd atlas
Create and activate a brand new virtual environment with Python 3.7, then install the requirements. Some examples are below.
conda:
conda create --name foundations python=3.7 && conda activate foundations
pip install -r requirements_dev.txt
pipenv:
pipenv --python 3.7 && pipenv shell
pipenv install
pipenv install -r requirements_dev.txt --dev --pre --skip-lock
venv:
python3 -m venv . && source bin/activate
pip install -r requirements_dev.txt
Add the packages that make up Atlas to your Python path and set some environment variables by sourcing the activate_dev_env.sh file.
. ./activate_dev_env.sh
Launch Atlas in development mode. This may take a while to pull some required docker images.
make devenv-start
You can now create a sample project by running the following command.
python -m foundations init my-project
Change into the newly created project directory and execute the following command to submit your first job. This can take a while the first time as one more image may need to be pulled.
python -m foundations submit scheduler . main.py
Navigate to localhost:3000 and verify that your newly created project exists on the frontend. Click on the project and verify that your job executed successfully.
Congrats! You are ready to go.
In order to run tests, simply run:
make unit-tests
make integration-tests
Last updated: March 2020
The following diagram shows a high-level overview of how the Atlas system works. Note that the Atlas codebase evolves faster than this diagram, so the diagram may not always be up to date. It is still a good source for a general understanding of the system.
“Atlas Server” is the term that we use to describe all of the services that allow Atlas to do its magic.
These services are as follows. Note that some Atlas services live in other repos:
Let’s dive into each service with an explanation of its role within Atlas.
This is a custom-built Python scheduler that launches Docker-based workers. It uses APScheduler to keep track of and run jobs that are queued in the system. Users can interact with the scheduler through a Flask-based RESTful API to submit and manage jobs. Jobs are submitted in the form of a “job spec”, which is simply a Python dictionary that describes the makeup of the job that will run.
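To make the idea of a job spec concrete, here is a minimal sketch. The field names (`image`, `command`, `num_gpus`) and the `validate_job_spec` helper are hypothetical and chosen only for illustration; the real schema is defined in the Scheduler repository and may look quite different.

```python
# Hypothetical job spec: these keys are illustrative assumptions,
# not the scheduler's actual schema.
example_job_spec = {
    "image": "python:3.7",             # Docker image the worker would run
    "command": ["python", "main.py"],  # what to execute inside the container
    "num_gpus": 0,                     # 0 means a non-GPU job
}

REQUIRED_KEYS = {"image", "command"}


def validate_job_spec(spec):
    """Return a list of problems found in a job spec (empty if none)."""
    if not isinstance(spec, dict):
        return ["job spec must be a dictionary"]
    problems = []
    for key in sorted(REQUIRED_KEYS - set(spec)):
        problems.append("missing required key: " + key)
    num_gpus = spec.get("num_gpus", 0)
    if not isinstance(num_gpus, int) or num_gpus < 0:
        problems.append("num_gpus must be a non-negative integer")
    return problems
```

Checking a spec before it is queued is one plausible design choice, since it surfaces mistakes at submission time rather than when a worker starts.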
The scheduler can run in both GPU and non-GPU mode. GPU mode will keep track of available devices and, given a job with a provided number of GPUs, will allocate jobs according to available resources.
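The GPU bookkeeping described above can be sketched roughly as follows. This is an illustration under assumptions, not the scheduler's actual code: the `GpuPool` class and its method names are hypothetical.

```python
class GpuPool:
    """Track which GPU device ids are free (hypothetical helper)."""

    def __init__(self, device_ids):
        self._free = list(device_ids)
        self._in_use = {}  # job_id -> list of device ids held by that job

    def try_allocate(self, job_id, num_gpus):
        """Reserve `num_gpus` devices for a job, or return None if too few are free."""
        if num_gpus > len(self._free):
            return None  # the job would have to stay queued
        devices = self._free[:num_gpus]
        self._free = self._free[num_gpus:]
        self._in_use[job_id] = devices
        return devices

    def release(self, job_id):
        """Return a finished job's devices to the free pool."""
        self._free.extend(self._in_use.pop(job_id, []))
```

For example, with three devices, a job asking for two GPUs is allocated immediately, while a second job asking for two has to wait until the first one releases its devices.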
Contribute to the Scheduler repository here.
Workers are what the scheduler uses to run any submitted jobs. The default Docker image has a few libraries that are common in the machine learning/deep learning toolkit. However, users can specify a custom Docker image to use.
The web application for Atlas is a React-based service that displays and interacts with information provided by the REST API.
This is a Flask-based RESTful API that allows for interaction with Atlas-specific information about jobs and projects. This includes information logged during the running of an Atlas job, notes on a project, and the markdown description of a project.
We use a basic HTTP server to host the files and directories that are archived during an Atlas job. The metadata and path of each file are stored in the Tracker so that the file can be served when needed.
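As a rough illustration of what "a basic HTTP server" over an archive directory can look like, here is a sketch built on Python's standard library. It is not the Atlas archive server itself; the function name `start_archive_server` is hypothetical.

```python
import functools
import http.server
import threading


def start_archive_server(directory, port=0):
    """Serve `directory` over HTTP in a background thread and return the server.

    Passing port=0 lets the operating system pick a free port, which can be
    read back from `server.server_address`.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A caller would stop the server with `server.shutdown()` followed by `server.server_close()`.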
The tracker is the database where any saved information is stored. This includes information logged during the running of an Atlas job, notes on a project, and the markdown description of a project.
This is a very simple service that we route all API calls through. If the proxy is set to “null” (via the “-n” flag), all calls pass through without any verification. Otherwise, the proxy checks the validity of the supplied token before rerouting the call. The token is generated by and checked against the Authentication Server.
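The decision the proxy makes for each call can be sketched as a small function. This is a simplified illustration, not the proxy's real code: the `route_call` name, the use of an `Authorization: Bearer` header, and the `is_token_valid` callback (standing in for the check against the Authentication Server) are all assumptions.

```python
def route_call(headers, is_token_valid, null_mode=False):
    """Decide whether a call should be forwarded (hypothetical sketch).

    Returns an HTTP-style status code: 200 to forward, 401 to reject.
    """
    if null_mode:
        return 200  # null proxy: forward without any verification
    auth = headers.get("Authorization", "")
    prefix = "Bearer "
    if not auth.startswith(prefix):
        return 401  # no token supplied in the assumed header format
    token = auth[len(prefix):]
    return 200 if is_token_valid(token) else 401
```

Keeping token validation behind a callback mirrors the split described above, where the proxy only routes and the Authentication Server owns the tokens.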
Contribute to the authentication proxy repository here.
For our authentication system, we use the open source tool Keycloak. This gives us an off-the-shelf setup for managing and validating accounts.
This is the default TensorFlow image that provides the TensorBoard application. Most of the magic then happens in the TensorBoard REST API.
This service links files saved in the archive server that can be presented within TensorBoard into the directory that TensorBoard uses as its log directory. It creates these symlinks on request through a Flask-based RESTful API.
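The core of that linking step can be sketched with the standard library. This is only an illustration of the symlink technique, not the service's actual code: the function name `link_into_logdir` and its parameters are hypothetical, and the Flask layer is left out.

```python
import os


def link_into_logdir(archived_path, logdir, link_name):
    """Symlink an archived file or directory into TensorBoard's log directory."""
    os.makedirs(logdir, exist_ok=True)
    link_path = os.path.join(logdir, link_name)
    if os.path.islink(link_path):
        os.remove(link_path)  # replace a stale link from an earlier request
    os.symlink(os.path.abspath(archived_path), link_path)
    return link_path
```

Linking rather than copying means large archived event files do not have to be duplicated for TensorBoard to see them.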
Copyright 2015-2020 Square, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
© 2020 Square, Inc. ATLAS, DESSA, the Dessa Logo, and others are trademarks of Square, Inc. All third party names and trademarks are properties of their respective owners and are used for identification purposes only.