Prerequisites | Quick Start | Service Endpoints | Architecture | Project Organization | UI Showcase.
## Prerequisites
- Python
- Conda or Venv
- Docker
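No specific versions are pinned above; a quick way to confirm the tools are available on your PATH before starting (the Compose check assumes Docker Compose v2):
```sh
# Confirm the prerequisites are installed
python --version
conda --version      # or check venv support with: python -m venv --help
docker --version
docker compose version
```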
## Installation and Quick Start
1. Clone the repo
```sh
git clone https://github.com/brnaguiar/mlops-next-watch.git
```
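The remaining `make` targets are run from the repository root, so change into the cloned directory first (the directory name assumes the default clone target):
```sh
cd mlops-next-watch
```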
2. Create environment
```sh
make env
```
3. Activate the Conda environment
```sh
source activate nwenv
```
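The steps above assume Conda (the `nwenv` name comes from step 3). If you prefer a plain virtual environment, as the Prerequisites allow, a rough sketch is below; note that the project's Makefile targets may still assume Conda:
```sh
# Hypothetical venv-based alternative to steps 2-3
python -m venv .venv
source .venv/bin/activate
```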
4. Install the requirements/dependencies and assets
```sh
make dependencies
```
5. Pull the datasets
```sh
make datasets
```
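According to the Project Organization section below, the pulled data should land under `data/`; a quick sanity check (the exact subfolder the target writes to is not documented here):
```sh
ls data/
```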
6. Configure containers and secrets
```sh
make init
```
7. Run Docker Compose
```sh
make run
```
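Once Compose is up, it can help to confirm the containers are healthy before continuing (assumes Docker Compose v2 and a compose file discoverable from the repository root; otherwise point at it with `-f`, e.g. inside the `docker/` directory listed in the project tree):
```sh
docker compose ps
```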
8. Populate the production database with users
```sh
make users
```
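To confirm the stack is serving requests, you can hit the FastAPI backend listed under Service Endpoints below; the `/docs` path is FastAPI's default interactive documentation page and is an assumption here:
```sh
curl -I http://localhost:8000/docs
```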
## Useful Service Endpoints

- Jupyter `http://localhost:8888`
- MLflow `http://localhost:5000`
- MinIO Console `http://localhost:9001`
- Airflow `http://localhost:8080`
- Streamlit Frontend `http://localhost:8501`
- FastAPI Backend `http://localhost:8000`
- Grafana Dashboard `http://localhost:3000`
- Prometheus `http://localhost:9090`
- Pushgateway `http://localhost:9091`
- Spark UI `http://localhost:8081`
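If one of these pages does not load, a small convenience loop (not part of the project) prints the HTTP status code for each port listed above:
```sh
# Print the HTTP status of each local service; ports copied from the list above
for port in 8888 5000 9001 8080 8501 8000 3000 9090 9091 8081; do
  printf '%-5s ' "$port"
  curl -s -o /dev/null -w '%{http_code}\n' "http://localhost:$port"
done
```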
## Architecture
Erratum: In "Monitoring and Analytics", it should be Grafana instead of Streamlit.
## Project Organization

    ├── LICENSE
    │
    ├── Makefile             <- Makefile with rules to automate build steps, e.g. `make env` or `make run`
    │
    ├── README.md            <- The top-level README for developers using this project
    │
    ├── data
    │   ├── 01-external      <- Data from third-party sources
    │   ├── 01-raw           <- Data in a raw format
    │   ├── 02-processed     <- The pre-processed data for modeling
    │   └── 03-train         <- Split pre-processed data for model training
    │
    ├── airflow
    │   ├── dags             <- Airflow DAGs
    │   ├── logs             <- Airflow logging
    │   ├── plugins          <- Airflow's default directory for plugins such as custom operators and
    │   │                       sensors (however, we use the `include` dir inside `dags` for this purpose)
    │   └── config           <- Airflow configurations and settings
    │
    ├── assets               <- Project assets like jar files used in Spark sessions
    │
    ├── models               <- Trained and serialized models, model predictions, or model summaries
    │
    ├── notebooks            <- Jupyter notebooks used in experimentation
    │
    ├── docker               <- Docker data and configurations
    │
    ├── images               <- Project images
    │
    ├── requirements.local   <- Required site-packages
    │
    ├── requirements.minimal <- Required dist-packages
    │
    ├── setup.py             <- Makes the project pip-installable (`pip install -e .`) so `src` can be imported
    │
    ├── src                  <- Source code for use in this project
    │   ├── collaborative    <- Source code for the collaborative recommendation strategy
    │   │   ├── models       <- Collaborative models
    │   │   ├── nodes        <- Data processing, validation, training, etc. functions (nodes) that
    │   │   │                   represent units of work
    │   │   └── pipelines    <- Collections of orchestrated nodes, arranged in a sequence or a
    │   │                       directed acyclic graph (DAG)
    │   ├── conf             <- Configuration files and parameters for the project
    │   ├── main.py          <- Main script, mostly to run pipelines
    │   ├── scripts          <- Scripts, for instance, to create credentials files and populate databases
    │   ├── frontend         <- Source code for the application interface
    │   └── utils            <- Project utils like handlers and controllers
    │
    ├── tox.ini              <- Settings for flake8
    │
    └── pyproject.toml       <- Settings for the project and tools like isort, black, pytest, etc.
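The `setup.py` entry above notes that the project is pip-installable so that `src` can be imported; a minimal sketch of working with the code directly (any arguments `main.py` accepts are not documented here):
```sh
# Editable install so the code under src/ is importable in notebooks and scripts
pip install -e .
# Run the main script, which mostly runs the pipelines (per the tree above)
python src/main.py
```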
## UI Showcase
### Streamlit Frontend App
### MLflow UI
### Minio UI
### Airflow UI
### Grafana UI
### Prometheus UI
### Prometheus Drift Detection Example
--------
Project based on the cookiecutter data science project template.