# Next Watch: E2E MLOps Pipelines with Spark!

An MLOps project that recommends movies to watch, implementing data engineering and MLOps best practices.

Prerequisites | Quick Start | Service Endpoints | Architecture | Project Organization | UI Showcase

## Prerequisites

- Python
- Conda or Venv
- Docker
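
A quick way to confirm these are available before starting (a sanity-check sketch; the project does not pin exact versions here):

```sh
# Check that the prerequisites are installed and on the PATH
python --version          # Python 3.x
conda --version           # or check your venv tooling: python -m venv --help
docker --version
docker compose version    # Compose v2 plugin
```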

## Installation and Quick Start

1. Clone the repo
   ```sh
   git clone https://github.com/brnaguiar/mlops-next-watch.git
   ```
2. Create the environment
   ```sh
   make env
   ```
3. Activate the conda environment
   ```sh
   source activate nwenv
   ```
4. Install requirements/dependencies and assets
   ```sh
   make dependencies
   ```
5. Pull the datasets
   ```sh
   make datasets
   ```
6. Configure containers and secrets
   ```sh
   make init
   ```
7. Run Docker Compose
   ```sh
   make run
   ```
8. Populate the production database with users
   ```sh
   make users
   ```
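
Taken together, and assuming the Make targets behave as described above, a fresh setup is one short shell session:

```sh
git clone https://github.com/brnaguiar/mlops-next-watch.git
cd mlops-next-watch
make env && source activate nwenv    # create and activate the conda env
make dependencies && make datasets   # install dependencies, pull the datasets
make init && make run                # configure containers/secrets, start the stack
make users                           # populate the production DB once services are up
```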

## Useful Service Endpoints

- Jupyter: `http://localhost:8888`
- MLflow: `http://localhost:5000`
- Minio Console: `http://localhost:9001`
- Airflow: `http://localhost:8080`
- Streamlit Frontend: `http://localhost:8501`
- FastAPI Backend: `http://localhost:8000`
- Grafana Dashboard: `http://localhost:3000`
- Prometheus: `http://localhost:9090`
- Pushgateway: `http://localhost:9091`
- Spark UI: `http://localhost:8081`
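
Once the stack is running, a rough liveness check is to probe each port (a sketch; some services answer with redirects or auth pages, so any HTTP status code counts as "up"):

```sh
# Probe each endpoint listed above; any HTTP status (even 302/401) means it is responding
for port in 8888 5000 9001 8080 8501 8000 3000 9090 9091 8081; do
  printf 'localhost:%s -> ' "$port"
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 3 "http://localhost:$port" || echo 'no response'
done
```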

## Architecture

Erratum: in the "Monitoring and Analytics" block of the architecture diagram, it should read Grafana instead of Streamlit.

## Project Organization

```
├── LICENSE
│
├── Makefile              <- Makefile with commands like `make env` or `make run`
│
├── README.md             <- The top-level README for developers using this project
│
├── data
│   ├── 01-external       <- Data from third-party sources
│   ├── 01-raw            <- Data in a raw format
│   ├── 02-processed      <- The pre-processed data for modeling
│   └── 03-train          <- Split pre-processed data for model training
│
├── airflow
│   ├── dags              <- Airflow DAGs
│   ├── logs              <- Airflow logging
│   ├── plugins           <- Airflow default directory for plugins like custom operators,
│   │                        sensors, etc. (however, we use the dir `include` in dags for
│   │                        this purpose)
│   └── config            <- Airflow configurations and settings
│
├── assets                <- Project assets like jar files used in Spark sessions
│
├── models                <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks             <- Jupyter notebooks used in experimentation
│
├── docker                <- Docker data and configurations
│
├── images                <- Project images
│
├── requirements.local    <- Required site-packages
│
├── requirements.minimal  <- Required dist-packages
│
├── setup.py              <- Makes the project pip-installable (`pip install -e .`) so src can be imported
│
├── src                   <- Source code for use in this project
│   ├── collaborative     <- Source code for the collaborative recommendation strategy
│   │   ├── models        <- Collaborative models
│   │   ├── nodes         <- Data processing, validation, training, etc. functions (or nodes)
│   │   │                    that represent units of work
│   │   └── pipelines     <- Collections of orchestrated nodes (data processing, validation,
│   │                        training, etc.) arranged in a sequence or a directed acyclic graph (DAG)
│   ├── conf              <- Configuration files and parameters for the project
│   ├── main.py           <- Main script, mostly to run pipelines
│   ├── scripts           <- Scripts, for instance, to create credentials files and populate databases
│   ├── frontend          <- Source code for the application interface
│   └── utils             <- Project utils like handlers and controllers
│
├── tox.ini               <- Settings for flake8
│
└── pyproject.toml        <- Settings for the project and for tools like isort, black, pytest, etc.
```
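
Since `setup.py` makes the project pip-installable, the source tree can also be used directly outside Docker (a minimal sketch; any CLI flags `main.py` accepts are not documented here and are an assumption based on the tree above):

```sh
pip install -e .     # editable install, per setup.py above
python src/main.py   # run the pipelines via the main script
```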

## UI Showcase

Screenshots of the Streamlit frontend app, MLflow UI, Minio UI, Airflow UI, Grafana UI, Prometheus UI, and a Prometheus drift-detection example.

---

Project based on the cookiecutter data science project template.