Computer-Vision-Team-Amsterdam / Blurring-as-a-Service

Removing personal data from Imagery

Blurring-as-a-Service

This project removes personal data, i.e. persons and license plates, from raw panorama images. We aim to do this in an inclusive manner: we maintain a document listing the potential biases we want to minimize, and we select the panorama images used for training the model based on this document.

Useful links

  1. Panorama API
  2. Panorama viewer
  3. Excel sheet with inclusivity biases (risico's en maatregelen.xlsx)
  4. Annotation project in Azure ML
  5. Miro board with pipelines architecture

Installation

1. Clone the code

git clone git@github.com:Computer-Vision-Team-Amsterdam/Blurring-as-a-Service.git

2. Install Poetry

If you don't have it yet, follow the official Poetry installation instructions to install the package manager.

3. Install libpq-dev

psycopg2, which is used to interact with the database, needs libpq-dev in order to install:

sudo apt-get install libpq-dev

4. Init submodules

You need to initialize the content of the submodules so git checks out the version of each submodule that is recorded in this repository.

git submodule update --init --recursive

5. Set up the AzureML connection

To allow your code to connect to Azure ML and train the model, you need to retrieve a connection config. On the AzureML website, click the change workspace button in the top right and then "Download config file". Place the downloaded "config.json" file in the top folder of the project.
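
Before starting a run, it can help to confirm that the downloaded file is in place. The following is a minimal sketch, not part of this repository: the helper name check_workspace_config is hypothetical, and it assumes config.json carries the three keys that a workspace config file downloaded from Azure ML normally contains (subscription_id, resource_group, workspace_name).

```python
import json
from pathlib import Path

# Keys normally present in a workspace config.json downloaded from Azure ML.
# Assumption: this project's file follows that standard layout.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(project_root="."):
    """Return the parsed config.json, raising if it is missing or incomplete.

    Hypothetical helper for illustration only.
    """
    path = Path(project_root) / "config.json"
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it from Azure ML first")
    config = json.loads(path.read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"config.json is missing keys: {', '.join(missing)}")
    return config
```

Calling check_workspace_config() from the project root returns the parsed settings, or raises an error that names what is missing.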

6. Install dependencies

In the terminal, navigate to the project root (the folder containing pyproject.toml), then use Poetry to create a new virtual environment and install the dependencies.

poetry install

7. Install pre-commit hooks

The pre-commit hooks help to ensure that all committed code is valid and consistently formatted.

poetry run pre-commit install

8. Set up the running configuration

Copy the config.example.yml file and rename the copy to config.yml. Adapt config.yml to your execution configuration: set the Azure paths where the data is located and the flags that enable or disable pipeline steps.
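
The copy step can also be scripted so that an already adapted config.yml is never overwritten. This is a minimal sketch under the assumption that both files live in the project root; the function name create_local_config is hypothetical and does not exist in the repository.

```python
import shutil
from pathlib import Path


def create_local_config(project_root="."):
    """Copy config.example.yml to config.yml unless config.yml already exists.

    Hypothetical helper for illustration only.
    """
    root = Path(project_root)
    example = root / "config.example.yml"
    target = root / "config.yml"
    if target.exists():
        # Keep an existing config untouched so local settings are not lost.
        return target
    shutil.copyfile(example, target)
    return target
```

After the copy, config.yml still has to be edited by hand to set the paths and flags described above.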


Pipelines

More information about the pipelines can be found on our Azure DevOps Wiki.

Database

To access a database in Azure Machine Learning, create a database.json file inside the database folder. The same folder contains an example of the expected structure, named database.example.json.

This database.json file should include the following information:

    client_id:      client id of the managed identity in Azure
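
As a quick check that the file is usable, the sketch below loads database.json and verifies that client_id is set. It assumes the file is a flat JSON object with client_id at the top level; the authoritative structure is the one in database.example.json, and the function name load_database_settings is hypothetical.

```python
import json
from pathlib import Path


def load_database_settings(database_dir="database"):
    """Load database.json and make sure a client_id is present.

    Hypothetical helper; assumes client_id is a top-level key.
    """
    path = Path(database_dir) / "database.json"
    settings = json.loads(path.read_text(encoding="utf-8"))
    if not settings.get("client_id"):
        raise ValueError(f"{path} must define a non-empty client_id")
    return settings
```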

Monitoring

We monitor the health of the pipelines in the BaaS workbook, which can be found in the Azure portal. The dashboard folder contains the workbook both as a gallery template (.workbook) and as an ARM template (.json).