developmentseed / pearl-ml-pipeline


PEARL ML Training Pipeline

This repo contains scripts to manage training data, a workflow to create the Azure ML stack, and code to train new models that are compatible with the PEARL Platform. It is based on the work of Caleb Robinson of Microsoft.

Training

Evaluation

SEED Data

How/Why we create Seed Data

Training Dataset Creation

There are two options to create the training dataset.

Option 1. Feed LULC labels data in GeoTiff format.

naip-label-align.py and NAIPTileIndex.py provide functions to:

Notes:

These CSVs can be passed to AML to direct model training. Instructions are given in the following section.

python naip-label-align.py \
    --label_tif_path sample.tif \
    --out_dir <dir-name>/ \
    --threshold [0.0 to 1.0] \
    --aoi <aoi-name> \
    --group <group-name>

Option 2. LULC labels are available as GeoJSON (vector) files, and rasterization is required.
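The repo's rasterization step presumably uses GDAL or rasterio (e.g. rasterio.features.rasterize); the core idea can be sketched without those dependencies by testing each pixel center against the polygon. The function and example below are illustrative, not the repo's actual implementation:

```python
import numpy as np
from matplotlib.path import Path  # stand-in for rasterio.features.rasterize

def rasterize_polygon(poly_coords, height, width, bounds, burn_value=1):
    """Burn a polygon into a (height, width) label grid.

    poly_coords: list of (x, y) vertices; bounds: (minx, miny, maxx, maxy).
    A pixel gets burn_value when its center falls inside the polygon.
    """
    minx, miny, maxx, maxy = bounds
    px, py = (maxx - minx) / width, (maxy - miny) / height
    xs = minx + (np.arange(width) + 0.5) * px    # pixel-center x coordinates
    ys = miny + (np.arange(height) + 0.5) * py   # pixel-center y coordinates
    xx, yy = np.meshgrid(xs, ys)
    inside = Path(poly_coords).contains_points(
        np.column_stack([xx.ravel(), yy.ravel()])
    ).reshape(height, width)
    return np.where(inside, burn_value, 0).astype(np.uint8)

# Hypothetical example: a 1x1 square inside a 2x2 extent on a 4x4 grid
label = rasterize_polygon([(0, 0), (1, 0), (1, 1), (0, 1)], 4, 4, (0, 0, 2, 2))
```

In a real pipeline each GeoJSON feature's class attribute would supply burn_value, and the grid's bounds and resolution would match the corresponding NAIP tile.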

Model Training on Azure ML (AML)

If you are going to use AML to train LULC models for the first time, please go through these steps.


Configure environment

This code was tested using Python 3.6.5.

Create a conda environment from the .pytorch-env.yaml file and execute the scripts from within that environment.

You will need to set the following variables in your .env:


AZ_TENANT_ID=XXX #az account show --output table
AZ_SUB_ID=XXX #az account list --output table

AZ_WORKSPACE_NAME=XXX #User set
AZ_RESOURCE_GROUP=XXX #User set
AZ_REGION=XXX #User set

AZ_GPU_CLUSTER_NAME=XXX #User set
AZ_CPU_CLUSTER_NAME=XXX #User set

Then export all variables to your environment:

set -a; source .env; set +a

Create Your Workspace on AML

After exporting your Azure credentials, run train_azure/create_workspace.py; this script will create the AML workspace.
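Internally, a workspace-creation script like this presumably calls azureml-core's Workspace.create with the variables exported from .env. A minimal sketch under that assumption (the deferred import keeps the settings helper runnable without the SDK or live credentials):

```python
import os

def workspace_config():
    """Collect the AML workspace settings exported from .env."""
    return {
        "name": os.environ.get("AZ_WORKSPACE_NAME", "XXX"),
        "subscription_id": os.environ.get("AZ_SUB_ID", "XXX"),
        "resource_group": os.environ.get("AZ_RESOURCE_GROUP", "XXX"),
        "location": os.environ.get("AZ_REGION", "XXX"),
    }

def create_workspace():
    # Deferred import so the sketch loads without azureml-core installed.
    from azureml.core import Workspace
    cfg = workspace_config()
    return Workspace.create(
        name=cfg["name"],
        subscription_id=cfg["subscription_id"],
        resource_group=cfg["resource_group"],
        location=cfg["location"],
        create_resource_group=True,  # make the resource group if missing
        exist_ok=True,               # reuse the workspace on repeated runs
    )
```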

Create GPU Compute

This script will create GPU compute resources in your workspace on AML.
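Compute creation in azureml-core goes through AmlCompute.provisioning_configuration and ComputeTarget.create; a hedged sketch of what such a script presumably does (the Standard_NC6 SKU and node counts are illustrative assumptions, not necessarily what the repo uses):

```python
import os

def gpu_cluster_settings(vm_size="Standard_NC6", max_nodes=4):
    """Settings for the GPU cluster named in .env; SKU and node count are assumptions."""
    return {
        "name": os.environ.get("AZ_GPU_CLUSTER_NAME", "XXX"),
        "vm_size": vm_size,
        "min_nodes": 0,   # scale to zero when idle to avoid charges
        "max_nodes": max_nodes,
    }

def create_gpu_compute(workspace):
    """Provision the cluster in an azureml Workspace object."""
    from azureml.core.compute import AmlCompute, ComputeTarget
    s = gpu_cluster_settings()
    config = AmlCompute.provisioning_configuration(
        vm_size=s["vm_size"], min_nodes=s["min_nodes"], max_nodes=s["max_nodes"]
    )
    target = ComputeTarget.create(workspace, s["name"], config)
    target.wait_for_completion(show_output=True)
    return target
```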

(Optional) Create CPU Compute

This script will create CPU compute resources in your workspace on AML.

Train LULC Model on AML

We have three PyTorch-based semantic segmentation models ready for LULC model training: FCN, UNet, and DeepLabV3+.

To train a model on AML, you will need to define or pass a few crucial parameters to the script, for instance:

TODO: Will we be providing a sample CSV?

from azureml.core import ScriptRunConfig

config = ScriptRunConfig(
    source_directory="./src",
    script="train.py",
    compute_target=AZ_GPU_CLUSTER_NAME,
    arguments=[
        "--input_fn",
        "sample_data/indianapolis_train.csv",
        "--input_fn_val",
        "sample_data/indianapolis_val.csv",
        "--output_dir",
        "./outputs",
        "--save_most_recent",
        "--num_epochs",
        20,
        "--num_chips",
        200,
        "--num_classes",
        7,
        "--label_transform",
        "uvm",
        "--model",
        "deeplabv3plus",
    ],
)

These parameters are to be configured by the user. The input_fn_X paths should be provided by the user and are the outputs of the data generation step (NAIP Label Align) described above.

python train_azure/run_model.py
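run_model.py presumably wraps the ScriptRunConfig above in an azureml Experiment and submits it; a sketch under that assumption (the experiment name and the argument-flattening helper are illustrative, not the repo's code):

```python
def training_arguments(params):
    """Flatten a dict of CLI parameters into the arguments list style shown above.

    A value of None marks a bare flag such as --save_most_recent.
    """
    args = []
    for flag, value in params.items():
        args.append(f"--{flag}")
        if value is not None:
            args.append(value)
    return args

def submit_training(workspace, config, experiment_name="pearl-lulc-training"):
    """Submit the ScriptRunConfig to AML and block until the run finishes."""
    from azureml.core import Experiment
    run = Experiment(workspace=workspace, name=experiment_name).submit(config)
    run.wait_for_completion(show_output=True)  # stream logs to the console
    return run
```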

Evaluate the Trained Model

To compute global F1 and per-class F1 scores (written to CSV) from a trained model over the latest dataset, you can use this eval script as an example.

python train_azure/run_eval.py
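The underlying metric computation can be sketched with scikit-learn: per-class F1 comes from average=None and the global score from a macro average. The CSV layout and class names here are assumptions, not the eval script's actual output format:

```python
import csv
import numpy as np
from sklearn.metrics import f1_score

def evaluate_predictions(y_true, y_pred, class_names, out_csv="f1_scores.csv"):
    """Write global (macro) and per-class F1 scores to a CSV."""
    labels = list(range(len(class_names)))
    per_class = f1_score(y_true, y_pred, average=None, labels=labels)
    global_f1 = f1_score(y_true, y_pred, average="macro", labels=labels)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["class", "f1"])  # assumed header
        for name, score in zip(class_names, per_class):
            writer.writerow([name, f"{score:.4f}"])
        writer.writerow(["global_macro", f"{global_f1:.4f}"])
    return global_f1, per_class

# Toy example with three hypothetical LULC classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
g, per = evaluate_predictions(y_true, y_pred, ["water", "tree", "grass"])
```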

Seed Data Creation for PEARL

After the best performing model is selected, seed data needs to be created to serve PEARL. Seed data consists of the model embeddings from the trained model, used together with the training data users provide during a PEARL retraining session.

run_seeddata_creation.py will configure AML and use the main seed data creation script to create seed data for the best performing trained model.

python train_azure/run_seeddata_creation.py
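Extracting embeddings from a trained model is typically done with a forward hook on the layer just before the classification head; a minimal PyTorch sketch of that technique (the tiny model, the hooked layer, and the chip size are illustrative, not the repo's architecture or seed data format):

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a trained segmentation backbone
model = nn.Sequential(
    nn.Conv2d(4, 8, kernel_size=3, padding=1),   # NAIP imagery has 4 bands (RGB + NIR)
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # embedding layer we want to capture
    nn.ReLU(),
    nn.Conv2d(16, 7, kernel_size=1),             # 7-class prediction head
)

captured = {}

def save_embedding(module, inputs, output):
    # Per-pixel feature vectors; conceptually, these become the seed data
    captured["embedding"] = output.detach()

# Hook the layer before the head (index 3 = ReLU after the 16-channel conv)
model[3].register_forward_hook(save_embedding)

with torch.no_grad():
    logits = model(torch.zeros(1, 4, 32, 32))  # one 32x32 chip
```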

(Optional) Classes Distribution

The LULC class distribution is a graph showing the proportion of pixels in each LULC class for a trained model on PEARL. See the bar chart below.

train_azure/run_cls_distrib.py shows how to compute the class distribution from the model's training dataset.

python train_azure/run_cls_distrib.py
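Computing the distribution reduces to counting label pixels per class; a minimal NumPy sketch (the toy labels and three-class setup are illustrative, and a chart like the one shown can then be drawn with matplotlib's plt.bar):

```python
import numpy as np

def class_distribution(label_array, num_classes=7):
    """Return the proportion of pixels in each LULC class."""
    counts = np.bincount(label_array.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Toy 4x4 label chip with classes 0..2
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 2],
                   [0, 0, 2, 2],
                   [0, 0, 2, 2]])
proportions = class_distribution(labels, num_classes=3)
```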

(Screenshot: example LULC class distribution bar chart.)