NASA-IMPACT / pixel-detector

pixel detector using shapefiles for generating truth set.

Analysis report on False Positives arising while inferring on old vs. new smoke model #21

Closed by ghltshubh 2 years ago

ghltshubh commented 2 years ago

Motivation

We have been using a smoke detection model that was trained on rasterized values obtained from raw NetCDF files. This caused problems in production, where the model produced many false positives. To address this, we need to retrain the model on the same data source that it sees in production.

Background

The current smoke detection uses a neural network (NN) segmentation technique to detect smoke in rasterized images. To train a model, we provide images along with masks (labels) that demarcate the smoke regions in those images. Agreement between the predicted and labeled smoke regions is measured with the Intersection over Union (IoU) metric:

IoU = TP/(TP+FN+FP)
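
As a minimal sketch of the formula above (not the project's actual evaluation code), IoU can be computed by counting TP, FP, and FN over a predicted mask and a label mask. The function name `iou`, the nested-list mask representation, and the "two empty masks score 1.0" convention are assumptions made for illustration; real masks would typically be arrays.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union for two binary masks of the same shape.

    Masks are nested lists (rows of 0/1 values). This representation is
    an assumption for illustration only.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # predicted smoke, labeled smoke
            elif p and not t:
                fp += 1  # predicted smoke, labeled no-smoke
            elif t and not p:
                fn += 1  # labeled smoke, but missed by the prediction
    denom = tp + fp + fn
    # Convention chosen here (an assumption): two empty masks agree perfectly.
    return tp / denom if denom else 1.0


# Hypothetical 2x3 masks: TP=2, FP=1, FN=1
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(iou(pred, truth))  # 2 / (2 + 1 + 1) = 0.5
```

Note that true negatives do not appear in the formula, so large smoke-free areas do not inflate the score.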

For each pixel in an input image, the model outputs the probability that the pixel is smoke. We choose a threshold value, say 0.5, as a cutoff and label every pixel whose probability is at or above this cutoff as smoke.

Training data source

False positive comparison technique

Currently, we threshold the model outputs at a probability of 0.5: values of 0.5 and above are marked as smoke, and values below 0.5 are marked as no-smoke. This threshold is an important parameter that can be tuned anywhere between 0 and 1.
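
The thresholding rule described here can be sketched in a few lines. This is only an illustration: the function name `threshold_mask`, the nested-list probability map, and the sample values are hypothetical and not taken from the repository.

```python
def threshold_mask(prob_map, threshold=0.5):
    """Label a pixel as smoke (1) when its probability is >= threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


# Hypothetical per-pixel smoke probabilities for a 2x3 tile
probs = [[0.05, 0.50, 0.92],
         [0.49, 0.10, 0.70]]

print(threshold_mask(probs))       # [[0, 1, 1], [0, 0, 1]]
print(threshold_mask(probs, 0.1))  # [[0, 1, 1], [1, 1, 1]]
```

Lowering the threshold can only add smoke pixels, never remove them, which is why thin plumes and false positives both tend to increase as the threshold drops.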

Therefore, we compare the 3-band outputs of the old and new models at threshold values from 0 to 1 in increments of 0.1, for both the WMTS and non-WMTS endpoint datasets.
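
A rough sketch of such a threshold sweep, counting false positives per model at each threshold, might look like the following. The function names, the `"old"`/`"new"` keys, and all probability values are made up for illustration; the actual comparison in this issue was done on the bandwise image outputs.

```python
def false_positives(prob_map, true_mask, threshold):
    """Count pixels predicted as smoke (prob >= threshold) but labeled no-smoke."""
    return sum(
        1
        for prob_row, true_row in zip(prob_map, true_mask)
        for p, t in zip(prob_row, true_row)
        if p >= threshold and not t
    )


def sweep(prob_maps, true_mask):
    """Return {threshold: {model_name: fp_count}} for thresholds 0.0, 0.1, ..., 1.0."""
    # i / 10 avoids the float drift that repeated "+= 0.1" would accumulate.
    thresholds = [i / 10 for i in range(11)]
    return {
        t: {name: false_positives(pm, true_mask, t) for name, pm in prob_maps.items()}
        for t in thresholds
    }


# Hypothetical label mask and model outputs for a 2x3 tile
truth = [[1, 0, 0],
         [0, 0, 1]]
prob_maps = {
    "old": [[0.9, 0.6, 0.3], [0.2, 0.7, 0.4]],
    "new": [[0.8, 0.2, 0.1], [0.0, 0.3, 0.9]],
}

for t, counts in sweep(prob_maps, truth).items():
    print(f"t={t:.1f}  old FP={counts['old']}  new FP={counts['new']}")
```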

The bandwise outputs are available at: https://drive.google.com/drive/u/2/folders/1CGAth8SlykuymBb6cqkDhMC5IXPRqLMw

Analysis

muthukumaranR commented 2 years ago

Can we add the actual shapefile associated with this event to the visualizations?

ghltshubh commented 2 years ago

GitHub doesn't support .tif files, so I changed them to .png.

  1. smoke_wmts_ref_label

  2. smokev4_6b_ref_extended_label

ghltshubh commented 2 years ago

Smokev4_6b_ref_extended model

The current threshold is set at 0.5, so any pixel with a probability of 0.5 or higher is labeled as smoke, and any pixel below it as no-smoke.

We compare the outputs of the old model in production and the new model for probability thresholds ranging from 0.1 to 0.9 in increments of 0.1. Even at the low probability threshold of 0.1, as seen in the animation below, the old model tends to produce more false positives than the new model. It also has lower sensitivity and is therefore unable to detect thinner plumes.

https://user-images.githubusercontent.com/16928813/152573376-6774508f-d3e3-435e-bf7e-dc2d59783e68.mov

https://user-images.githubusercontent.com/16928813/152876939-45dc040d-7b1d-4edd-928c-efa4623ed34e.mov

ghltshubh commented 2 years ago

Smoke_wmts_ref model

The new model was trained on WMTS endpoint data and can detect thinner smoke plumes at lower threshold values, as seen in the animation below. It also shows a lower false positive (FP) rate than the old model across all threshold values.

https://user-images.githubusercontent.com/16928813/152572622-91785aa5-05df-4157-b3bc-d12d59a35db5.mov

https://user-images.githubusercontent.com/16928813/152876907-f07a82f5-8050-458f-9283-8b74ada11563.mov
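
For reference, the two quantities discussed in these comparisons can be written out using their standard definitions: sensitivity = TP/(TP+FN) and FP rate = FP/(FP+TN). The sketch below assumes binary masks as nested lists; the function name `sensitivity_and_fp_rate` and the sample masks are hypothetical, and this is not the code used to produce the animations.

```python
def sensitivity_and_fp_rate(pred_mask, true_mask):
    """Return (sensitivity, fp_rate) for two binary masks of the same shape."""
    tp = fp = fn = tn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # smoke correctly detected
            elif p and not t:
                fp += 1  # no-smoke pixel flagged as smoke
            elif t and not p:
                fn += 1  # smoke pixel missed
            else:
                tn += 1  # no-smoke pixel correctly left unflagged
    # NaN marks an undefined ratio (no smoke pixels, or no no-smoke pixels).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fp_rate = fp / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, fp_rate


# Hypothetical 2x3 masks: TP=2, FP=1, FN=1, TN=2
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
sens, fpr = sensitivity_and_fp_rate(pred, truth)
print(f"sensitivity={sens:.3f}  FP rate={fpr:.3f}")
```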

ghltshubh commented 2 years ago

Thinner plumes are captured at the lower threshold (t = 0.1).

ghltshubh commented 2 years ago

Predicting on smoke event: 06/29/2021

New model

https://user-images.githubusercontent.com/16928813/154359484-c91ad79b-234b-4a3d-927e-18ddd8c512f5.mov

============================================================================================

Old model

https://user-images.githubusercontent.com/16928813/154359492-f0a2efd0-e10d-4e9f-9e49-1750ff9987e3.mov

xhagrg commented 2 years ago

@kaulfusa, if this is reasonable, can we push this to the phenomena portal? We could run some more days in the portal itself and check for any false positives...