Story: Improve model by improving data collection.

To improve the model, we need to make data collection faster. The bottleneck in the training data generation pipeline is the rasterization of files. The current approach consists of the following steps:

Rasterize the full-disk image using QGIS (15-20 minutes).
Load the shapefiles from the HMS database into QGIS.
Locate where the plumes are coming from.
Draw a new shapefile or modify the existing shapefile.
Export the shapefile.

This method is not optimal because QGIS cannot process a full-disk raw file in memory. As a result, every scroll through the data may force QGIS to reload the data from disk, which slows down the labelling process.

To overcome this:

Use the information in the NOAA shapefiles to narrow down the approximate extent of each smoke plume.
Use those coordinates to crop and warp (using GDAL calls) a smaller section of the raw file into a GeoTIFF raster and store it on disk. (Band 1 and Band 3 will be used for this.)
Redraw the shapefiles and build the dataset in QGIS using the generated GeoTIFF. This should be much faster than loading a raw full-disk file.
Retrain the model.
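
The first two steps above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the helper names `plume_extent` and `gdalwarp_args`, the 0.5 degree padding, the file paths, and the choice of EPSG:4326 as the output projection are all assumptions. It assumes the plume polygons have already been read from the shapefile as lists of (lon, lat) points.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]               # (lon, lat) in degrees
BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def plume_extent(polygons: Iterable[Iterable[Point]], pad_deg: float = 0.5) -> BBox:
    """Padded lon/lat bounding box around all plume polygons (hypothetical helper)."""
    xs: List[float] = []
    ys: List[float] = []
    for ring in polygons:
        for lon, lat in ring:
            xs.append(lon)
            ys.append(lat)
    if not xs:
        raise ValueError("no plume vertices given")
    # Pad the box so the whole plume stays visible, clamped to valid lon/lat.
    return (
        max(min(xs) - pad_deg, -180.0),
        max(min(ys) - pad_deg, -90.0),
        min(max(xs) + pad_deg, 180.0),
        min(max(ys) + pad_deg, 90.0),
    )


def gdalwarp_args(src: str, dst: str, bbox: BBox, t_srs: str = "EPSG:4326") -> List[str]:
    """Build a gdalwarp command that crops and reprojects src into a GeoTIFF."""
    xmin, ymin, xmax, ymax = bbox
    return [
        "gdalwarp",
        "-t_srs", t_srs,          # output projection (assumed here)
        "-te", str(xmin), str(ymin), str(xmax), str(ymax),
        "-te_srs", "EPSG:4326",   # the extent above is given in lon/lat
        "-of", "GTiff",
        src,
        dst,
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)` once per band (Band 1 and Band 3), with `src` set to whatever GDAL-readable path points at that band of the raw file.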

