Background

Accurate and timely (e.g. within-season) crop classification using remotely sensed data presents many unique challenges. Unlike static objects such as roads and buildings, crops change in appearance throughout a growing season as their phenology progresses, and individual crop classes are generally not distinguishable by sight, even with high-resolution satellite imagery. The problem becomes more complex in developing countries, where plot sizes are small (limiting the effectiveness of low- and medium-resolution satellite imagery), species-level crop diversity is high, and ground-truth training data for classification algorithms are limited.

Accurate, in-season estimates of crop spatial extent and yield could benefit a variety of audiences and will be critical to the development of insurance and financial services products for small-scale agriculturalists in developing countries.

This project explores possible approaches to addressing certain problems inherent to crop classification in developing countries. Specifically, time series of pixels from Sentinel-2 tiles will first be clustered using an unsupervised learning algorithm. Within-cluster time-series curves (e.g. NDVI), along with their spatial distribution, will then be examined and matched with individual crop and other land cover classes. The crop and other land cover classes labeled in the unsupervised step will then be used as training data in a supervised classification model. Tanzania has been identified as a target country due to its relatively simple crop mix (dominated by maize production), low maize productivity, and potential synergies with ongoing projects.

Team

(TO BE FINALIZED)

ZOD (Z) - @abarciauskas-bgse @matthewhanson
Owner (O) and Doer (D) - Jamey

Project phases

1. Area of Interest (AOI) identification and data wrangling
2. Data pre-processing and clustering
3. Supervised classification

Project phases in detail

1. AOI identification and data wrangling:

Phase 1 tasks:

Identify region that meets the following criteria:
- Simple cropping pattern (e.g., single maize harvest/year)
- Low crop species diversity
- Major maize producing region (provided maize remains the primary crop of interest)
  - Preliminary candidate: Rukwa Region, Tanzania (low crop diversity, major maize producer, single maize harvest)
    (Figure: Effective number of crop species (ENS) in Tanzania, 2014/15 growing season.)
  - Cropped area in Rukwa is dominated by maize during the maize growing season, which may allow for easier cluster labeling:

(Figure: pct_crop_barplot)

Develop model validation methodology
- A lack of ground-truth validation data presents a challenge for model validation. We may ultimately need to rely on comparing crop cover estimates against aggregated (at the district- or region-level) data, e.g. from a published agricultural census
Using a relatively small spatial extent (sub-Sentinel-2 tile), generate time series of raster layers to be used in the clustering process. Possible candidates include NDVI or other indices, principal components, and combinations thereof. The goal should be to generate input data that results in distinct clusters corresponding to crop and land cover classes.
- Certain steps in the clustering procedure described below may be computationally intensive. It may therefore be beneficial to first generate polygons and/or other vector data in a visual editor (QGIS) corresponding to broad land cover categories that can be identified visually such as cropped area, forest, built up, etc. Vector data can be used to subset the raster allowing for 1) balanced samples across broad land cover classes and 2) varying of sample sizes in the model development phase, thereby preventing computational bottlenecks.
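
As a rough illustration of the two ideas above (building an index time series per pixel, and sub-sampling pixels evenly across hand-digitized land cover classes), here is a minimal sketch in plain Python. The reflectance values, pixel ids, class names, and function names are all hypothetical and chosen only for illustration; a real pipeline would read the red and near-infrared bands from the Sentinel-2 rasters and the class membership from the QGIS polygons, most likely with an array library rather than Python lists.

```python
import random


def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red); returns 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def ndvi_series(red_series, nir_series):
    """NDVI time series for one pixel, given per-date red and NIR reflectance."""
    return [ndvi(r, n) for r, n in zip(red_series, nir_series)]


def balanced_sample(pixels_by_class, n_per_class, seed=0):
    """Draw up to n_per_class pixel ids from each broad land cover class."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        label: rng.sample(ids, min(n_per_class, len(ids)))
        for label, ids in pixels_by_class.items()
    }


# Hypothetical reflectance values for one pixel over four acquisition dates.
red = [0.10, 0.08, 0.05, 0.09]
nir = [0.20, 0.30, 0.45, 0.25]
series = ndvi_series(red, nir)

# Hypothetical pixel ids grouped by the broad classes digitized in QGIS.
pixels_by_class = {
    "cropped": list(range(0, 100)),
    "forest": list(range(100, 130)),
    "built_up": list(range(130, 140)),
}
sample = balanced_sample(pixels_by_class, n_per_class=20)
```

Changing `n_per_class` is one simple way to vary the sample size during model development; classes with fewer pixels than requested are returned in full, so very small classes remain under-represented.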

Phase 1 deliverables:

AOI identified, and an associated time-series of Sentinel-2 tiles selected. Vector data for generic land-cover sampling generated. Various time-series to be used in the development of the clustering algorithm generated. A write-up will be provided describing successes and obstacles, and goals and procedures for subsequent steps will be updated, if necessary.

2. Data pre-processing and clustering

Phase 2 tasks:

Cluster pre-processing: Dynamic Time Warping (DTW) is a promising pre-processing option for time-series clustering. Computing the DTW distance between every pair of time series yields a distance matrix (a single measure of distance from each time series to all others), which can be used as an input to a clustering algorithm (e.g. hierarchical clustering).
- DTW may allow for the identification of time-series groups that belong in the same category even though their curves are shifted or stretched in time relative to one another. This may help account for variation in planting and harvesting dates among producers.
Clustering: Using the DTW distance matrix, perform clustering on the time-series samples.
Label the clusters that can be identified by examining various curves (e.g. NDVI) associated with each cluster, and / or their spatial distribution (do certain clusters clearly correspond to specific land cover classes upon visual examination?).
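
The DTW-then-cluster steps above can be sketched as follows. This is a minimal, unoptimized illustration in plain Python, not the project's implementation: it assumes an absolute-difference local cost with no warping window, and it uses a hand-written average-linkage agglomeration as a stand-in for a library hierarchical clustering routine. The four NDVI-like curves are invented for the example.

```python
def dtw_distance(a, b):
    """Classic DTW with absolute-difference local cost and no window constraint."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def distance_matrix(series):
    """Symmetric matrix of pairwise DTW distances."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


def average_linkage(dist, n_clusters):
    """Merge the two closest clusters (mean pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[x] for j in clusters[y]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical NDVI curves: two crop-like curves whose peaks are offset by one
# time step (different planting dates) and two nearly flat forest-like curves.
crop_early = [0.2, 0.3, 0.6, 0.8, 0.6, 0.3, 0.2, 0.2]
crop_late = [0.2, 0.2, 0.3, 0.6, 0.8, 0.6, 0.3, 0.2]
forest_a = [0.70, 0.70, 0.70, 0.70, 0.70, 0.70, 0.70, 0.70]
forest_b = [0.72, 0.70, 0.71, 0.70, 0.69, 0.70, 0.71, 0.70]

dist = distance_matrix([crop_early, crop_late, forest_a, forest_b])
clusters = average_linkage(dist, n_clusters=2)
```

In this toy case the two crop curves differ only by a one-step shift, so DTW aligns them at zero cost and they merge first, while a point-by-point comparison would treat them as different. Both the pairwise DTW step and the naive agglomeration scale poorly with the number of samples, which is the computational bottleneck that sub-sampling is meant to avoid.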

Phase 2 deliverables:

Labeled training data corresponding to one or more crop classes as well as other land cover classes, if they can be identified. A write-up will be provided describing successes and obstacles, and goals and procedures for subsequent steps will be updated, if necessary.

3. Supervised classification

Phase 3 tasks:

Using the labeled data obtained during Phase 2, develop a supervised classification model to predict crop classes across a larger spatial extent than the one used in Phases 1 and 2, e.g. multiple Sentinel-2 tiles in the Southern Highlands zone of Tanzania.
- Sequential neural network models, e.g. an LSTM recurrent neural network, are a promising avenue to pursue here. Reflectance time-series curves are inherently sequential, so a model that exploits temporal order may achieve higher classification accuracy.
- Time permitting, classification will also be performed with an LSTM network using partial time series (e.g. the first 2-3 months of a season) to simulate using a trained model pipeline for in-season crop classification.
Validate classification using aggregated crop survey data and / or ground-truth test data should it become available.
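
One way the aggregated validation could work is sketched below: predicted maize pixels are summed per district, converted to area, and compared against reported figures. Everything here is an assumption for illustration: the district names, pixel labels, and reported areas are made up, the function names are hypothetical, and the conversion assumes classification at 10 m resolution so that each pixel covers 0.01 ha.

```python
from collections import Counter

# Assumption: predictions are made on a 10 m x 10 m grid (100 m^2 = 0.01 ha per pixel).
PIXEL_AREA_HA = 0.01


def crop_area_by_district(pixel_districts, pixel_labels, crop="maize"):
    """Sum the area (ha) of pixels predicted as `crop` within each district."""
    counts = Counter(
        district
        for district, label in zip(pixel_districts, pixel_labels)
        if label == crop
    )
    return {district: n * PIXEL_AREA_HA for district, n in counts.items()}


def relative_errors(predicted_ha, reported_ha):
    """(predicted - reported) / reported for each district with a positive reported area."""
    return {
        district: (predicted_ha.get(district, 0.0) - ref) / ref
        for district, ref in reported_ha.items()
        if ref > 0
    }


# Hypothetical predictions for 1,000 pixels in two made-up districts.
pixel_districts = ["district_a"] * 600 + ["district_b"] * 400
pixel_labels = ["maize"] * 450 + ["other"] * 150 + ["maize"] * 100 + ["other"] * 300

# Hypothetical reported maize area (ha), standing in for census figures.
reported_ha = {"district_a": 5.0, "district_b": 0.8}

predicted_ha = crop_area_by_district(pixel_districts, pixel_labels)
errors = relative_errors(predicted_ha, reported_ha)
```

A comparison like this only checks totals per district: it cannot tell whether the right pixels were labeled as maize, so it complements rather than replaces pixel-level ground-truth testing.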

Phase 3 deliverables:

Trained classification model with a focus on predicting the spatial extent of maize in the Southern Highlands of Tanzania.

developmentseed / satTS

Project outline: Temporal crop classification #1