Open JmeCS opened 6 years ago
cc @developmentseed/all
Jamey will be presenting the project outline Friday, June 22 at 10am ET for team feedback and questions. Let me know if you are interested in joining and I will add you to the invite - perhaps just add a 👍 to this comment.
Background
Accurate and timely (e.g. within season) crop classification using remotely sensed data presents many unique challenges. Unlike static objects such as roads and buildings, crop phenology is dynamic within a growing season, and unique crop classes are generally not distinguishable by sight, even with high-resolution satellite imagery. The complexity of the problem increases in developing countries in which plot sizes are small (limiting the effectiveness of low- and medium-resolution satellite imagery), species-level crop diversity is high, and ground-truth training data for classification algorithms are limited.
Accurate, in-season spatial extent and yield estimates could prove beneficial to a variety of audiences and will be critical in the development of insurance and financial services products for small-scale agriculturalists in developing countries.
This project explores possible approaches to addressing certain problems inherent to crop classification in developing countries. Specifically, a time series of pixels from Sentinel-2 tiles will first be clustered using an unsupervised learning algorithm. Within-cluster time-series curves (e.g. NDVI), along with their spatial distribution, will then be examined and matched with individual crop and other land cover classes. The crop and other land cover classes labeled in the unsupervised step will then be used as training data in a supervised classification model. Tanzania has been identified as a target country due to its relatively simple crop mix (dominated by maize production), low maize productivity, and potential synergies with ongoing projects.
Team
(TO BE FINALIZED)
Project phases
Project phases in detail
1. Area of interest (AOI) identification and data wrangling
Phase 1 tasks:
Identify region that meets the following criteria:
Major maize producing region (provided maize remains the primary crop of interest)
Preliminary candidate: Rukwa Region, Tanzania (low crop diversity, major maize producer, single maize harvest)
Effective number of crop species (ENS) in Tanzania, 2014/15 growing season. Data
Cropped area in Rukwa is dominated by maize during the maize growing season, which may allow for easier cluster labeling.
Develop model validation methodology
Using a relatively small spatial extent (smaller than a Sentinel-2 tile), generate a time series of raster layers to be used in the clustering process. Possible candidates include NDVI or other indices, principal components, and combinations thereof. The goal should be to generate input data that results in distinct clusters corresponding to crop and land cover classes.
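As a rough illustration of the NDVI candidate above: NDVI is (NIR - Red) / (NIR + Red), which for Sentinel-2 is usually computed from bands B8 (NIR) and B4 (red). The sketch below builds an NDVI curve for a single pixel in plain Python. The function names and the reflectance values are made up for illustration, and a real pipeline would more likely operate on whole raster arrays.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel at one date."""
    denom = nir + red
    # Guard against division by zero (e.g. no-data pixels with both bands at 0).
    return (nir - red) / denom if denom != 0 else 0.0


def ndvi_series(nir_series, red_series):
    """NDVI curve for one pixel across all acquisition dates."""
    return [ndvi(n, r) for n, r in zip(nir_series, red_series)]


# Hypothetical reflectances for one pixel over five dates (illustrative only).
nir = [0.30, 0.35, 0.50, 0.45, 0.25]
red = [0.20, 0.15, 0.10, 0.15, 0.20]

curve = ndvi_series(nir, red)
peak_date_index = curve.index(max(curve))  # date with the greenest canopy
```

One such curve per sampled pixel would form the input to the clustering step in Phase 2.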
Phase 1 deliverables:
AOI identified, and an associated time-series of Sentinel-2 tiles selected. Vector data for generic land-cover sampling generated. Various time-series to be used in the development of the clustering algorithm generated. A write-up will be provided describing successes and obstacles, and goals and procedures for subsequent steps will be updated, if necessary.
2. Data pre-processing and clustering
Phase 2 tasks:
Cluster pre-processing: Dynamic Time Warping (DTW) is a promising pre-processing option for time-series clustering. The output is a distance matrix (one pairwise distance between each time series and every other) which can be used as an input to a clustering algorithm (e.g. hierarchical clustering).
Clustering: Using the DTW distance matrix, perform clustering on the time-series samples.
Label the clusters that can be identified by examining various curves (e.g. NDVI) associated with each cluster, and / or their spatial distribution (do certain clusters clearly correspond to specific land cover classes upon visual examination?).
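The first two tasks above can be sketched as follows, assuming the textbook dynamic-programming form of DTW with absolute difference as the local cost and a simple average-linkage agglomerative clustering. This is a minimal pure-Python sketch, not a proposed implementation: the function names and NDVI curves are hypothetical, and a real run over many pixels would need an optimized library.

```python
def dtw_distance(a, b):
    """Textbook DTW: minimum cumulative |a_i - b_j| over monotonic alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def distance_matrix(series):
    """Symmetric matrix of pairwise DTW distances."""
    k = len(series)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(series[i], series[j])
    return dist


def average_linkage(dist, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[x] for j in clusters[y]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    return clusters


# Hypothetical NDVI curves (illustrative values only).
series = [
    [0.2, 0.4, 0.7, 0.5, 0.2],    # seasonal green-up and senescence
    [0.2, 0.2, 0.4, 0.7, 0.5],    # same shape, shifted later in the season
    [0.6, 0.6, 0.6, 0.6, 0.6],    # flat, persistently green
    [0.6, 0.65, 0.6, 0.6, 0.65],  # flat with small fluctuations
]

dist = distance_matrix(series)
clusters = average_linkage(dist, n_clusters=2)
```

In this toy example the two seasonal curves end up in one cluster even though their peaks fall on different dates, which is the behaviour that makes DTW attractive when planting dates vary between plots.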
Phase 2 deliverables:
Labeled training data corresponding to one or more crop classes as well as other land cover classes, if they can be identified. A write-up will be provided describing successes and obstacles, and goals and procedures for subsequent steps will be updated, if necessary.
3. Supervised classification
Phase 3 tasks:
Using the labeled data obtained during Phase 2, develop a supervised classification model to predict crop classes across a larger spatial extent than the one used in Phases 1 and 2, e.g. multiple Sentinel-2 tiles in the Southern Highlands zone of Tanzania.
Validate classification using aggregated crop survey data and / or ground-truth test data, should they become available.
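To make the two tasks above concrete, here is a minimal sketch that uses a nearest-centroid classifier as a stand-in (the model type has not been chosen, so this is only an assumption) and then compares the predicted maize share with an aggregated survey figure. All curves, class names, and the survey share are hypothetical. The per-pixel area assumes the 10 m Sentinel-2 bands.

```python
def centroid(curves):
    """Element-wise mean of equal-length time series."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def train(labeled):
    """labeled: dict mapping class name -> list of time series."""
    return {label: centroid(curves) for label, curves in labeled.items()}


def predict(model, curve):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((x - y) ** 2 for x, y in zip(curve, center))
    return min(model, key=lambda label: sq_dist(model[label]))


# Hypothetical training data taken from labeled clusters (illustrative values).
labeled = {
    "maize": [[0.2, 0.4, 0.7, 0.5, 0.2], [0.2, 0.5, 0.8, 0.5, 0.3]],
    "other": [[0.6, 0.6, 0.6, 0.6, 0.6], [0.5, 0.6, 0.6, 0.6, 0.5]],
}
model = train(labeled)

# Hypothetical unlabeled pixels from the larger extent.
new_pixels = [
    [0.25, 0.45, 0.75, 0.5, 0.2],
    [0.55, 0.6, 0.6, 0.65, 0.6],
    [0.2, 0.4, 0.8, 0.6, 0.3],
    [0.6, 0.55, 0.6, 0.6, 0.55],
]
predictions = [predict(model, p) for p in new_pixels]

PIXEL_AREA_HA = 0.01  # a 10 m x 10 m pixel is 100 m^2 = 0.01 ha
maize_pixels = predictions.count("maize")
maize_area_ha = maize_pixels * PIXEL_AREA_HA
predicted_share = maize_pixels / len(predictions)

survey_share = 0.5  # hypothetical maize share from aggregated survey data
share_error = abs(predicted_share - survey_share)
```

Comparing shares or areas against survey totals only checks the aggregate, not the location of individual pixels, so ground-truth test points would still be the stronger validation if they become available.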
Phase 3 deliverables:
Trained classification model with a focus on predicting the spatial extent of maize in the Southern Highlands of Tanzania.