Fields of The World (FTW) is a comprehensive benchmark dataset designed to enhance the development of machine learning models for instance segmentation of agricultural field boundaries. This dataset aims to meet the growing need for accurate and scalable field boundary data for global agricultural monitoring and assessments.
Near-Global Coverage: FTW spans four continents—Europe, Africa, Asia, and South America—covering diverse agricultural landscapes across 24 countries. This extensive geographic coverage allows for the development of models that can generalize well to different agricultural practices and field types.
Large-Scale Dataset: With approximately 1.6 million parcel boundaries and over 70,000 samples, FTW is significantly larger than previously available datasets. Each sample includes instance and semantic segmentation masks paired with multi-date, multi-spectral Sentinel-2 satellite images, enabling detailed temporal and spectral analysis.
Multi-class Segmentation: The dataset provides masks for instance segmentation and for semantic segmentation in two variants: a 2-class mask (0 = background, 1 = polygon) and a 3-class mask (0 = background, 1 = polygon, 2 = boundaries).
Spectral Richness: The dataset includes RGB (Red, Green, Blue) and NIR (Near-Infrared) spectral bands from Sentinel-2 images.
Temporal Richness: The dataset includes multi-date imagery to capture different stages of the growing season. Two images with distinct contrast differences were selected to represent these stages. To determine the date ranges for these images, the USDA Crop Calendar was initially referenced and then refined by selecting periods with minimal cloud cover and optimal contrast between the two images.
Comprehensive Data Splits: The dataset is divided into training, validation, and test sets so that model performance can be evaluated fairly. For each country, larger tiles are divided into smaller chips measuring 1536 m × 1536 m. To prevent data leakage due to spatial autocorrelation, a blocked random splitting strategy is used: chips are grouped into 3×3 blocks and each block is assigned as a whole, with 80% allocated to training, 10% to validation, and 10% to testing.
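The blocked splitting idea can be illustrated with a minimal sketch. It is not the dataset authors' code: the function name `blocked_split`, the `(row, col)` grid indexing of chips, and the fixed seed are assumptions made for illustration, and it assumes the 80/10/10 fractions apply to blocks rather than to individual chips.

```python
import random
from collections import defaultdict


def blocked_split(chips, block_size=3, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign (row, col) chips to train/val/test one block at a time.

    Hypothetical helper: chips are assumed to sit on a regular grid.
    """
    # Group chips by the block_size x block_size block they fall into.
    blocks = defaultdict(list)
    for row, col in chips:
        blocks[(row // block_size, col // block_size)].append((row, col))

    # Shuffle whole blocks (not chips) so neighbouring chips share a split.
    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)

    n_train = round(fractions[0] * len(block_ids))
    n_val = round(fractions[1] * len(block_ids))

    assignment = {}
    for i, block_id in enumerate(block_ids):
        if i < n_train:
            split = "train"
        elif i < n_train + n_val:
            split = "val"
        else:
            split = "test"
        for chip in blocks[block_id]:
            assignment[chip] = split
    return assignment


# A 30 x 30 grid of chips forms 100 blocks of 3 x 3 chips each.
chips = [(r, c) for r in range(30) for c in range(30)]
splits = blocked_split(chips)
```

Because a block never straddles two splits, adjacent chips inside a block cannot leak information between training and evaluation. Chips on either side of a block border can still land in different splits.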
Metadata and Documentation: The metadata and documentation help users interpret and use the dataset. They include key details about the country of focus, temporal data collection windows, grid structures, and the year of collection.
Fields of The World
├── README.md -> This File
├── austria -> Country Folder
│ ├── label_masks -> Labels Folder
│ │ ├── instance -> Instance Segmented Masks (Label) (Masks in .tif Format)
│ │ ├── semantic_2class -> Semantic Segmented Masks (Label) (Masks in .tif Format) -> Contains 2 Classes (0-Background, 1-Polygon)
│ │ └── semantic_3class -> Semantic Segmented Masks (Label) (Masks in .tif Format) -> Contains 3 Classes (0-Background, 1-Polygon, 2-Boundaries)
│ ├── s2_images -> Images Folder (Contains image chips)
│ │ ├── window_a -> Window A images (Images in .tif Format)
│ │ └── window_b -> Window B images (Images in .tif Format)
│ ├── chips_austria.parquet -> Chips file in GeoParquet format, contains split details (each chip belongs to one of the Train/Val/Test splits)
│ └── data_config_austria.json -> Contains metadata about the larger grids for the dataset, crop types, and dates for the temporal windows.
├── austria.zip -> Country Zip Folder, this contains all the files in the country directory.
└── checksum.md5 -> Checksum MD5 file containing all the individual files checksum hashes.
..... Continues for all the countries in the same format.
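Given this layout, the files belonging to one chip can be gathered by walking the five data folders. The sketch below uses only the standard library. It assumes that a chip's files share the same file name across folders, which the layout above does not state; the helper `find_complete_chips` and the file name `chip_0.tif` are hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names taken from the directory tree above.
SUBFOLDERS = {
    "window_a": "s2_images/window_a",
    "window_b": "s2_images/window_b",
    "instance": "label_masks/instance",
    "semantic_2class": "label_masks/semantic_2class",
    "semantic_3class": "label_masks/semantic_3class",
}


def find_complete_chips(root, country):
    """Return one dict of paths per chip that is present in every folder.

    Assumption: matching files carry the same name in each folder.
    """
    country_dir = Path(root) / country
    folders = {key: country_dir / sub for key, sub in SUBFOLDERS.items()}
    names = [{p.name for p in folder.glob("*.tif")} for folder in folders.values()]
    common = set.intersection(*names)
    return [
        {key: folder / name for key, folder in folders.items()}
        for name in sorted(common)
    ]


# Demo on a throwaway directory that mimics the layout with one empty file.
with tempfile.TemporaryDirectory() as tmp:
    for sub in SUBFOLDERS.values():
        folder = Path(tmp) / "austria" / sub
        folder.mkdir(parents=True)
        (folder / "chip_0.tif").touch()  # hypothetical file name
    records = find_complete_chips(tmp, "austria")
    print(len(records), sorted(records[0]))
```

If the real file names differ between folders, the intersection will be empty and the name matching would need to be adapted.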
Country | Year of Validity | Parcel Counts | Chips | Train Split | Validation Split | Test Split | Source Polygons | Source Data License |
---|---|---|---|---|---|---|---|---|
Austria | 2021 | 196101 | 6686 | 5304 | 637 | 745 | Link | CC-BY-4.0 |
Belgium | 2021 | 63431 | 1941 | 1554 | 189 | 198 | Link | No restrictions on public access |
Brazil | 2020 | 1854 | 1607 | 1289 | 130 | 188 | Link | CC-BY-4.0 |
Cambodia | 2021 | 318088 | 344 | 274 | 36 | 34 | Link | CC-BY-4.0 |
Corsica | 2021 | 5360 | 2472 | 1974 | 240 | 258 | Link | CC-BY-2.0 |
Croatia | 2023 | 157481 | 3482 | 2778 | 351 | 353 | Link | Open Data |
Denmark | 2021 | 37677 | 3560 | 2868 | 360 | 332 | Link | CC0-1.0 |
Estonia | 2021 | 26695 | 6713 | 5348 | 681 | 684 | Link | CC-3.0 |
Finland | 2021 | 57323 | 5665 | 4527 | 550 | 588 | Link | CC-BY-4.0 |
France | 2020 | 55342 | 3744 | 2988 | 360 | 396 | Link | Open Licence |
Germany | 2018/2019 | 4598 | 686 | 306 | 30 | 350 | Link | DL-DE/BY-2-0 |
India | 2016 | 10013 | 2002* | 1281 | 300 | 399 | Link | CC-BY-4.0 |
Kenya | 2022 | 874 | 391 | 316 | 20 | 55 | Link | GPL-2.0-or-later |
Latvia | 2021 | 44964 | 6938 | 5529 | 668 | 741 | Link | CC-BY-NC-4.0 |
Lithuania | 2021 | 61424 | 5258 | 4208 | 522 | 528 | Link | Non-commercial use only |
Luxembourg | 2022 | 29018 | 808 | 643 | 81 | 84 | Link | CC0-1.0 |
Netherlands | 2022 | 43169 | 3879 | 3110 | 381 | 388 | Link | CC0-1.0 |
Portugal | 2021 | 5040 | 86 | 64 | 12 | 10 | Link | CC-BY-NC-4.0 |
Rwanda | 2021 | 1532 | 70 | 57 | 6 | 7 | Link | CC-BY-4.0 |
Slovakia | 2021 | 14242 | 4073 | 3275 | 390 | 408 | Link | CC0-1.0 |
Slovenia | 2021 | 67488 | 2177 | 1733 | 216 | 228 | Link | CC-BY-4.0 |
South Africa | 2018 | 6568 | 747 | 590 | 72 | 85 | Link | CC-BY-NC-SA-4.0 |
Spain | 2020 | 258465 | 2440 | 2019 | 202 | 219 | Link | CC-BY-4.0 |
Sweden | 2021 | 39718 | 4760 | 3802 | 442 | 516 | Link | No restrictions on public access |
Vietnam | 2021 | 120913 | 288 | 229 | 36 | 23 | Link | CC-BY-4.0 |
*India has a total of 2,002 chips available. Of these, 22 are marked 'none' in the split column by the original data curator, so 1,980 chips are used in India's train/validation/test splits.
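As a quick consistency check, the split columns can be summed and compared with the chip count. The sketch below copies three rows from the table by hand; the variable names are illustrative only.

```python
# (country, chips, train, validation, test), copied from the table above.
rows = [
    ("Austria", 6686, 5304, 637, 745),
    ("Germany", 686, 306, 30, 350),
    ("India", 2002, 1281, 300, 399),
]

# Chips that appear in the chip count but in none of the three splits.
unassigned = {
    country: chips - (train + val + test)
    for country, chips, train, val, test in rows
}

for country, missing in unassigned.items():
    print(f"{country}: {missing} chips without a split")
```

For India the difference is 22, matching the footnote. For Austria and Germany it is 0.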