ccmmf / organization

Repository for CCMMF discussions and administration
https://github.com/ccmmf

Define structures for event data #49

Open dlebauer opened 4 days ago

dlebauer commented 4 days ago

Example of minimal JSON events

Information that can be inferred from satellite data.

{
    "management": {
        "events": [
            {
                "mgmt_operations_event": "tillage",
                "date": "2022-03-01"
            },
            {
                "mgmt_operations_event": "tillage",
                "date": "2022-04-05"
            },
            {
                "mgmt_operations_event": "planting",
                "date": "2022-04-15"
            },
            {
                "mgmt_operations_event": "harvest",
                "date": "2022-10-25",
                "harvest_residue_placement": "left in field"
            },
            {
                "mgmt_operations_event": "planting",
                "date": "2022-11-15"
            },
            {
                "mgmt_operations_event": "mowing",
                "date": "2023-03-21"
            }
        ]
    }
}

This example is not compliant with the management-events.schema.json because it is missing required fields.
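A quick structural check can catch problems before running full schema validation. Below is a minimal Python sketch. It assumes, as the minimal example suggests, that every event carries at least mgmt_operations_event and an ISO date (YYYY-MM-DD). The authoritative required fields live in management-events.schema.json and are not reproduced here. The function name check_minimal_events is hypothetical.

```python
import json
from datetime import date

# Hypothetical minimum: the two fields every event in the minimal example carries.
# The authoritative list of required fields lives in management-events.schema.json.
MINIMAL_FIELDS = ("mgmt_operations_event", "date")


def check_minimal_events(text: str) -> list[str]:
    """Return a list of problems found in a management-events JSON string."""
    problems = []
    events = json.loads(text)["management"]["events"]  # strict parser: trailing commas raise
    previous = None
    for i, event in enumerate(events):
        for field in MINIMAL_FIELDS:
            if field not in event:
                problems.append(f"event {i}: missing {field}")
        if "date" in event:
            current = date.fromisoformat(event["date"])  # ValueError if not YYYY-MM-DD
            if previous is not None and current < previous:
                problems.append(f"event {i}: date {current} is earlier than the previous event")
            previous = current
    return problems


example = """
{"management": {"events": [
  {"mgmt_operations_event": "tillage", "date": "2022-03-01"},
  {"mgmt_operations_event": "planting", "date": "2022-04-15"},
  {"mgmt_operations_event": "harvest", "date": "2022-10-25",
   "harvest_residue_placement": "left in field"}
]}}
"""
print(check_minimal_events(example))  # → []
```

Python's json module is strict, so a trailing comma after the last key/value pair in an object raises json.JSONDecodeError rather than being silently accepted.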

Example of events JSON with additional information gap-filled

This is information that SIPNET will require, and probably (?) the statistical downscaling algorithm will too. Thus, most of it should be in the raster data layers.

{
    "management": {
        "events": [
            {
                "mgmt_operations_event": "tillage",
                "date": "2022-03-01",
                "tillage_implement": "subsoiler",
                "tillage_operations_depth": "20"
            },
            {
                "mgmt_operations_event": "tillage",
                "date": "2022-04-05",
                "tillage_implement": "disk, tandem",
                "tillage_operations_depth": "10"
            },
            {
                "mgmt_operations_event": "planting",
                "date": "2022-04-15",
                "planting_list": [
                    {
                        "planted_crop": "wheat",
                    }
                ]
            },
            {
                "mgmt_operations_event": "fertilizer",
                "date": "2022-03-20",
                "fertilizer_total_amount": 200,
                "fertilizer_material": "urea",
                "C_in_applied_fertilizer": 40,
                "N_in_applied_fertilizer": 93
            },
            {
                "mgmt_operations_event": "fertilizer",
                "date": "2022-07-15",
                "fertilizer_type": "organic material",
                "fertilizer_total_amount": 200,
                "organic_material": "compost",
                "C_in_applied_fertilizer": 40,
                "N_in_applied_fertilizer": 5
            },
            {
                "mgmt_operations_event": "harvest",
                "date": "2022-10-25",
                "harvest_list": [
                    {
                        "harvest_crop": "wheat",
                        "harvest_operat_component": "grain"
                    }
                ],
                "harvest_residue_placement": "left in field",
                "harvest_yield_harvest_dw_total": 3000
            },
            {
                "mgmt_operations_event": "planting",
                "date": "2022-11-15",
                "planting_list": [
                    {
                        "planted_crop": "alfalfa",
                    }
                ]
            },
            {
                "mgmt_operations_event": "mowing",
                "date": "2023-03-21",
                "mowing_percent_cut": "80",
                "mowing_residue_placement": "left in field"
            }
        ]
    }
}

For mowing, this adds a few variables (mowing_percent_cut, mowing_residue_placement).
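One way to picture the gap-filling step: start from a minimal satellite-derived event, add attributes only where they are missing, and sort the result by date. This is a sketch under assumptions. The DEFAULTS table is hypothetical and simply copies the mowing values from the gap-filled example; in practice those values would come from the raster data layers.

```python
import copy

# Hypothetical per-event-type defaults, copied from the gap-filled mowing event.
# Real values would come from the raster data layers, not from a fixed table.
DEFAULTS = {
    "mowing": {"mowing_percent_cut": "80", "mowing_residue_placement": "left in field"},
}


def gap_fill(events: list[dict]) -> list[dict]:
    """Add default attributes for missing keys and return events sorted by date."""
    filled = []
    for event in events:
        new_event = copy.deepcopy(event)  # leave the caller's events untouched
        for key, value in DEFAULTS.get(event["mgmt_operations_event"], {}).items():
            new_event.setdefault(key, value)  # never overwrite observed values
        filled.append(new_event)
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings
    return sorted(filled, key=lambda e: e["date"])


minimal = [
    {"mgmt_operations_event": "mowing", "date": "2023-03-21", "mowing_percent_cut": "50"},
    {"mgmt_operations_event": "tillage", "date": "2022-03-01"},
]
for event in gap_fill(minimal):
    print(event)
```

Using setdefault means anything the satellite product did observe wins over the default, and sorting keeps events chronological even when they are appended out of order.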

Notes:

dlebauer commented 3 days ago

GeoTIFF format:

After I wrote out the netCDF format, @mdietze suggested GeoTIFF. The information in the files will be basically the same.

GeoTIFF should be simpler to deal with, not least because we can use YYYYMMDD dates instead of 'days since 1700-01-01'.
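Converting between the two date representations is short either way. This sketch assumes the epoch from the netCDF draft below (time:units = "days since 1970-01-01 00:00:00", calendar "gregorian"); for dates after 1582 that calendar matches Python's proleptic Gregorian dates. The function names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumed epoch, matching the draft's time:units


def days_since_epoch_to_yyyymmdd(days: int) -> str:
    """Convert a CF-style 'days since 1970-01-01' value to a YYYYMMDD string."""
    return (EPOCH + timedelta(days=int(days))).strftime("%Y%m%d")


def yyyymmdd_to_days_since_epoch(stamp: str) -> int:
    """Inverse conversion, from YYYYMMDD back to days since 1970-01-01."""
    d = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    return (d - EPOCH).days


print(days_since_epoch_to_yyyymmdd(19052))       # → 20220301
print(yyyymmdd_to_days_since_epoch("20220301"))  # → 19052
```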

One file per event type per date, named something like _YYYYMMDD.tif.
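A sketch of building and parsing such file names, assuming (hypothetically) that the event type is the prefix before _YYYYMMDD.tif. Parsing the name back is what would let a conversion script recover event type and date without opening the raster.

```python
import re
from datetime import date

# Hypothetical naming pattern: <event_type>_YYYYMMDD.tif
FILENAME_RE = re.compile(r"^(?P<event_type>[a-z_]+)_(?P<stamp>\d{8})\.tif$")


def event_filename(event_type: str, event_date: date) -> str:
    """Build a per-event-type, per-date GeoTIFF name."""
    return f"{event_type}_{event_date:%Y%m%d}.tif"


def parse_event_filename(name: str) -> tuple[str, date]:
    """Recover (event_type, date) from a file name built by event_filename."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an event file name: {name}")
    stamp = match["stamp"]
    return match["event_type"], date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))


print(event_filename("tillage", date(2022, 3, 1)))  # → tillage_20220301.tif
print(parse_event_filename("tillage_20220301.tif"))
```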

Byte data type [values 0-255] should be sufficient.
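One detail when moving from the netCDF draft: netCDF byte is signed (-128 to 127), which is why the draft can use _FillValue = -127b, while GeoTIFF Byte (as GDAL calls it) is unsigned 0-255. The sketch below maps the draft's values onto unsigned bytes, assuming 255 as the nodata value (a hypothetical choice, not decided in this thread).

```python
from array import array

NODATA = 255        # hypothetical nodata value for an unsigned Byte raster
NETCDF_FILL = -127  # _FillValue used in the netCDF draft (signed byte)


def to_geotiff_bytes(values: list[int]) -> array:
    """Map netCDF-style signed flag values (0, 1, -127) to an unsigned byte array."""
    out = array("B")  # 'B' = unsigned char: one byte per element, range 0-255
    for v in values:
        # append raises OverflowError for anything outside 0-255
        out.append(NODATA if v == NETCDF_FILL else v)
    return out


row = to_geotiff_bytes([0, 0, 1, 1, -127, 0])
print(list(row))            # → [0, 0, 1, 1, 255, 0]
print(len(row.tobytes()))   # → 6
```

With only 0, 1 and a nodata value in use, 253 codes remain free, for example for categorical attributes such as tillage implement.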

Description of netCDF format

Much of the below also applies to GeoTIFF:

For satellite derived data:

For inputs to statistical downscaling models:

First draft of a netCDF file, tillage.nc (CDL specification)

netcdf tillage {
dimensions:
    time = 240 ;
    latitude = 156 ;
    longitude = 108 ;
variables:
    double time(time) ;
        time:units = "days since 1970-01-01 00:00:00" ;
        time:calendar = "gregorian" ;
        time:standard_name = "time" ;
        time:long_name = "Time of Tillage Event" ;
      //Optional: start with year as integer instead of date?
    double latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:standard_name = "latitude" ;
    double longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:standard_name = "longitude" ;
    byte tillage_present(time, latitude, longitude) ;
        tillage_present:long_name = "Tillage Event Presence" ;
        tillage_present:flag_values = 0b, 1b ;
        tillage_present:flag_meanings = "no_event event_present" ;
        tillage_present:_FillValue = -127b ;

    // Optional : Spatial reference
    int crs ;
        crs:grid_mapping_name = "latitude_longitude" ;
        crs:longitude_of_prime_meridian = 0.0 ;
        crs:semi_major_axis = 6378137.0 ;
        crs:inverse_flattening = 298.257223563 ;
        crs:spatial_ref = "EPSG:4326" ;

 // global attributes:
    :title = "Satellite Derived Tillage Events over California Croplands" ;
    :institution = "California Cropland Measurement and Modeling Framework";
    :source = "Derived from satellite data and agronomic records" ;
    :history = "Created 2024-10-17 by CCMMF" ;
    :references = "Data derived from harmonized Landsat-Sentinel data" ;
    :Conventions = "CF-1.8" ;

data:

 latitude = [array of latitude values covering California] ;
 longitude = [array of longitude values covering California] ;
 time = [19013, 19048, ... ] ; // Times when tillage events occur

 tillage_present =
0, 0, 0, 0, 1, 1, 0, 0, 0, ...
0, 0, 0, 0, 1, 1, 0, 0, 0, ... ;
  // For each time step, a 2D array over latitude and longitude
}
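Under the CF conventions, flag_values and the space-separated flag_meanings pair up by position, and _FillValue marks missing data rather than a flag. A minimal Python sketch of decoding the tillage_present values with the attributes from the draft (the decode_flag name is hypothetical):

```python
# Attributes copied from the tillage_present variable in the CDL draft
FLAG_VALUES = [0, 1]
FLAG_MEANINGS = "no_event event_present".split()
FILL_VALUE = -127

# CF pairs flag_values[i] with the i-th word of flag_meanings
FLAG_LOOKUP = dict(zip(FLAG_VALUES, FLAG_MEANINGS))


def decode_flag(value: int):
    """Translate a raw tillage_present byte into its flag meaning (None for fill)."""
    if value == FILL_VALUE:
        return None  # missing / not observed, not "no event"
    if value not in FLAG_LOOKUP:
        raise ValueError(f"unexpected flag value: {value}")
    return FLAG_LOOKUP[value]


# One row of the example data
print([decode_flag(v) for v in [0, 0, 0, 0, 1, 1, 0, 0, 0]])
```

Keeping fill distinct from 0 matters downstream: a cloudy or unobserved pixel should not count as evidence that no event happened.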
dlebauer commented 3 days ago

Example conversion from netCDF --> JSON

Update: probably going with GeoTIFF

This is rough, but the goal of writing this out is to evaluate how hard the conversion will be.

The design objective is to keep the "gridded layer --> JSON --> SIPNET" use case simple.

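library(ncdf4)
library(jsonlite)

# Eventually: for (event_type in c("tillage", ...)) { ... }
event_type <- "tillage"

nc <- nc_open(paste0(event_type, ".nc"))

# Convert "days since 1970-01-01" to Date (printed as YYYY-MM-DD)
dates <- as.Date(ncvar_get(nc, "time"), origin = "1970-01-01")
latitude <- ncvar_get(nc, "latitude")
longitude <- ncvar_get(nc, "longitude")
# ncdf4 returns dimensions in reverse CDL order: [longitude, latitude, time];
# values equal to _FillValue come back as NA
event <- ncvar_get(nc, paste0(event_type, "_present"))

# Close the NetCDF file
nc_close(nc)

for (i in seq_along(longitude)) {
  for (j in seq_along(latitude)) {
    pixel <- event[i, j, ]            # time series for one grid cell
    event_idx <- which(pixel == 1)    # which() skips NA (fill) values
    if (length(event_idx) == 0) next  # no events: no JSON file for this cell

    data_list <- lapply(event_idx, function(k) {
      # closest previous time step with no event (0); we lose this otherwise
      prev_zero <- which(pixel[seq_len(k - 1)] == 0)
      list(
        date = format(dates[k], "%Y-%m-%d"),
        mgmt_operations_event = event_type,
        # event happened after the last 0; NA if there is no earlier 0
        earliest_event_date = if (length(prev_zero) > 0)
          format(dates[max(prev_zero)] + 1, "%Y-%m-%d") else NA
      )
    })
    data_list <- list("management" = list("events" = data_list))

    # Convert the list to JSON and write one file per grid cell
    json_data <- toJSON(data_list, pretty = TRUE, auto_unbox = TRUE)
    writeLines(json_data,
               con = sprintf("%.3f_%.3f_%s.json", latitude[j], longitude[i], event_type))
  }
}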
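The trickiest step in the conversion is finding the closest previous "no event" observation for each event. The Python sketch below isolates that logic for a single pixel so it can be tested without any raster files. It assumes one interpretation of earliest_event_date: the day after the most recent 0, meaning the event happened somewhere in that window. Fill values (None here) are skipped, so a cloudy observation neither opens nor closes a window. The pixel_events name is hypothetical.

```python
from datetime import date, timedelta


def pixel_events(dates: list[date], flags: list, event_type: str) -> dict:
    """Build the management-events structure for one pixel's time series.

    flags holds 1 (event), 0 (no event) or None (fill / no observation).
    """
    events = []
    last_zero = None  # most recent date observed with no event
    for d, flag in zip(dates, flags):
        if flag == 0:
            last_zero = d
        elif flag == 1:
            earliest = (last_zero + timedelta(days=1)).isoformat() if last_zero is not None else None
            events.append({
                "date": d.isoformat(),
                "mgmt_operations_event": event_type,
                "earliest_event_date": earliest,
            })
    return {"management": {"events": events}}


print(pixel_events([date(2022, 2, 20), date(2022, 3, 1), date(2022, 3, 10)],
                   [0, None, 1], "tillage"))
```

Consecutive 1s (as in the example data row) each become their own event here and share the same earliest_event_date; merging them into one event would be a separate design decision.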