FabriceSamonte opened 3 years ago
@adrian-g-fisher maybe comment here, so we don't lose your comment.
- @mitchest, I'm currently missing the standard deviation data of the Landsat surface reflectance values within a given 3x3 grid. Does Google EE have built-in functionality for this, or would I have to manually calculate those values (i.e. create a function for standard deviation with a list of numbers as an input)?
As usual, there's a number of ways to do it...
- The `.reduce()` method with `ee.Reducer.stdDev()` as the reducer. It can be applied to a number of types, including feature collections, lists and images.
- `.set()` the values to the feature collection, then aggregate over it (e.g. `.aggregate_mean()` and `.aggregate_total_sd()`).
- There's also `.reduceNeighborhood()`, which is extremely efficient: you specify the reducer (e.g. `ee.Reducer.stdDev()`) and a kernel (e.g. `ee.Kernel.square()`). That method applies to each pixel in an image, but thanks to GEE's lazy eval, it will only calculate the neighbourhoods being requested by the feature collection or geom you use to extract values.
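For what it's worth, a minimal Code Editor sketch of that last option might look like this (the dataset ID, date range and band names are illustrative placeholders, not taken from the repo):

```javascript
// Sketch only: per-pixel standard deviation over a 3x3 (30 m) window.
// Dataset/bands/dates are illustrative, not from extract.js.
var image = ee.ImageCollection('LANDSAT/LT05/C01/T1_SR')
    .filterDate('2010-01-01', '2010-03-01')
    .median()
    .select(['B1', 'B2', 'B3', 'B4']);

// 3x3 square kernel: radius of 1 pixel around the centre pixel.
// Output bands get a '_stdDev' suffix (e.g. 'B1_stdDev').
var stdDev = image.reduceNeighborhood({
  reducer: ee.Reducer.stdDev(),
  kernel: ee.Kernel.square(1, 'pixels')
});

// Lazy evaluation means only the neighbourhoods intersecting the
// features you sample (e.g. via sampleRegions) are actually computed.
```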
- What are the most appropriate date ranges to select for each dataset?
Probably a good question for @adrian-g-fisher, since he'll have dealt with this before. A month each side is reasonable, but we might need to go a little more to get enough good cloud free images at certain times of year.
- Is extracting the date of the image used important? If so, I'm assuming I would have to extract data from a single raw image based on cloudiness etc. instead of using a median summary.
If we go with the "geomedian" approach, the individual dates are not so important - it would be nice to get the actual number of image pixels the median values were derived from, but I'm not exactly sure what we'd do with it yet (perhaps to check for bias?). Via `.reduce()` on the image collection, you'd just use `ee.Reducer.count()` as well as `ee.Reducer.mean()`.
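As a hedged sketch (collection ID, dates and bands are placeholders), those two reducers can be combined so both run in a single pass over the collection:

```javascript
// Sketch: per-pixel mean plus the count of unmasked pixels that fed it.
// Collection ID, dates and bands are placeholders.
var collection = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
    .filterDate('2015-01-01', '2015-03-01')
    .select(['B2', 'B3', 'B4', 'B5']);

// combine() merges the reducers; output bands get '_mean' and
// '_count' suffixes (e.g. 'B2_mean', 'B2_count').
var summary = collection.reduce(
    ee.Reducer.mean().combine({
      reducer2: ee.Reducer.count(),
      sharedInputs: true
    }));
```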
Thanks @mitchest, awesome info! I definitely missed `aggregate_total_sd()`, but it would be super useful with what I currently have now, since I'm already applying `aggregate_mean()` for the gridded FeatureCollection. I'll also have a look into `reduceNeighborhood()`, since my compute times are quite lengthy.
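For reference, a tiny sketch of those two aggregations on a FeatureCollection (the property name and values here are placeholders, not from the actual csv):

```javascript
// Sketch: aggregating a numeric property over a FeatureCollection.
// 'b1_mean' and its values are placeholders.
var gridCells = ee.FeatureCollection([
  ee.Feature(null, {b1_mean: 0.12}),
  ee.Feature(null, {b1_mean: 0.15}),
  ee.Feature(null, {b1_mean: 0.09})
]);

print(gridCells.aggregate_mean('b1_mean'));      // mean of the property
print(gridCells.aggregate_total_sd('b1_mean'));  // sample standard deviation
```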
A couple of quick comments. I'm away with the family this week, so typing badly on my phone.
I wouldn't worry about the MODIS imagery.
Peter extracted values from the nearest cloud-free Landsat image, rather than using a composite. The date of the image is in the image name in the calval file, in yyyymmdd format. I think he may have used the time difference between the field and satellite dates as a weighting.
Pretty sure Peter would have made zeros 0.01 and ones 0.99, although it is possible for the field measurements to be 100% one cover type.
@adrian-g-fisher the date is stored in `fractional_calval_filename`, correct? I'll do some work reformatting the date and extracting from the exact image they used.
Also, which Sentinel-2 dataset were you after in particular? EE provides:
Yes, the date in the calval filename.
Sentinel-2 MSI Level-2A should be a surface reflectance product similar to the Landsat ones.
Hi all,
Just thought I'd post an update on my progress. I've managed to extract Google EE's Landsat (5, 7 and 8), MODIS 16-day Nadir BRDF-Adjusted Reflectance (MCD43A4), and MODIS 8-day surface reflectance (MOD09A1) values using the given locations of each field observation from star transects (data: https://field.jrsrp.com/). A version of the csv can be found under `data/star_transects_google_ee.csv`. The new columns are as follows:

- `l5_b1_sf_mean` - Landsat 5 band 1 surface reflectance average for the surrounding 3x3 grid. I went with a brute force approach, so there are columns per Landsat sensor (5, 7 and 8) per band.
- `MOD09A1_sur_refl_b01` - MOD09A1 band 1 surface reflectance value as extracted from the 500m resolution pixel; there are respective columns for bands 1-7.
- `Nadir_Reflectance_Band1` and `BRDF_Albedo_Band_Mandatory_Quality_Band1` - data extracted from MCD43A4 (500m resolution), also with respective columns for bands 1-7.

METHOD
There is a lot of repeated code in `earth_engine/extract.js`, so I'll highlight the important bits here.

Extracting MCD43A4 and MOD09A1
For each location and its observation date, I extract the pixel value at 500m resolution from MCD43A4 and MOD09A1.
For MOD09A1, I filtered the image collection to a 16-day window (I wasn't too sure what window to use) around the observation time, and applied a mask for cloudy pixels, cloud shadows, cirrus clouds and pixels that have low or no aerosol loading:
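The snippet itself isn't reproduced in this extract. As a hedged sketch of that kind of masking: the bit layout below follows the MOD09A1 product's `StateQA` band, but exactly which aerosol classes to keep is my assumption, not necessarily what `extract.js` does:

```javascript
// Sketch of MOD09A1 StateQA bit masking; which classes are kept is an
// assumption, not necessarily the logic used in extract.js.
function maskMod09a1(image) {
  var qa = image.select('StateQA');
  var cloud = qa.bitwiseAnd(3);                  // bits 0-1: cloud state (0 = clear)
  var shadow = qa.bitwiseAnd(1 << 2);            // bit 2: cloud shadow
  var aerosol = qa.rightShift(6).bitwiseAnd(3);  // bits 6-7: aerosol quantity
  var cirrus = qa.rightShift(8).bitwiseAnd(3);   // bits 8-9: cirrus
  var mask = cloud.eq(0)
      .and(shadow.eq(0))
      .and(cirrus.eq(0))
      .and(aerosol.neq(3));                      // drop high aerosol (assumption)
  return image.updateMask(mask);
}

// Placeholder dates illustrating a ~16-day window.
var masked = ee.ImageCollection('MODIS/006/MOD09A1')
    .filterDate('2012-01-01', '2012-01-17')
    .map(maskMod09a1);
```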
Given the new image collection, I created a median summary of all images (the median value per band for each pixel, across multiple images), and extracted the raw value of each band from the resultant image. This is the point where the methodology differs from the paper https://doi.org/10.1016/j.rse.2015.01.021, where they select a single raw image from a collection of images within an 8-day window, based on high observation coverage, being cloud free, aerosol loading etc., as opposed to using a median summary. I wasn't too sure how to go about comparing these values between MODIS images within the collection.
I extracted MCD43A4 using the same method as above, except without any masking.
Extracting Landsat
For Landsat 5, 7, and 8, I filtered the image collection based on the location of the observation and a 60-day window from the time of the observation. I used Google EE's built-in cloud mask function to mask out any cloudy pixels. As with MOD09A1, I created a median summary of the filtered image collection to extract values from.
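A hedged sketch of that filtering and masking, using Landsat 8 Collection 1 SR and its `pixel_qa` band (the point, dates and exact mask bits are my assumptions, not necessarily what `extract.js` uses):

```javascript
// Sketch: 60-day window around an observation, cloud/shadow masked via
// the Collection 1 SR 'pixel_qa' band, then a median composite.
// The point and observation date are placeholders.
var obsDate = ee.Date('2015-06-01');
var point = ee.Geometry.Point([150.0, -30.0]);

function maskL8sr(image) {
  var qa = image.select('pixel_qa');
  var cloudShadow = 1 << 3;  // bit 3: cloud shadow
  var cloud = 1 << 5;        // bit 5: cloud
  var mask = qa.bitwiseAnd(cloudShadow).eq(0)
      .and(qa.bitwiseAnd(cloud).eq(0));
  return image.updateMask(mask);
}

var median = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR')
    .filterBounds(point)
    .filterDate(obsDate.advance(-30, 'day'), obsDate.advance(30, 'day'))
    .map(maskL8sr)
    .median();
```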
For each observation, I also created a 3x3 grid of 30m cells to extract values from, like so:
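(The actual grid-construction snippet isn't reproduced in this extract; the following is a hypothetical illustration of how a `getLonLatOffset`-style helper could build the grid, assuming a simple metres-to-degrees conversion rather than the real implementation in `extract.js`.)

```javascript
// Hypothetical sketch of a getLonLatOffset-style helper (the real one
// lives in earth_engine/extract.js). It shifts a [lon, lat] pair by a
// given number of metres east and north, using the approximation
// 1 degree of latitude ≈ 111320 m, scaling longitude by cos(latitude).
function getLonLatOffset(lon, lat, metresEast, metresNorth) {
  var metresPerDegLat = 111320;
  var metresPerDegLon = metresPerDegLat * Math.cos(lat * Math.PI / 180);
  return [lon + metresEast / metresPerDegLon,
          lat + metresNorth / metresPerDegLat];
}

// Build the 3x3 grid of points spaced 30 m apart around a centre location.
function makeGrid(lon, lat) {
  var points = [];
  for (var dy = -1; dy <= 1; dy++) {
    for (var dx = -1; dx <= 1; dx++) {
      points.push(getLonLatOffset(lon, lat, dx * 30, dy * 30));
    }
  }
  return points;
}
```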
where the `getLonLatOffset` function returns a longitude and latitude offset 30 meters vertically or horizontally from a given location. By extracting Landsat values for each cell of the 3x3 grid, I was able to aggregate the effective surface reflectance mean of each band. I repeated this for the Landsat 5, 7 and 8 collections.

The final dataset was created like so:
A couple of questions: