SM24-Industrial-Software-Dev / ML-forecasting-NOx-levels

0 stars 0 forks source link

Cloud Masking results in non-image object #20

Closed APoulad closed 1 month ago

APoulad commented 2 months ago

There seems to be an issue when trying to receive data spanning from 2019 until 2021. It seems that sometimes the cloud masking results in a non-image object somehow, since when I add the following piece of code into the maskClouds(image) function: if isinstance(masked_image, ee.Image):#rest of code else: #retun the original image and print("Masking resulted in a non-image object. Returning original image") it sometimes fails and prints the message. I am not sure why this is happening, but it seems to be only 9 problematic images in the entire dataset using the areas given below.

Additionally, when setting the end date year to 2022 it fails even with the above patch, and I don't know why.

The following are the places used for retrieving the data:

geometries = ee.FeatureCollection([ ee.Feature(ee.Geometry.Point(-74.0059, 40.7127), {'region': 'New York City'}), ee.Feature(ee.Geometry.Point(-71.0567, 42.3408), {'region': 'Boston'}), ee.Feature(ee.Geometry.Point(-118.248, 34.0739), {'region': 'Los Angeles'}), ee.Feature(ee.Geometry.Point(-122.4194, 37.7749), {'region': 'San Francisco'}), ee.Feature(ee.Geometry.Point(-87.6298, 41.8781), {'region': 'Chicago'}), ee.Feature(ee.Geometry.Point(-95.37, 29.756), {'region': 'Houston'}), ee.Feature(ee.Geometry.Point(-104.967, 39.729), {'region': 'Denver'}), ee.Feature(ee.Geometry.Point(-80.1918, 25.7617), {'region': 'Miami'}), ee.Feature(ee.Geometry.Point(-84.3859, 33.779), {'region': 'Atlanta'}), ee.Feature(ee.Geometry.Point(-93.2628, 44.9793), {'region': 'Minneapolis'}) ]) {'region': 'Los Angeles'}), ee.Feature(ee.Geometry.Point(-122.4194, 37.7749), {'region': 'San Francisco'}), ee.Feature(ee.Geometry.Point(-87.6298, 41.8781), {'region': 'Chicago'}), ee.Feature(ee.Geometry.Point(-118.2437, 34.0522), {'region': 'Houston'}), ee.Feature(ee.Geometry.Point(-122.3321, 47.6062), {'region': 'Denver'}), ee.Feature(ee.Geometry.Point(-80.1918, 25.7617), {'region': 'Miami'}), ee.Feature(ee.Geometry.Point(-104.9903, 39.7392), {'region': 'Atlanta'}), ee.Feature(ee.Geometry.Point(-97.4384, 38.9139), {'region': 'Minneapolis'}) ])

APoulad commented 2 months ago

It seems to happen with my date range no matter which geometries I use in the FeatureCollection. The following is my code:

# get a collection of point geometries corresponding to metro regions

geometries = ee.FeatureCollection([
    ee.Feature(ee.Geometry.Point(-74.0059, 40.7127),
              {'region': 'New York City'}),
    ee.Feature(ee.Geometry.Point(-71.0567, 42.3408),
               {'region': 'Boston'})
])

# obtain level2 admin units encompassing these selected geometries
# and copy over the properties

adminUnits=ee.FeatureCollection(
   'FAO/GAUL_SIMPLIFIED_500m/2015/level2')

adminSelect=adminUnits.filterBounds(geometries)

def copyGeometryProps(index):
    source = ee.Feature(geometries.toList(geometries.size()).get(index))
    dest = ee.Feature(adminSelect.toList(adminSelect.size()).get(index))
    ftr = dest.copyProperties(source)
    return ftr

seq = ee.List.sequence(0, adminSelect.size().subtract(1))
adminSelect = ee.FeatureCollection(seq.map(copyGeometryProps))
display(adminSelect)

# get gridded population data and clip to selected level2 admin units

population=ee.ImageCollection(
       'CIESIN/GPWv411/GPW_Population_Count') \
        .filter(ee.Filter.calendarRange(2020, 2020, 'year')) \
        .mean()

populationClipped = population.clipToCollection(adminSelect)

# get start and end dates for sentinel 5P data

startDate = ee.Date('2018-07-1')
endDate = ee.Date('2021-06-15')

ndays = endDate.difference(startDate, 'days')

# return a function for generating the sequence of dates for which the image data is required

def create_date_list_fn(startDate):
  def date_list_fn(days):
    return startDate.advance(days, 'days')
  return date_list_fn

# generate the list of dates

date_list_fn = create_date_list_fn(startDate)
list_of_dates = ee.List.sequence(0, ndays, 1).map(date_list_fn)

# mask cloud fraction for filtering sentinel 5P images

CLOUD_MASK_FRACTION = 0.3 # You can play around with this value.
def maskClouds(image):
   cf = image.select('cloud_fraction')
   mask=cf.lte(CLOUD_MASK_FRACTION)
   # Handle cases where the mask might result in an empty image
   masked_image = image.updateMask(mask).copyProperties(image)
   # Check if the masked image has any bands
   if isinstance(masked_image, ee.Image):
       # Check if the masked image has any bands
       if masked_image.bandNames().size().gt(0):
           return masked_image
       else:
           # Return the original image if masking removes all bands
           print('Warning: Cloud mask removed all bands. Returning original image.')
           return image
   else:
       print('Warning: Masking resulted in a non-image object. Returning original image.')
       return image

# obtain the median composite over each day of the daily set of sentinel-5p images

def image_mediancomposite_by_date(date):
    return ee.ImageCollection('COPERNICUS/S5P/OFFL/L3_NO2')\
        .filterDate(ee.Date(date), ee.Date(date).advance(1, 'day'))\
        .map(maskClouds) \
        .select('tropospheric_NO2_column_number_density')\
        .median()\
        .set('system:time_start', ee.Date(date).millis())

# generated the time series of daily median composited images over the sequence of required dates

no2 = ee.ImageCollection(
      ee.List.sequence(0, ndays, 1)\
      .map(date_list_fn)\
      .map(image_mediancomposite_by_date)
    )

def getConc(img):
    return img.reduceRegions(
          reducer=ee.Reducer.mean(),
          collection = adminSelect,
          scale = 7000
    ).map(lambda f: f.set('conc', f.get('mean'))) \
     .map(lambda f: f.set('DATE', img.date().format('YYYY-MM-dd'))) \
     .map(lambda f: f.set('DOW', img.date().format('E'))) \
     .map(lambda f: f.set('DOY', img.date().getRelative('day', 'year')))

no2Mean = no2.map(getConc).flatten().filter(ee.Filter.notNull(['conc']))

def fc_to_dict(fc):
  prop_names = fc.first().propertyNames()
  prop_lists = fc.reduceColumns(
      reducer=ee.Reducer.toList().repeat(prop_names.size()),
      selectors=prop_names).get('list')

  return ee.Dictionary.fromLists(prop_names, prop_lists)

NO2_mean_dict = fc_to_dict(no2Mean).getInfo()

NO2_mean_df = pd.DataFrame(NO2_mean_dict)

display(NO2_mean_df)
rameshnatarajanus commented 2 months ago

@APoulad I've seen this and not sure if it an issue. I guess you are looking for bands in the masked image, but the mask is active over all bands so the bands may be dropped if the mask is activated. I think the check

no2.map(getConc).flatten().filter(ee.Filter.notNull(['conc']))

should handle the cases where the band values are null. Although I am a bit concerned in that case that the corresponding dates will also be deleted from the final dataframe but we don't seem to have run into that problem.

Your fix is actually creating an issue since the masked image which should have no band information at all is getting replaced by the original image which would not be a correct outcome of the masking. ...

rameshnatarajanus commented 2 months ago

In summary @APoulad I don't think your solution would work as it would basically revert the cloud mask filter. The more reasonable approach would be to handle the nulls (missing bands) in the result set. Lets move this to a low-priority for now but eventually @Ori Bach might want to take a closer look as he owns the issue of api access to the data.

rameshnatarajanus commented 1 month ago

I finally found a solution for the images without bands. The best way to handle it is to filter out the images without the relevant band after the cloud_mask_filter is applied.

This allows reduceRegions to work without a hitch. Unfortunately this leaves a gap in the sequence of dates for which we get the mean concentrations. This is a "missing value" in the time series and must be recorded in the final pandas dataframe.

A follow up issue is how to deal with missing values in the training data for forecasting ..

I have updated the code in the demo notebook and will reassign the task to Ori.

rameshnatarajanus commented 1 month ago

This is resolved in a demo, and Ori has incorporated into the server side code. If the image is completely masked nothing is returning and the client side is responsible for filling in the missing days.