fdetsch / MODIS

Download and processing framework for MODIS imagery. The package provides automated access to the global online data archives LP DAAC, LAADS and NSIDC as well as processing capabilities such as file conversion, mosaicking, subsetting and time series.
Other
57 stars 27 forks source link

water and fill values in MOD44B #39

Closed MagicForrest closed 6 years ago

MagicForrest commented 6 years ago

Thanks again for the great package, I really like it. But unfortunately I have found an issue that might really hinder its usefulness, at least for me. Of course I might have missed something....

When using GDAL on product MOD44B (and this might well affect other products) the codes for water (200) and fill (253) seem to be included in the interpolation as if they were real data. This means that some larger aggregated pixels have values of greater that 100% for tree cover (or bare cover or whatever). This is not ideal, but they can be trimmed out relatively easily (at the loss of some pixels from the aggregated data) . You can see this in the attached screen shot. Note the scale in ncview, all the pixels that are not blue are over 100. This domain in question is Turkey by the way. You can probably just about make out the coastline.

screenshot from 2018-02-01 14-25-34

More concerning is the fact that this indicates that some pixels may have apparently valid values (in the range 0-100) but actually they include contributions of 200 or 253. This is impossible to identify.

I have tried with both "average" and "bilinear" interpolation and I see that same thing.

Obviously this is kind of a problem. As far as I can tell it needs to be solved in the MODIS package by masking out the water and fill pixels at the GDAL command stage. This should be possible using an option, I think. Maybe the -srcnodata option perhaps.

fdetsch commented 6 years ago

If I understand correctly, all that is required in order to solve this issue is to leave both the input CRS (ie. MODIS Sinusoidal) and pixel size of MOD44B untouched.

I just modified getTile() in such a way that, whenever a country name (passed to maps::map() behind the curtain) is specified as extent, the target CRS of all layers extracted through runGdal() is adopted from MODISoptions() instead of the resulting 'map' object (ie. EPSG:4326), see also 0892cc8bce8f475842dd3981dcc3725752652294. Something like

## devtools::install_github("MatMatt/MODIS", ref = "develop")
library(MODIS)

MODISoptions(outProj = "asIn", pixelSize = "asIn") # default settings

tfs = runGdal("MOD44B", tileH = 21, tileV = 8
              , begin = "2000001", end = "2000366"
              , SDSstring = "1000000", job = "mod44b-test")

## check raster
rst = raster(unlist(tfs))
projection(rst)
# [1] "+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs"

## count values
tbl = table(rst[])
rev(tbl)[1:5]
#    200     88     87     86     85 
# 473186      2     23    111    372

now seems to generate the desired outcome. Note that when working with 'Raster' or 'Spatial' inputs instead of country names, the target CRS will be adopted from the input object. In this case, the object needs to be available in MODIS Sinusoidal ("+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs") in order to suppress reprojection, and hence interpolation issues affecting extracted layers. Here's some sample code for this second approach:

shp = getData(country = "TUR", level = 0, path = tmpDir())
shp = spTransform(shp, CRS("+proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs"))

tfs = runGdal(product = "MOD44B", extent = shp
              , begin = "2015001", end = "2015365"
              , SDSstring = "1000000", job = "mod44b-test")
MagicForrest commented 6 years ago

Hmmm... I can't really comment on the first part because I don't use the package in that way (either specifying tiles or country extents). The second point is more relevant to me, but again not entirely, because it doesn't do any include any interpolation which is very important for my workflow.

My basic task is to compare MODIS data to my simulation outputs. So, for this purpose I want to put the MODIS data exactly on simulation grid (as defined as an R raster). Therefore my target grid (and CRS) is fixed. I appreciate that I could probably transform my target grid to MODIS sinusoidal, do the runGdal(), and then transform back. But that is time consuming, will introduce error and is just generally clumsy.

Maybe I am asking for a lot, but what I really need is to be able to interpolate and project the MODIS data on to any arbitrary grid (assuming it can be defined on an R raster). And it needs to handle MODIS 'out-of-range' values sensibly, ie not just blindly include them in whatever interpolation needs to be done. Maybe this is asking too much for the aims of your package?

But it seems to me that such functionality is almost there, it is just that GDAL is blind to MODIS's 'water' and 'fill' values. If they could be masked out when the GDAL command is run (is assume it is a gdalwarp command in the background which is why I suggested a particular argument) then the holy grail would be achieved!

Does this all make sense? It is hard to explain. We can also chat on the phone if you want.

MagicForrest commented 6 years ago

Hello again. I realise I was a bit demanding and under-appreciative of your last response. Sorry about that.

So I thought more carefully about it, and came to the conclusion that your solution would allow me to download and mosaic the data using your MODIS package which is still useful. But then I would need to re-project and and resample it myself (after changing the water and fill values to NA) to get the data I need.

I then tried to run the example code you posted, but I got an "NA" for the raster projection and different values for counts. Perhaps because of this error message:

Warning messages: 1: In .getCRSfromGridMap4(atts) : cannot process these parts of the CRS: long_name=CRS definition; spatial_ref=PROJCS["unnamed",GEOGCS["unnamed ellipse",DATUM["unknown",SPHEROID["unnamed",6371007.181,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["longitude_of_center",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["Meter",1]]; GeoTransform=3285172.47849608 231.6580305865238 0 1111950.51976652 0 -231.6563582846917 2: In .getCRSfromGridMap4(atts) : cannot create a valid CRS grid_mapping_name; false_easting; false_northing; scale_factor_at_projection_origin; scale_factor_at_central_meridian; standard_parallel; standard_parallel1; standard_parallel2; longitude_of_central_meridian; longitude_of_projection_origin; latitude_of_projection_origin; straight_vertical_longitude_from_pole; longitude_of_prime_meridian; semi_major_axis; inverse_flattening; +proj; +x_0; +y_0; +k_0; +k_0; +lat_1; +lat_1; +lat_2; +lon_0; +lon_0; +lat_0; +lon_0; +lon_0; +a; +rf

fdetsch commented 6 years ago

Merged your pull request with 5533ca57f9348f2c8ee9ea875cad26eb3a63c72e, sorry for the delay. 'maskValue' works like a charm, thanks for that. Until the next CRAN release, this'll be available from 'develop' only. Note that I added your name to the list of contributors in DESCRIPTION and ?runGdal, hopefully that's okay for you.

I'll leave this thread open until we find a solution for passing multiple mask values on to GDAL.

fdetsch commented 6 years ago

Multiple No Data Values are now available from the 'develop' branch. @MagicForrest, could you please try out the following:

# devtools::install_github("MatMatt/MODIS", ref = "develop")
library(MODIS)

lai = runGdal("MCD15A2H", tileH = 21, tileV = 8
              , begin = "2003001", end = "2003008"
              , SDSstring = "010000", job = "mcd15a2h-test"
              , maskValue = 249:254, overwrite = TRUE)

rst = raster(lai[[1]][[1]])
(tbl <- table(rst[]))[order(as.numeric(names(tbl)), decreasing = TRUE)][1:10]
#   70    69    68    67    66    65    64    63    62    61 
#  655   491   851  4107 11045  5398  3704  3555 13975  456

In MCD15A2H, there is a pre-defined No Data Value (255) that is automatically detected by runGdal() plus a couple of fill values (249 to 254) that need to be specified by the user. Note that in the above code chunk, we could also define maskValue = 249:255 - the function would automatically detect that 255 is the 'official' No Data Value, and hence remove it from 'maskValue' to reduce computation time.

For the record, there definitely are fill values in the raw image:

lai_ref = runGdal("MCD15A2H", tileH = 21, tileV = 8
                  , begin = "2003001", end = "2003008"
                  , SDSstring = "010000", job = "mcd15a2h-test"
                  , overwrite = TRUE)

plot(rst_ref <- raster(lai_ref[[1]][[1]]))
(tbl_ref <- table(rst_ref[]))[order(as.numeric(names(tbl_ref)), decreasing = TRUE)][1:10]
#    254    253    250     70     69     68     67     66     65     64 
# 117370  83443   4426    655    491    851   4107  11045   5398   3704
MagicForrest commented 6 years ago

Thanks for picking up the pull request and extending it to multiple no data values :-)

With GTiff it worked like charm! So I would say that it has been 'linux tested'.

However (and this is probably a different issue) with netCDF I got errors about lzw compression and also about the sinusoidal projection. But I still get this issue with the master branch, so it must be unrelated. I'm not sure if it is from my GDAL/netCDF/HDF/(zlib?) software stack, or actually with the MODIS package. I'll look into it some day.

Thanks again! Also, you don't need to list me as a contributor since my contribution has been so minor. I wouldn't want to 'dilute' the credit due to more worthy contributors.

fdetsch commented 6 years ago

Any updates on the netCDF part? I'll probably test it on my machine later this day to see if this is somehow related to MODIS.

fdetsch commented 6 years ago

Closing this issue since multiple mask values are now available. Moved the netCDF discussion to a new issue #52.