OCHA-DAP / ds-raster-pipelines

1 stars 0 forks source link

Check grid alignment between SEAS5 2024 data vs historical data #14

Open hannahker opened 4 weeks ago

hannahker commented 4 weeks ago

On analysis with xarrray in Python, seemed to be grid alignment issues that need further investigation.

For example, if I stack COGs from both MARS and AWS outputs, I get an output something like this (clipped to Haiti):

Screenshot 2024-08-19 at 3 30 15 PM
hannahker commented 3 weeks ago

See @zackarno's work here: https://rpubs.com/zackarno/1204437. Odd since the outputs above seem incongruous with what Zack found!

hannahker commented 3 weeks ago

See notebook here: https://github.com/OCHA-DAP/ds-raster-pipelines/blob/seas5-grid-alignment/examples/mars_aws_seas5_grid.md

zackarno commented 1 week ago

interesting! I can't actually see the misalignment in the image screenshot in this issue, perhaps if pixel centroids were converted to points it would be more clear. However, it seems surprising that it would even be visible when when the difference is at the 10e-13 decimal place as you noted in your PR (i guess if you zoom in far enough you can eventually see it?).

Nonetheless, nice job figuring out the rounding issue. This doesn't seem to have any effect in R and I think relates to the fact that R and python basically handle floating point precision slightly differently.

in R if we were to run:

all.equal(r_mars_x_coordinates,r_aws_x_coordinates)
all.equal(r_mars_y_coordinates,r_aws_y_coordinates)

They would both return TRUE & I guess the terra() raster package recognizes this and that they are actually meant to be the same which allow the grids to be stacked/aligned automagically.

I think rounding to 4 digits is fine, but as this is a 0.4/0.4 grid I think it would actually make more sense to round to 1 decimal, i think the rest of the decimals don't actually make the data more accurate, but could rather reflect some interpolation/processing side effect that we don't need to worry about. This often happens on raster operations like reprojection where for example a 0.5 degree raster will get transformed to something like: 0.49999999999999 degrees - in these cases i believe it's standard practice just round to the known resolution (i.e 0.5 in this made up example). Perhaps if you do this and change the float type it could also make the files leaner and processing more efficient?