Applied-GeoSolutions / gips

Geospatial Image Processing System
GNU General Public License v3.0
17 stars 5 forks source link

idempotence issue with gips_project et al #48

Closed ra-tolson closed 8 years ago

ra-tolson commented 8 years ago

Test demonstrating this problem is here: https://github.com/Applied-GeoSolutions/gips/commit/8a80d454e92f9e6ce17f053bf6a8ffb55e7068af

Here's what it produces on the initial run:

https://github.com/Applied-GeoSolutions/gips/blob/8a80d454e92f9e6ce17f053bf6a8ffb55e7068af/gips/test/data.py#L95

On the second run, three additional files are produced:

{
'0/2012336_MOD-MYD_obstime.tif': 1180170371, 
'0/2012337_MOD-MYD_obstime.tif': 1283853420, 
'0/2012338_MOD-MYD_obstime.tif': -922366135
}

This seems like an error because I would expect repeated runs for a given projection to produce the same output, not add files.

Final data point: The initial run is performed with a data repo that has only source data in it; the second run commences while there are products present, generated on an as-needed basis by the first run. These include .hdf.index files and product .tifs.

ra-tolson commented 8 years ago

As @ircwaves pointed out, these files are duplicates of three others produced in the initial run, named, eg, 0/2012336_MCD_obstime.tif.

ra-tolson commented 8 years ago

Adding sensor = 'MOD-MYD' to the clause for obstime in modisData.process fixes it, causing obstime products in the data repo and any projected output to be named accordingly. I have a partial explanation; I'll post a full one shortly (presuming I can formulate one, heh).

bhbraswell commented 8 years ago

If you take all the other products out of the test, i.e. use "-p obstime", does this behavior still happen?

I'm wondering if there is something that happens at the asset level that is meant to apply to all the products. Then I'm also wondering if we should require that all products that are requested at one time have the same sensor identifier.

Also we should decide what we want sensor to mean.

ra-tolson commented 8 years ago

Okay, here's the explanation I promised: Whenever products are produced, they're given the right source value, namely "MOD-MYD". This is becaue that value is hardcoded into the clause for obstime in modisData.process. But sensor is set to the default of "MCD", which is used to set sensors and filenames on the modisData object. Later the value for sensor used to name the output files.

But on a run with products in the repo, modisData.process is either not called or its loop is never entered. So no sensor value is set for obstime, default or otherwise. Instead, files were found in the repo beforehand, their names parsed and "MOD-MYD" extracted from them for use in output file names.

I think my fix works because it explicitly prevents the default for sensor from ever being used.

A broader issue is that modisData.process is too easy to screw up. It is 621 lines long, including almost 600 lines of cutpasted boilerplate in a long series of if statements. That can't be ideal.

ra-tolson commented 8 years ago

If you take all the other products out of the test, i.e. use "-p obstime", does this behavior still happen?

Looks like it:

(.venv) laptop2:~/src/gips$ cat try.sh 
rm -r data-root && rm -rf testout/* && tar xf data-root.tar && tree data-root
gips_project modis -s gips/test/NHseacoast.shp -d 2012-12-01,2012-12-01 -v 4 --outdir testout --notld -p obstime
gips_project modis -s gips/test/NHseacoast.shp -d 2012-12-01,2012-12-01 -v 4 --outdir testout --notld -p obstime
(.venv) laptop2:~/src/gips$ source try.sh &>/dev/null
(.venv) laptop2:~/src/gips$ ls testout/0/
2012336_MCD_obstime.tif  2012336_MOD-MYD_obstime.tif
ra-tolson commented 8 years ago

Fix is in branch 46-sys-tests.

bhbraswell commented 8 years ago

Good detective work. I think not having a default sensor makes sense. Agree about the process block, and this sort of applies to the drivers in general (about boilerplate code). This wasn't as apparent when there were only two drivers. Now we have a better sample size and could probably break the driver down a bit.

ra-tolson commented 8 years ago

Okay, default removed.

ra-tolson commented 8 years ago

This merged awhile ago; closing.