Closed ircwaves closed 5 years ago
So for now I'm working in private so I can rebase as-needed.
@bhbraswell, whenever we want to start collaborating, I can push, and if you want to see my commits sooner please just let me know.
In the mean time, here's the rough plan I'm working from at the moment:
std_process.py
.At some point in this process I think it'd be best for someone with scientific training to inspect some generated raster files, like products and exports, but I'm not sure when would be best. Computed stats look okay so far (eg std dev & mean only vary by a little), but obviously my eyes don't see as much as either of yours would.
Another todo: Need to fix landsat bqashadow process sys test after we merge dev into the python 3 branch, because that product has been restored in the dev branch but was deleted entirely in the python 3 branch.
@ags-tolson I'd be glad to begin looking at/ informally testing whenever you'd like to push your branch.
Okay, the branch is named 510-py3-port
; it has commits atop rob-upgrade
. Here's the state of it:
gips_process
tests) now run under python 3 + gippy 1.0.gips_process
system tests generally run and some of them pass their tests.Drivers whose process system tests pass:
These drivers are broken in their process system tests:
merra -- the whole suite is different basically (no diff handy)
s2 easy to fix with --record:
cfmask, evi-toa, and cloudmask
s2 bustified, generally with excessive variation in mean/stddev or other values like data type:
<Function t_process[sentinel2-crcm]>
<Function t_process[sentinel2-mtci-toa]>
<Function t_process[sentinel2-mtci]>
<Function t_process[sentinel2-s2rep-toa]>
<Function t_process[sentinel2-s2rep]>
hls
<Function t_process[hls-msavi2]>
saved new expectation but variations a bit big:
- ' Minimum=0.000, Maximum=10160.000, Mean=2014.882, StdDev=590.982',
+ ' Minimum=0.000, Maximum=10160.000, Mean=1972.869, StdDev=651.815',
Also print
is a bad idea for code intended to act like a library, but verbose_out
has no default verbosity level and doesn't let you specify arguments conveniently like print()
, so I wrote vprint()
as an alternative.
If you want to observe these changes in output, you can build a container suitable for running the test suite with indigo-ci.docker
, provided the needed parent container is named indigo-gips
(ie use the Dockerfile
you added for that). gips/test/README.md
tells how to configure & run the tests. I can forward you the needed assets to save you from fetching them; let me know what driver you want to start with.
I can also send along newly-generated product files for your perusal, too.
Thanks @ags-tolson. I'll look at MSAVI2 and MERRA, and also HLS which actually would be a priority for me personally. Just a question, instead of gips-indigo
, could we make a generic container than anyone could use?
I think we could and should. I'm not sure when the best moment to sort containerization would be though.
An update on system tests in the dev branch + gippy 0.3.11: The CI pipeline passes, and for process tests, the passing drivers are hls, modis, merra, most of landsat, and some of sentinel2. The landsat problem is minor, and the fixes in place in the 510 branch seem trustworthy, so that just leaves sentinel2.
Sentinel2's problematic products under dev + gippy 0.3.11 are mostly the same as for the gippy 1.0 branch: crcm, s2rep, and mtci. Meanwhile s2rep-toa and mtci-toa pass under gippy 0.3.11 and fail under gippy 1.0.
crcm deviates from the expectation but resembles the 0.3.11 outcome, and I'm told it looks okay graphically, which makes me think it's okay to accept it as-is (maybe just a stale expectation):
# sys test expectation:
- ' Minimum--8561.000, Maximum-32767.000, Mean-2954.280, StdDev-3506.046',
# dev branch under gippy 0.3.11 outcome:
+ ' Minimum=-8652.000, Maximum=32767.000, Mean=2375.639, StdDev=3342.788',
# branch 510-py3-port under gippy 1.0 outcome:
+ ' Minimum--8624.000, Maximum-32767.000, Mean-2395.565, StdDev-3341.763',
mtci is acting like it got pushed up to its maximum somehow, maybe a problem with the numpy faff (it's computed manually, not a gippy.algorithms.indices
call):
# sys test expectation:
- ' Minimum=-30000.000, Maximum=29998.000, Mean=12443.862, StdDev=11332.215',
# dev branch under gippy 0.3.11 outcome:
+ ' Minimum=-30000.000, Maximum=29998.000, Mean=12388.818, StdDev=11309.554',
# branch 510-py3-port under gippy 1.0 outcome:
+ ' Minimum=-30000.000, Maximum=32767.000, Mean=32742.273, StdDev=1026.779',
mtci-toa looks very different from its expectation, but weirdly similar to the surface mtci's expectation & gippy 0.3.11 outcome:
# sys test expectation:
- ' Minimum--7.000, Maximum-6.000, Mean--0.635, StdDev-5.246',
# branch 510-py3-port under gippy 1.0 outcome:
+ ' Minimum=-30000.000, Maximum=29999.000, Mean=12659.624, StdDev=12332.159',
s2rep is similarly showing a huge blow-up to the max and is similarly computed by the driver, not gippy:
# sys test expectation:
- ' Minimum=-1.000, Maximum=17498.000, Mean=8152.291, StdDev=1299.719',
# dev branch under gippy 0.3.11 outcome:
+ ' Minimum=-1.000, Maximum=17498.000, Mean=8163.189, StdDev=1364.628',
# branch 510-py3-port under gippy 1.0 outcome:
+ ' Description = 1',
+ ' Minimum=-10000.000, Maximum=32767.000, Mean=32689.483, StdDev=1458.024',
s2rep-toa looks very different from its expectation but in the ballpark of the surface s2rep expectation and gippy 0.3.11 outcome, which makes me think the expectation is wrong:
# sys test expectation:
- ' Minimum=-911.000, Maximum=1100.000, Mean=301.864, StdDev=724.321',
# branch 510-py3-port under gippy 1.0 outcome:
+ ' Minimum=1.000, Maximum=17499.000, Mean=8317.195, StdDev=1353.935',
Due to the way sentinel-2 works I suppose I'll look at their surface versions first, since for sentinel-2 it represents less processing.
So this is odd; I have some memory of having to work back from the s2 ref product, which is really just opening the asset, to get to the ref-toa product, but looking at the code, I can see no evidence of this. Apparently I'm mistaken? Anyway I'm starting with mtci-toa instead since that's closer to the source.
So, the idea there is that ESA provides, ref-toa
, and coeffs to get back to rad-toa
. So, once we convert back to rad-toa
, we apply the 6s based atmospheric correction coeffs to get to ref
.
For the py3 + gippy 1 version of mtci-toa, in the histogram, there's strange spikes near the 0.5 marks of the histogram (see especially spiky spike at 0.0):
But otherwise I guess it looks good, and close to atmo-corrected mtci in any case. Let me know if this isn't an acceptable artifact kind of situation.
That looks like a possible quantization effect, but I would defer to @bziniti on whether that is problematic. I believe she's the only one who has utilized MTCI.
That looks like a possible quantization effect, but I would defer to @bziniti on whether that is problematic. I believe she's the only one who has utilized MTCI.
I used MTCI once for a forest defoliation mapping project. I'm not sure I understand what the problem is. @lledoux may be more familiar than I. Or maybe this will help? https://www.sciencedirect.com/science/article/pii/S092427161300107X
I would tend to agree with @ircwaves , as it's too much of a coincidence that those spikes happen at every 0.5 step; though I am unsure as to whether it may cause issues later on if/when the data are used. Like @bziniti , I only used this product for mapping deforestation (a while ago). Sorry I'm not more help! Perhaps Torbick may be able to provide some input?
So, I'd say that MTCI is good for now, but maybe file an issue to look at the level of quantization being employed.
So merra processing went okay; I'm pivoting to export tests.
Modis index exports are showing a strange geolocation thing; going from 71d12\'50.33"W
to 71d12\'50.24"W
is a distance of as much as 0.0015 miles, if I'm doing the math right, which seems significant enough to matter is like 2 meters. Does that matter? I don't think so?
____________________________ t_project[modis-ndvi] _____________________________
E 'Origin = (1777848.52615334,4934179.33097106)',
E - 'Pixel Size = (100.00000000,-100.00000000)',
E + 'Pixel Size = (99.88880776,-99.93806065)',
(lots of new metadata, but missing):
E - ' SourceFiles=h12v04_2012337_MCD_ndvi',
(possibly relevant for geolocation issue):
E + ' GRINGPOINTLATITUDE.1=39.78578782, 49.99719181, 50.07541801, 39.84112776',
E + ' GRINGPOINTLONGITUDE.1=-78.20833299, -93.38216574, -77.75056839, '
E + '-65.07807811',
E + ' GRINGPOINTSEQUENCENO.1=1, 2, 3, 4',
(geolocation issue):
E 'Corner Coordinates:',
E - 'Upper Left ( 1777848.526, 4934179.331) ( 71d12\'50.33"W, 43d27\'22.98"N)',
E ? ^^
E + 'Upper Left ( 1777848.526, 4934179.331) ( 71d12\'50.24"W, 43d27\'22.98"N)',
E ? ^^
E - 'Lower Left ( 1777848.526, 4868579.331) ( 71d21\'48.67"W, 42d53\'16.00"N)',
E ? ^^ -- - ^ ^^
E + 'Lower Left ( 1777848.526, 4868619.963) ( 71d21\'48.26"W, 42d53\'17.27"N)',
E ? ^^ ++ + ^ ^^
E - 'Upper Right ( 1825148.526, 4934179.331) ( 70d39\' 9.32"W, 43d22\'31.31"N)',
E ? ^^^ ^ - ^^ ^^ ^^
E + 'Upper Right ( 1825095.932, 4934179.331) ( 70d39\'11.44"W, 43d22\'31.64"N)',
E ? ^^^ ^^ ^^ ^^ ^^
E - 'Lower Right ( 1825148.526, 4868579.331) ( 70d48\'24.75"W, 42d48\'30.03"N)',
E ? ^^^ ^ - ^^ -- ^ - ^ ^^
E + 'Lower Right ( 1825095.932, 4868619.963) ( 70d48\'26.53"W, 42d48\'31.62"N)',
E ? ^^^ ^^ ^^ ++ ^ + ^ ^^
E - 'Center ( 1801498.526, 4901379.331) ( 71d 0\'34.22"W, 43d 7\'56.48"N)',
E ? ----- ^ ^^^ ^ ^^ ^ ^^
E + 'Center ( 1801472.229, 4901399.647) ( 71d 0\'35.07"W, 43d 7\'57.27"N)',
E ? +++++ ^ ^^^ ^ ^^ ^ ^^
(content issue):
E 'Band 1 Block=473x8 Type=Int16, ColorInterp=Gray',
E - ' Description = ndvi',
E - ' Minimum=-9125.000, Maximum=8199.000, Mean=5985.630, StdDev=1326.924',
E ? ^^^ ^ ^ -- ^^ ^^^
E + ' Minimum=-9068.000, Maximum=8191.000, Mean=5980.926, StdDev=1317.353',
E ? ^^^ ^ ^ ++ ^^ ^^^
E ' NoData Value=-32768',
E - ' Unit Type: other',
The piece that jumps out there to me is the pixel size being different. That shouldn't have changed.
Not sure how many spots it comes up in, but my guess is that the issue is here:
https://github.com/gipit/gippy/blob/master/GIP/GeoAlgorithms.cpp#L254
where cookie_cutter
just grabs the bounding box from the GeoFeature, and ignores the resolution that is provided to cookie_cutter
. Then that code just pulls the bbox from this code:
https://github.com/gipit/gippy/blob/master/GIP/GeoResource.cpp#L101-L103
bounding box being used to compute the resolution being stuffed into the image, where it used to be explicitly passed in:
https://github.com/Applied-GeoSolutions/gippy/blob/master/GIP/GeoAlgorithms.cpp#L394-L401
So, I believe that there is a bug in this gippy use-case, but it also points out again that we need to add ALL_TOUCHED
support to gippy 1.0 cookie cutter.
Okay, MRPR #513 is up for the another round of effort. As of now:
Thus, fetch & process for most drivers has been verified under python 3 + gippy 1.0, plus some random other testing by our unit & integration tests. Next steps (updated Fri 20 sep --T):
gips/scripts/
that haven't been exercised (and tiles.py
, which it might be a good moment to remove?): I think just export & stats need exercise, export_meta.py
is being removed, and tiles.py
is a candidate for removal.A note for containerization, need to add apt-get install python-dev
(2.7) even though we're going python3, because gippy freaks out about needing python2.7/Python.h
:
bash$ pip3 install gippy
[ ... snip ... ]
gippy/gippy_wrap.cpp:5040:34: fatal error: python2.7/Python.h: No such file or directory
compilation terminated.
bash$ pip3 install gippy ; echo "exit status: $?" [ ... snip ... ] Collecting gippy Using cached https://files.pythonhosted.org/packages/c4/58/c79b0213b3a601a5d921f00fc1b3b443edd4c31882d7f69271c793125d9c/gippy-1.0.3.tar.gz Requirement already satisfied: numpy>=1.9 in /usr/local/lib/python3.5/dist-packages (from gippy) (1.17.1) Building wheels for collected packages: gippy Building wheel for gippy (setup.py) ... done Created wheel for gippy: filename=gippy-1.0.3-cp35-cp35m-linux_x86_64.whl size=7543861 sha256=9c9ffd98059112d851be997e84aa702a10272f37f0a2017e53eb831cfd8cec60 Stored in directory: /root/.cache/pip/wheels/e2/0c/f9/1403921925e62f69cb4c23f3b579a4296de183c6f6aff07a95 Successfully built gippy Installing collected packages: gippy Successfully installed gippy-1.0.3 exit status: 0
As of now that PR (#513) is mature enough to pass the CI test suite (unit + int + sys), which means technically it can be merged into dev if it passes code review. @ircwaves note that as you recommended mapreduce.py
hasn't been removed in this branch.
As we consider merging, the key issues I know of are:
gips_export
can't work perfectly due to: https://github.com/gipit/gippy/issues/172I'm going to attempt a local dev merge, see how miserable it is.
That's great to hear! My current thinking is that we move forward with dev @ python3+gippy1, and keep master at python2.7 until dependent apps have been ported to python 3+gippy1. Raise any flags for anyone?
Really impressive @ags-tolson. Definitely doesn't raise any warning flags for me.
Merge complete + test suite passes; if dev doesn't move much, the 510 branch head now contains dev as well. There were several merge conflicts but hopefully test passage is a good sign.
A bunch of coreg changes came in, but I don't have a test suite for coreg generally. I attempted to follow @bhbraswell's pattern of changing calls to match gippy but it's possible I didn't catch everything. I didn't do manual testing in any case because I don't know enough about coreg to know if it worked or changed behavior.
Likewise there are more dockerfiles than before. There's going to be a little layer cake with Dockerfile
on the bottom:
Dockerfile
, basic gips containerdocker/gips-ci.docker
builds atop it and runs the CI suitedocker/gips-test-full.docker
builds atop the CI docker image and lets you run every non-skipped test (mainly adds sixs
; note I punted on adding in earthdata creds etc since that's never run using automation right now).Suggest adding coreg stuff (ortho
?) to gips-test-full.docker
in the near future and instrumenting coreg code with tests. Suggest that effort should follow this one to keep the bloat down; we can't have everything right away with something this large.
Okay, the mergeable pushed branch is 510-py3-port
. I'll issue a PR shortly; it's big so I'll suggest a procedure for managing it when I do. Other notes:
- ' Minimum=0.000, Maximum=10160.000, Mean=2014.882, StdDev=590.982',
+ ' Minimum=0.000, Maximum=10160.000, Mean=1972.869, StdDev=651.815',
gips_export --rastermask
has an a priori bug we're tracking internally.gips_mask
doesn't work a priori; another bug filed internally.git diff dev...master
yielded no diffdev-py2
) as a safe haven because:install.sh
isn't yet ready; however it can be done post facto given the safe haven (I can work on it while I'm waiting on the PR review).
Thanks @ags-tolson and @bhbraswell for getting started on this. We anticipate needing to have some discussion on some facets of this. We're off to a good start. https://pythonclock.org/
(also see #376)