Closed sullamenashe closed 6 years ago
Another thing to note: if I run the exact same command on the exact same node that gave me an error, but fetching from the default source (ESA) instead, the fetch runs all the way through without an error.
I've not been able to duplicate this error locally. I will check on an instance.
We suspect, but aren't sure yet, that this apparent error may be due to there being no valid pixels in the requested domain (an ARD tile).
When run locally, the export runs to completion, but the files contain all zeros, as you would expect if the requested area contained no data.
I doubt that it is related to valid pixels. All of the work prior to mosaicking is done on the source tiles, and hence has no relation to the geometry of the request.
Here is the test I ran locally. I only added --%tile to avoid processing many scenes, since the fundamental error that set off the cascade of errors was in producing the ref product for a single scene:
(venv) icooke@rio:~/src/gips$ gips_export sentinel2 --fetch --%tile 50 -d 2018-06-12 -s /titan/data/vector/ARD/conus_ard_grid.shp -w "h=24 and v=9" -v4 --outdir /tmp --notld --res 30 30 -p ref
GIPS Data Export (v0.12.0-dev)
Retrieving inventory for site conus_ard_grid-471 for date range 2018-06-12 - 2018-06-12 (days 1-366)
Processing [ref] on 1 dates (1 files)
0:00:00.000027: Starting processing for this temporal-spatial unit
0:00:00.000239: Start VRT for ref-toa image
0...10...20...30...40...50...60...70...80...90...100 - done.
0:01:27.635320: Finished VRT for ref-toa image
0:01:27.635526: Starting reversion to TOA radiance.
0:01:34.222814: TOA radiance reversion factor for BLUE (band 1): 0.570374984025
0:01:34.222993: TOA radiance reversion factor for GREEN (band 2): 0.530652586019
0:01:34.223046: TOA radiance reversion factor for RED (band 3): 0.44008388869
0:01:34.223097: TOA radiance reversion factor for REDEDGE1 (band 4): 0.414640365583
0:01:34.223136: TOA radiance reversion factor for REDEDGE2 (band 5): 0.374757890505
0:01:34.223185: TOA radiance reversion factor for REDEDGE3 (band 6): 0.338222481495
0:01:34.223232: TOA radiance reversion factor for NIR (band 7): 0.30316560254
0:01:34.223268: TOA radiance reversion factor for REDEDGE4 (band 8): 0.278045144071
0:01:34.223314: TOA radiance reversion factor for SWIR1 (band 9): 0.0714787787677
0:01:34.223360: TOA radiance reversion factor for SWIR2 (band 10): 0.0248119462924
0:01:34.223432: Computing atmospheric corrections for surface reflectance
Generating atmospheric correction object.
Running atmospheric model (6S)
Retrieving inventory for site tiles for date range 2018-06-12 - 2018-06-12 (days 1-366)
MOD08_D3.A2018163.061.2018166192543.hdf -> /data2/aod6/tiles/2018/163/MOD08_D3.A2018163.061.2018166192543.hdf
1 files (1 links) from /data2/aod6/stage added to archive in 0:00:00.011310
MOD08_D3.A2018163.061.2018166192543[Aerosol Optical Thickness at 0.55 microns for both Ocean (best) and Land (corrected): Mean]: read (95,49)-(97,51) in 0.00380235 seconds
lta[]: read (95,49)-(97,51) in 0.000122812 seconds
lta[]: read (95,49)-(97,51) in 4.049e-06 seconds
AOD: LTA-Daily = 0.0908945, 0.070848
AOD: Source = Weighted estimate using MODIS LTA values Value = 0.0908945228255
Band T Lu Ld
BLUE: 0.993 35.15 506.84
GREEN: 0.966 18.88 486.03
RED: 0.977 8.57 415.94
REDEDGE1: 0.965 6.44 381.53
REDEDGE2: 0.963 5.03 354.37
REDEDGE3: 0.991 3.88 334.83
NIR: 0.945 2.68 280.63
REDEDGE4: 0.999 2.31 278.23
SWIR1: 0.978 0.11 68.68
SWIR2: 0.945 0.01 22.62
Ran atmospheric model in 0:00:21.875784
0:01:56.388670: Starting on standard product processing
0:01:56.389002: Starting ref processing
17SKD_2018163_ref-toa[BLUE]: Processing in 2 chunks
17SKD_2018163_ref-toa[BLUE]: read (0,0)-(5489,3054) in 26.2489 seconds
17SKD_2018163_S2A_ref[BLUE]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[BLUE]: read (0,3055)-(5489,5489) in 16.877 seconds
17SKD_2018163_ref-toa[GREEN]: Processing in 2 chunks
17SKD_2018163_ref-toa[GREEN]: read (0,0)-(5489,3054) in 23.824 seconds
17SKD_2018163_S2A_ref[GREEN]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[GREEN]: read (0,3055)-(5489,5489) in 16.6263 seconds
17SKD_2018163_ref-toa[RED]: Processing in 2 chunks
17SKD_2018163_ref-toa[RED]: read (0,0)-(5489,3054) in 26.3742 seconds
17SKD_2018163_S2A_ref[RED]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[RED]: read (0,3055)-(5489,5489) in 16.7657 seconds
17SKD_2018163_ref-toa[REDEDGE1]: Processing in 2 chunks
17SKD_2018163_ref-toa[REDEDGE1]: read (0,0)-(5489,3054) in 11.683 seconds
17SKD_2018163_S2A_ref[REDEDGE1]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[REDEDGE1]: read (0,3055)-(5489,5489) in 5.17684 seconds
17SKD_2018163_ref-toa[REDEDGE2]: Processing in 2 chunks
17SKD_2018163_ref-toa[REDEDGE2]: read (0,0)-(5489,3054) in 13.0566 seconds
17SKD_2018163_S2A_ref[REDEDGE2]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[REDEDGE2]: read (0,3055)-(5489,5489) in 5.03282 seconds
17SKD_2018163_ref-toa[REDEDGE3]: Processing in 2 chunks
17SKD_2018163_ref-toa[REDEDGE3]: read (0,0)-(5489,3054) in 12.0621 seconds
17SKD_2018163_S2A_ref[REDEDGE3]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[REDEDGE3]: read (0,3055)-(5489,5489) in 5.38139 seconds
17SKD_2018163_ref-toa[NIR]: Processing in 2 chunks
17SKD_2018163_ref-toa[NIR]: read (0,0)-(5489,3054) in 25.435 seconds
17SKD_2018163_S2A_ref[NIR]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[NIR]: read (0,3055)-(5489,5489) in 16.651 seconds
17SKD_2018163_ref-toa[REDEDGE4]: Processing in 2 chunks
17SKD_2018163_ref-toa[REDEDGE4]: read (0,0)-(5489,3054) in 12.1904 seconds
17SKD_2018163_S2A_ref[REDEDGE4]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[REDEDGE4]: read (0,3055)-(5489,5489) in 5.03619 seconds
17SKD_2018163_ref-toa[SWIR1]: Processing in 2 chunks
17SKD_2018163_ref-toa[SWIR1]: read (0,0)-(5489,3054) in 12.2587 seconds
17SKD_2018163_S2A_ref[SWIR1]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[SWIR1]: read (0,3055)-(5489,5489) in 4.40203 seconds
17SKD_2018163_ref-toa[SWIR2]: Processing in 2 chunks
17SKD_2018163_ref-toa[SWIR2]: read (0,0)-(5489,3054) in 12.2233 seconds
17SKD_2018163_S2A_ref[SWIR2]: Writing (0.0001x + 0)
17SKD_2018163_ref-toa[SWIR2]: read (0,3055)-(5489,5489) in 4.38355 seconds
0:06:34.783671: Finished ref processing
0:06:35.155304: Completed standard product processing
0:06:35.252181: Processing complete for this spatial-temporal unit
Processing completed in 0:06:35.259536
Creating mosaic project /tmp/471
Dates: 1 dates (2018-06-12 - 2018-06-12)
Products: ref
GIPPY: CookieCutter (1 files) - /tmp/471/mosaicfZCfyX/2018163_S2A_ref.tif
17SKD_2018163_S2A_ref warping into 2018163_S2A_ref 0...10...20...30...40...50...60...70...80...90...100 - done.
2018-06-12: created project files for 1 tiles in 0:00:12.411108
Completed mosaic project in 0:00:12.669752
/tmp/471/2018163_S2A_ref.tif
/tmp/471/2018163_S2A_ndti-toa.tif
DATE Coverage Product
2018
163 ndti-toa ref
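Incidentally, the `Writing (0.0001x + 0)` lines in the log above indicate the ref product is stored as scaled integers with a 0.0001 gain and zero offset. A minimal sketch of recovering float reflectance from stored values (the sample DNs below are made up for illustration):

```python
# Recover physical reflectance from the scaled-integer ref product.
# The (gain, offset) pair comes from the "Writing (0.0001x + 0)" log lines;
# the DN values below are illustrative, not taken from a real file.
GAIN, OFFSET = 0.0001, 0

def dn_to_reflectance(dn, gain=GAIN, offset=OFFSET):
    """Apply the linear scaling y = gain * x + offset."""
    return gain * dn + offset

stored_dns = [0, 1234, 5678]
reflectances = [dn_to_reflectance(dn) for dn in stored_dns]
```

With that scaling, an all-zeros file decodes to all-zero reflectance, which is consistent with the "no valid pixels" behavior described above.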
An update from me: I got the command to work fine on a tile with valid data in it. For example, this case works:
gips_export sentinel2 -p ref -d 2018-06-12 -s /archive/vector/conus_ard_grid.shp -w "h=14 and v=10" -v4 --outdir /mnt/storage/tmp --notld --res 30 30 --fetch
So this seems to confirm that the first tile I tested on was problematic.
Actually, I was able to reproduce your error on AWS in a fresh setup that didn't have the ancillary AOD composite files in place; that's the only way this error gets noisy enough to be caught. Here's what the error looks like in the case where you have emplaced the composites in their home:
[clip]
Running atmospheric model (6S)
Retrieving inventory for site tiles for date range 2018-06-12 - 2018-06-12 (days 1-366)
No files found; nothing to archive.
--> Unrecognizable file: HDF4_EOS:EOS_GRID:"/archive/aod/tiles/2018/163/MOD08_D3.A2018163.061.2018166192543.hdf":mod08:Aerosol_Optical_Depth_Land_Ocean_Mean
MOD08_D3.A2018163.061.2018166192543[Aerosol Optical Thickness at 0.55 microns for both Ocean (best) and Land (corrected): Mean]: read (94,48)-(96,50) in 0.00182403 seconds
lta[]: read (94,48)-(96,50) in 2.9126e-05 seconds
lta[]: read (94,48)-(96,50) in 2.829e-06 seconds
AOD: LTA-Daily = 0.105261, 0.114586
AOD: Source = Weighted estimate using MODIS LTA values Value = 0.105261108283
[clip]
This behavior has been historically desirable, but it may be different for NRT processing.
I think the long(er) term fix is to look at other AOD sources that have lower latency, or to use LTA(D) values, and have a reproc job that comes through and reprocesses data as "real" AOD estimates become available.
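A rough sketch of what such a reprocessing sweep could look like. Everything here is hypothetical, not existing GIPS API: the helpers stand in for an archive lookup and a job-submission hook.

```python
from datetime import date

# Hypothetical reprocessing sweep: scenes first corrected with LTA-fallback
# AOD get re-run once a "real" daily AOD estimate becomes available.
# None of these helpers exist in GIPS; they are illustrative stand-ins.

def real_aod_available(d, aod_archive):
    """Stand-in archive lookup: is a daily AOD granule on hand for date d?"""
    return d in aod_archive

def reproc_sweep(lta_fallback_scenes, aod_archive, submit):
    """Resubmit any scene whose date now has a real AOD estimate.

    Returns the scenes still waiting on AOD so the next sweep can retry them.
    """
    remaining = []
    for scene_date, scene_id in lta_fallback_scenes:
        if real_aod_available(scene_date, aod_archive):
            submit(scene_id)
        else:
            remaining.append((scene_date, scene_id))
    return remaining

# Illustrative run with fake data
scenes = [(date(2018, 6, 12), '17SKD_2018163'),
          (date(2018, 6, 13), '17SKD_2018164')]
archive = {date(2018, 6, 12)}
submitted = []
still_waiting = reproc_sweep(scenes, archive, submitted.append)
```

The sweep is idempotent over the remaining list, so it could run as a periodic cron-style job until the backlog drains.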
Oh, and I just reproduced the "Stream too short" error. I'd suggest running with the following change. I flushed my instance's archive, applied the change, and re-ran without issue. Hopefully it is just an intermittent issue with /vsicurl_streaming
access. If this does eliminate the error, then I would attribute it to GIPS not really operating in a "streaming" mode. I was going to change that line anyway, for that reason.
diff --git a/gips/data/core.py b/gips/data/core.py
index b2fdbaa..5b8884e 100644
--- a/gips/data/core.py
+++ b/gips/data/core.py
@@ -85,7 +85,7 @@ class GoogleStorageMixin(object):
return cls._gs_object_url_base.format(cls.gs_bucket_name)
@classmethod
- def gs_vsi_prefix(cls, streaming=True):
+ def gs_vsi_prefix(cls, streaming=False):
"""Generate the first part of a VSI path for gdal."""
vsi_magic_string = '/vsicurl_streaming/' if streaming else '/vsicurl/'
return vsi_magic_string + cls.gs_object_url_base()
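For context, the behavior the diff flips can be shown in isolation. The class below is a minimal stand-in for the real mixin in gips/data/core.py, and the bucket URL is illustrative, not the real one:

```python
# Minimal stand-in for the GoogleStorageMixin logic touched by the diff.
# The URL base is illustrative; only the streaming-flag default matters here.
class GoogleStorageMixin:
    _url_base = 'https://storage.googleapis.com/example-bucket/'

    @classmethod
    def gs_vsi_prefix(cls, streaming=False):  # default was True before the patch
        """Generate the first part of a VSI path for gdal."""
        vsi_magic_string = '/vsicurl_streaming/' if streaming else '/vsicurl/'
        return vsi_magic_string + cls._url_base
```

After the patch, callers that take the default get plain /vsicurl/, which supports HTTP range requests and seeking, instead of the forward-only streaming reader, which matches the observation that GIPS isn't really operating in a streaming mode.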
Then (of course) I ran into an "error writing dirty block",
AKA disk full, because ref images are huge.
Good news - I have implemented this fix in my forked version of GIPS and it seems to have fixed my problem. I am going to close the issue.
I think the long(er) term fix is to look at other AOD sources that have lower latency, or to use LTA(D) values, and have a reproc job that comes through and reprocesses data as "real" AOD estimates become available.
Is there currently a GIPS way to update the LTA composites? I see in the aod driver an exception saying "Composite processing is currently broken".
I've never actually run this, so I am not sure how it is invoked.
The composites appear to have been made once (or twice), and then never updated. When I found this, I started updating the code to match the architectural changes in GIPS, but ran out of time. It would definitely be worth updating.
We are using gips_fetch to pull Sentinel2 data from Google Storage. We are running the same command both from an AWS spot instance and from a GCP Kubernetes cluster, processing the data for specific tiles using a shapefile that has the tile boundaries. I show below the original call of gips_fetch from the GCP cluster. Note the parts that say "ERROR 1: Stream too short." It seems to fail on inconsistent bands - in other words, if I run this a second time it will fail on different bands, and sometimes it will get through an entire image (there are 5 images in this tile) without failing. The same fetch command on an AWS spot instance seems to give fewer errors, but is slower and shows the same behavior.
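Until the root cause is settled, one workaround worth considering for these intermittent truncated-stream failures is a simple retry with backoff around each asset fetch. Everything below is a sketch: `fetch_asset` is a hypothetical callable, not a GIPS function, assumed to raise IOError on a "Stream too short" failure.

```python
import time

def fetch_with_retries(fetch_asset, url, attempts=3, backoff=2.0):
    """Retry a flaky fetch a few times, backing off between attempts.

    fetch_asset is a hypothetical stand-in that raises IOError on a
    truncated stream and returns the asset bytes on success.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch_asset(url)
        except IOError:
            if attempt == attempts:
                raise  # out of attempts; let the caller see the failure
            time.sleep(backoff * attempt)
```

Since the failures move between bands from run to run, a per-asset retry like this would likely mask them, though it doesn't explain why /vsicurl_streaming truncates in the first place.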