anayeaye opened this issue 1 year ago
@anayeaye this is the issue we encountered before, where the endpoint was failing intermittently. The problem that time was that the creds weren't being passed to GDAL. This is what that fix looked like: https://github.com/NASA-IMPACT/veda-backend/pull/144/files
The error that we're seeing now:
"detail": "'/vsis3/veda-data-store-staging/EIS/COG/coastal-flooding-and-slr/MODIS_LC_2001_BD_v2.cog.tif' does not exist in the file system, and is not recognized as a supported dataset name."
looks very similar to what we saw in the previous issue.
In the PR we had some changes to how GDAL envs are passed through titiler based on the 0.7.0 breaking changes: https://github.com/developmentseed/titiler/blob/main/CHANGES.md#070-2022-06-08
Do we know if these gdal config changes were tested in dev?
Just for the record (no new insights): I tried some pinning in the raster-api. These changes did not solve our problem, and the dev deployment has been reverted to the current develop branch.

To be extra sure we weren't getting the breaking version of starlette (this looked promising because there is a subtle difference between the cold-start true/false conditions that causes slightly different results on multiple tries of the same request; examples in the issue description):
"fastapi>=0.87,<0.92",
"starlette>=0.21.0,<0.25",
And on a whim, to see if the recent release of rasterio was related to our woes:
"rasterio<1.3.8",
So our current condition remains: /cog routes are happily using the STS assume-role session credentials, but the /mosaic and /stac endpoints are not. I don't see where the divergence happens; I'm pretty sure they all have titiler core's BaseTilerFactory underneath.
mosaic and stac may use another level of threading, which might explain why the environment is not the same. I've been trying to tackle this issue for a while without success. Can you test by setting RIO_TILER_MAX_THREADS=1 and MOSAIC_CONCURRENCY=1? (In theory this will remove any multi-threading.)
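For a local reproduction, those two variables could be forced before rio-tiler/titiler are imported. A minimal sketch, assuming the variable names above and that they are read from the process environment at import time:

```python
import os

# Disable rio-tiler's thread pool and the mosaic backend's concurrency
# so every GDAL read happens in the request thread that holds the
# rasterio.Env credentials.
os.environ["RIO_TILER_MAX_THREADS"] = "1"
os.environ["MOSAIC_CONCURRENCY"] = "1"
```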
With RIO_TILER_MAX_THREADS=1 (which our current deployment already sets) and MOSAIC_CONCURRENCY=1 (set just now), we're still seeing an Access Denied 403 on the first hit, followed by a does not exist in the file system error on retries.

EDIT/note: I've now reverted the lambda environment to match the env variables stored for GitHub Actions: RIO_TILER_MAX_THREADS=1; MOSAIC_CONCURRENCY is unset.
@vincentsarago: My hacky fix for this issue created a success case for veda-backend and shows where I believe the issue resides. I wish I'd seen this convo yesterday b/c it would've saved hours 😆

The issue explained: rasterio.Env uses thread-local storage, which means it inits a new, empty context per thread. So by the time CustomSTACReader is retrieving the image in this line, it's running inside of multiple threads. Each thread's CustomSTACReader triggers self.ctx (which is just rasterio.Env), and it has to start all over again with an empty context when accessing the image.

My fix created a success case b/c I was forcing the env vars back into the per-thread rasterio.Env context via AssetInfo.
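The per-thread behavior described above can be sketched without rasterio at all: `threading.local` is the mechanism behind rasterio.Env's context, so state set in the request thread simply does not exist in a worker thread.

```python
import threading

# threading.local gives each thread its own attribute namespace;
# this mirrors how rasterio.Env stores the active GDAL environment.
local = threading.local()

results = {}

# "Enter" an environment in the main thread, as a request handler would.
local.env = {"AWS_ACCESS_KEY_ID": "xxx"}  # stand-in for session creds
results["main"] = getattr(local, "env", None)

def worker():
    # A fresh thread sees an empty thread-local: the credentials set
    # above are not there, so a nested Env starts from scratch.
    results["worker"] = getattr(local, "env", None)

t = threading.Thread(target=worker)
t.start()
t.join()

print(results["main"])    # {'AWS_ACCESS_KEY_ID': 'xxx'}
print(results["worker"])  # None
```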
Thanks so much @ranchodeluxe for this deep dive. This is definitely a bug that we should fix at rio-tiler level
I wonder if using a combination of https://github.com/rasterio/rasterio/blob/main/rasterio/env.py#L328C1-L339C1 to get the options in the environment and forward them to a new Env will work 🤷
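The forwarding idea can be sketched in pure Python, with a toy `Env` standing in for rasterio.Env and `current_options()` standing in for reading the active options (analogous to the linked rasterio helpers): capture the options in the request thread, then re-apply them inside each worker thread.

```python
import threading

local = threading.local()

class Env:
    """Toy stand-in for rasterio.Env: a thread-local dict of options."""
    def __init__(self, **options):
        self.options = options
    def __enter__(self):
        local.env = self.options
        return self
    def __exit__(self, *exc):
        local.env = None

def current_options():
    # Stand-in for reading the active environment's options.
    return dict(getattr(local, "env", None) or {})

results = {}

with Env(AWS_ACCESS_KEY_ID="xxx"):
    captured = current_options()  # snapshot taken in the request thread

    def worker():
        # Forward the captured options into a fresh Env in this thread.
        with Env(**captured):
            results["forwarded"] = current_options()

    t = threading.Thread(target=worker)
    t.start()
    t.join()

print(results["forwarded"])  # {'AWS_ACCESS_KEY_ID': 'xxx'}
```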
FYI this can be simply demoed with:
with rasterio.Env(
    session=AWSSession(
        aws_access_key_id="MyDevseedId",
        aws_secret_access_key="MyDevseedKey",
    )
):
    with rasterio.open("s3://ds-satellite/cogs/NaturalEarth/world_grey.tif") as src:
        print(src.profile)

    with rasterio.Env():
        with rasterio.open("s3://ds-satellite/cogs/NaturalEarth/world_grey_1024_512.tif") as src:
            print(src.profile)
{'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 21580, 'height': 10780, 'count': 3, 'crs': CRS.from_epsg(4326), 'transform': Affine(0.01666666666667, 0.0, -179.8333333333333,
0.0, -0.01666666666667, 89.83333333333331), 'blockxsize': 128, 'blockysize': 128, 'tiled': True, 'compress': 'jpeg', 'interleave': 'pixel', 'photometric': 'ycbcr'}
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()
RasterioIOError: Access Denied
Ok, I may have a fix for this, but it will require a full rio-tiler/titiler/titiler-pgstac update. I see the veda raster-api is a bit behind the actual version (titiler-pgstac==0.2.3 / titiler==0.10.2); ideally I'll release titiler-pgstac 0.5 and titiler 0.12 with a new rio-tiler 4.2.
The move from titiler-pgstac 0.2.3 to 0.5 will have a couple of breaking changes:

Changes in Item and Collection endpoint URLs:
- Before: {endpoint}/stac/info?collection=collection1&item=item1
- After: {endpoint}/collections/collection1/items/item1/info
- Before: {endpoint}/mosaic/tiles/20200307aC0853900w361030/0/0/0
- After: {endpoint}/mosaic/20200307aC0853900w361030/tiles/0/0/0

- https://github.com/stac-utils/titiler-pgstac/blob/0.4.1/CHANGES.md#040-2023-05-22
  - Before: /{searchid}/{z}/{x}/{y}/assets
  - After: /{searchid}/tiles/{z}/{x}/{y}/assets
- https://github.com/stac-utils/titiler-pgstac/blob/0.4.1/CHANGES.md#041-2023-06-21
  - rename add_map_viewer to add_viewer option in MosaicTilerFactory for consistency with titiler's options
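For clients of the raster-api, the URL changes above amount to moving path segments. A hypothetical illustration (the host and search id are examples, not prescriptions):

```python
# Illustration of the titiler-pgstac 0.2.3 -> 0.5 mosaic URL move:
# the search id moves before the "tiles" segment.
endpoint = "https://dev-raster.delta-backend.com"  # example host
searchid = "20200307aC0853900w361030"              # example search id
z, x, y = 0, 0, 0

old_tile = f"{endpoint}/mosaic/tiles/{searchid}/{z}/{x}/{y}"
new_tile = f"{endpoint}/mosaic/{searchid}/tiles/{z}/{x}/{y}"

print(old_tile)
print(new_tile)
```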
I'm confused as to why rasterio is operating this way in the same thread. Based on the source code, it should be picking these things up: https://github.com/rasterio/rasterio/blob/main/rasterio/env.py#L272-L291
Even in the same thread, it seems the session is not forwarded. I'm opening an issue in rasterio, because to me it seems to be a bug.
Yeah, based on the code I'm reading it is a bug
@vincentsarago: For a single thread, nested rasterio.Env DOES find the previous environ. The exact same thing works fine for me below (not the same s3 endpoint; I don't have a DS AWS account). Can you double-check that you don't have any existing AWS_* os environ variables exported, and please remove them?
import rasterio
import pprint

session = {
    "session": rasterio.session.AWSSession(
        aws_access_key_id="<blah>",
        aws_secret_access_key="<blah>",
        aws_session_token="<blah>",
    )
}

with rasterio.Env(**session) as rioenv1:
    print('########### rioenv1 ###########')
    pprint.pprint(rioenv1.options, indent=4)
    with rasterio.open("s3://veda-data-store-staging/geoglam/CropMonitor_202001.tif") as src:
        pprint.pprint(src.profile, indent=4)

    with rasterio.Env() as rioenv2:
        print('########### rioenv2 ###########')
        pprint.pprint(rioenv2.options, indent=4)
        with rasterio.open("s3://veda-data-store-staging/geoglam/CropMonitor_202001.tif") as src:
            pprint.pprint(src.profile, indent=4)
########### rioenv1 ###########
{ 'AWS_ACCESS_KEY_ID': '<blah>',
'AWS_REGION': 'us-east-1',
'AWS_SECRET_ACCESS_KEY': '<blah>'}
{ 'blockxsize': 128,
'blockysize': 128,
'compress': 'jpeg',
'count': 3,
'crs': CRS.from_epsg(4326),
'driver': 'GTiff',
'dtype': 'uint8',
'height': 10780,
'interleave': 'pixel',
'nodata': None,
'photometric': 'ycbcr',
'tiled': True,
'transform': Affine(0.01666666666667, 0.0, -179.8333333333333,
0.0, -0.01666666666667, 89.83333333333331),
'width': 21580}
########### rioenv2 ###########
{}
{ 'blockxsize': 512,
'blockysize': 512,
'compress': 'jpeg',
'count': 3,
'crs': CRS.from_epsg(4326),
'driver': 'GTiff',
'dtype': 'uint8',
'height': 10780,
'interleave': 'pixel',
'nodata': None,
'photometric': 'ycbcr',
'tiled': True,
'transform': Affine(0.01666666666667, 0.0, -179.8333333333333,
0.0, -0.01666666666667, 89.83333333333331),
'width': 21580}
Note: the second call should fail, but because I've got my default AWS profile set to devseed, it works 😅
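To rule out exactly that kind of ambient-credential interference before running either demo, a quick sketch that strips AWS_* variables from the process environment (keeping them in a dict in case they need to be restored):

```python
import os

# Pop any ambient AWS_* variables so only credentials passed explicitly
# via AWSSession can influence the test; `removed` keeps them for restore.
removed = {k: os.environ.pop(k) for k in list(os.environ) if k.startswith("AWS_")}
print(sorted(removed))
```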
@ranchodeluxe feel free to add more comments in the rasterio ticket 🙏
Will do, but I have to build my rasterio image first, and I want to do it as a test case for them.
After asking around, it appears this has been resolved for the time being. The ultimate fix is in rasterio, so the next step is bumping rasterio versions once the next release is cut (>1.3.9)
What
We are attempting to promote a large diff from our dev backend to staging, but have encountered problems with the maps in the discovery and explore views of a preview of the dashboard running against the development backend. This is a hard error to document because a request to /cog/info that fails on the first attempt succeeds when attempted a second time (I've seen that before but don't recall our answer at the moment). I suspect at least some of the solution may be in our raster-api GDAL environment; perhaps the configuration has drifted?
Dashboard preview:
https://deploy-preview-281--visex.netlify.app/
Mosaic examples
Failing mosaic in dev
https://dev-raster.delta-backend.com/mosaic/tiles/795277e64375a264bf3f73506a6cd2d0/WebMercatorQuad/2/0/1@1x?assets=cog_default&resampling=bilinear&bidx=1&colormap_name=rdylbu_r&rescale=0,1
First try:
'/vsis3/veda-data-store-staging/OMSO2PCA-COG/OMSO2PCA_LUT_SCD_2005.tif' does not exist in the file system, and is not recognized as a supported dataset name.
Second attempt after executing /cog/info:
Read or write failed. IReadBlock failed at X offset 0, Y offset 0: /vsis3/veda-data-store-staging/OMSO2PCA-COG/OMSO2PCA_LUT_SCD_2005.tif, band 1: IReadBlock failed at X offset 0, Y offset 0: TIFFReadEncodedTile() failed."
Mosaic works in staging
https://staging-raster.delta-backend.com/mosaic/tiles/795277e64375a264bf3f73506a6cd2d0/WebMercatorQuad/2/0/1@1x?assets=cog_default&resampling=bilinear&bidx=1&colormap_name=rdylbu_r&rescale=0,1
COG info examples
Note: we are unable to read COG info for the file used by the mosaic, but we can access other files in the same collection, so it is not purely a permissions issue.
https://dev-raster.delta-backend.com/cog/info?url=s3://veda-data-store-staging/OMSO2PCA-COG/OMSO2PCA_LUT_SCD_2005.tif
On first attempt:
'/vsis3/veda-data-store-staging/OMSO2PCA-COG/OMSO2PCA_LUT_SCD_2005.tif' does not exist in the file system, and is not recognized as a supported dataset name.
On second attempt: endpoint returns cog/info
These are yearly COGs so the error should be reproducible by incrementing the date in the tif name.
COG Tiles example
We already know that /cog is handling the env, this tiles example works as expected. https://dev-raster.delta-backend.com/cog/tiles/WebMercatorQuad/0/0/0@1x?url=s3://veda-data-store-staging/OMSO2PCA-COG/OMSO2PCA_LUT_SCD_2005.tif&bidx=1&rescale=0,1
Stack Notes
We cannot make a one to one comparison of the dev and staging veda-backend stacks because we have upgraded the version of pgstac for the dev database but not staging.
Similarities
Differences