**Open** — alexgleith opened this issue 4 years ago
So this didn't help. I have confirmed that GDAL performs a minimal number of requests with those settings, but I suspect either GDAL or rasterio is trying to obtain IAM credentials despite the unsigned configuration (`AWS_NO_SIGN_REQUEST=YES`). We had a similar issue in OWS recently.
```python
from datacube.utils.aws import configure_s3_access

configure_s3_access(aws_unsigned=True, cloud_defaults=True)
```
Calling the above does make a difference, even though the same configuration is already supplied via environment variables in the sandbox.
`rasterio` is confirmed as the culprit for the slowdown: it simply doesn't check the environment for the presence/value of `AWS_NO_SIGN_REQUEST` and attempts to obtain credentials from `iam-role`, which times out (slowly).
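A minimal sketch of the check that is being skipped: honour the variable before ever entering the credential chain. The function name and the accepted boolean spellings (GDAL-style `YES`/`NO` plus common truthy values) are assumptions for illustration, not rasterio's actual API.

```python
import os


def aws_unsigned_from_env(environ=os.environ):
    """Return True if AWS_NO_SIGN_REQUEST asks for unsigned S3 access.

    GDAL config options use YES/NO-style booleans, so we normalise
    case and treat YES/TRUE/ON/1 as "skip credential lookup entirely".
    (Hypothetical helper sketching the missing check.)
    """
    value = environ.get("AWS_NO_SIGN_REQUEST", "")
    return value.strip().upper() in ("YES", "TRUE", "ON", "1")


# Unsigned requested: no credential chain (and no iam-role probe) needed.
assert aws_unsigned_from_env({"AWS_NO_SIGN_REQUEST": "YES"})
# Nothing set: fall through to the normal credential lookup.
assert not aws_unsigned_from_env({})
```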
```python
import logging

logger = logging.getLogger('botocore')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())  # Writes to console
```
Every file open then causes this logging output:
```
Looking for credentials via: env
Looking for credentials via: assume-role
Looking for credentials via: assume-role-with-web-identity
Looking for credentials via: shared-credentials-file
Looking for credentials via: custom-process
Looking for credentials via: config-file
Looking for credentials via: ec2-credentials-file
Looking for credentials via: boto-config
Looking for credentials via: container-role
Looking for credentials via: iam-role
Caught retryable HTTP exception while making metadata service request to http://169.254.169.254/latest/api/token: Connect timeout on endpoint URL: "http://169.254.169.254/latest/api/token"
Traceback (most recent call last):
  File "/env/lib/python3.6/site-packages/urllib3/connection.py", line 159, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/env/lib/python3.6/site-packages/urllib3/util/connection.py", line 80, in create_connection
    raise err
  File "/env/lib/python3.6/site-packages/urllib3/util/connection.py", line 70, in create_connection
    sock.connect(sa)
socket.timeout: timed out
```
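If unsigned access can't be forced, a related mitigation is to bound how long that instance-metadata probe may stall, using botocore's standard environment knobs (these are real botocore settings, but treating them as sufficient here is an assumption; they must be set before botocore builds its credential resolver):

```python
import os

# Bound the IMDS probe at 169.254.169.254 so a failed lookup
# degrades to a ~1 second delay instead of a long retry loop.
os.environ["AWS_METADATA_SERVICE_TIMEOUT"] = "1"       # seconds per attempt
os.environ["AWS_METADATA_SERVICE_NUM_ATTEMPTS"] = "1"  # no retries
```

This doesn't remove the per-open credential lookup, it only caps its cost.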
One workaround is to inject fake credentials via environment variables; this makes credential acquisition quick, as botocore will not go looking for STS. So adding something like:
```
AWS_ACCESS_KEY_ID=fake
AWS_SECRET_ACCESS_KEY=fake
```
feels dirty but should work.
**Is your feature request related to a problem? Please describe.**
Datacube load is slow for some products, like the ls8 geomedian.

**Describe the solution you'd like**
Setting these environment variables makes it faster:

**Describe alternatives you've considered**
Adding `__init__.py` files in the notebook environment.

**Additional context**
n/a