Closed mpszumowski closed 12 months ago
The same workaround, just moved to top level, compatible with running locally, and creating no duplicate entries in sys.path:
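import pkg_resources
import sys

site_packages = "/var/lang/lib/python3.8/site-packages"
try:
    # Absent when running locally; in that case nothing is changed.
    sys.path.remove(site_packages)
except ValueError:
    pass
else:
    # Re-insert at the front so pip-installed packages win over /var/runtime.
    sys.path.insert(0, site_packages)
    # Re-register the pip-installed distributions in the working set.
    for dist in pkg_resources.find_distributions(site_packages, True):
        pkg_resources.working_set.add(dist, site_packages, False, replace=True)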
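To make the "compatible with running locally" and "no duplicate entries" properties easier to see, here is a minimal sketch of just the path-reordering step, applied to plain lists instead of the real sys.path. The helper name move_to_front and both example path lists are hypothetical, chosen only for illustration.

```python
from typing import List


def move_to_front(search_path: List[str], entry: str) -> bool:
    """Move `entry` to index 0 of `search_path` if it is present.

    Returns True when the list was changed and False when `entry` was
    absent (for example when running locally, where the Lambda
    site-packages directory is not on the path).
    """
    try:
        search_path.remove(entry)
    except ValueError:
        return False
    search_path.insert(0, entry)
    return True


site_packages = "/var/lang/lib/python3.8/site-packages"

# Stand-in for sys.path inside the container (hypothetical ordering).
container_path = ["/var/task", "/var/runtime", site_packages]
changed = move_to_front(container_path, site_packages)

# Stand-in for sys.path on a developer machine (hypothetical entries).
local_path = ["/home/me/project", "/usr/lib/python3/dist-packages"]
untouched = move_to_front(local_path, site_packages)

print(container_path)
print(local_path)
```

Because the entry is removed before it is re-inserted, it appears only once in the result, and a path that never contained it is left exactly as it was.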
This behavior is terrible; I am surprised it hasn't been fixed. If you install your own boto3/botocore, or any other library that is shadowed by /var/runtime, you will think you are running your pinned requirement, but you are in fact relying entirely on the shadowed version. A ticking time bomb in production. It is even worse if you haven't pinned to a dated image tag and your image is cached: your dependencies slowly grow out of date without you knowing.
There is also not much control over this; removing entries from sys.path is a rather hacky solution in my view.
Does anyone know if those existing boto3/botocore libraries shipped with the lambda are actually used by the runtime? Could we just wipe them from our Dockerfile?
What happens is that Lambda Runtime sets the /var/runtime directory in front of /var/lang/lib/python3.7 and populates the pkg_resources.WorkingSet with the distributions installed there (mostly boto3 + deps).
Where does this happen? I mean what component is actually doing this?
Just thinking of making my own Docker base image to use for Lambdas, but I'd like to be sure that doing so will actually fix this issue.
Where does this happen? I mean what component is actually doing this?
The Lambda runtime script runs from /var/runtime, which is the same directory that contains boto3. From there it creates the Lambda listener, at some point imports the Lambda handler function, and finally runs your Lambda code when an event is received. Because the script lives in that directory, Python gives priority to imports from it. If Python doesn't find the module in the script's directory, it moves on to the paths specified in $PYTHONPATH.
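The precedence rule described above can be reproduced without Lambda. The following is a minimal sketch, assuming two throwaway directories that stand in for /var/runtime and the pip site-packages directory; the module name shadow_demo and the helper functions are hypothetical. It only shows that whichever directory comes first on sys.path supplies the module, not how the Lambda runtime itself is implemented.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_module(directory: Path, version: str) -> None:
    """Create a tiny module named shadow_demo that only records a version."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "shadow_demo.py").write_text(f"VERSION = {version!r}\n")


def import_fresh(name: str):
    """Import `name`, ignoring any copy already cached in sys.modules."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)


def resolve_with_order(first: Path, second: Path) -> str:
    """Return the VERSION seen when `first` precedes `second` on sys.path."""
    saved = list(sys.path)
    try:
        sys.path[:0] = [str(first), str(second)]
        return import_fresh("shadow_demo").VERSION
    finally:
        # Restore the interpreter state so the demo has no side effects.
        sys.path[:] = saved
        sys.modules.pop("shadow_demo", None)


with tempfile.TemporaryDirectory() as tmp:
    runtime_dir = Path(tmp) / "runtime"       # stands in for /var/runtime
    site_dir = Path(tmp) / "site-packages"    # stands in for pip's site-packages
    write_module(runtime_dir, "3.1")
    write_module(site_dir, "2.10")

    runtime_first = resolve_with_order(runtime_dir, site_dir)
    site_first = resolve_with_order(site_dir, runtime_dir)

print(runtime_first)  # 3.1
print(site_first)     # 2.10
```

The same module name resolves to a different file purely because of the ordering, which is why a pip-installed package can be present in the image and still never be imported.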
Oh I see, so this behaviour is more a by-product of how it is run, and not something done on purpose.
Looking at https://github.com/aws/aws-lambda-python-runtime-interface-client I cannot see any dependency documented on boto3 or botocore. That doesn't of course mean that it doesn't just rely on those libraries already being available though I guess.
Maybe making our own base images, that don't have boto etc installed alongside the lambda runtime interface client will help in this situation.
Looking at the documentation here, it doesn't suggest that boto3 etc are required dependencies either.
https://docs.aws.amazon.com/lambda/latest/dg/images-create.html#images-create-from-alt
Thanks @SteggyLeggy, based on your suggestion I followed "Using an AWS base image for custom runtimes" from here: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html#runtimes-images-custom which worked great.
We have published an updated image for Python 3.11 which addresses this issue.
Previously, the Lambda base container images for Python included the /var/runtime directory before the /var/lang/lib/python3.x directory in the search path. This meant that packages in /var/runtime are loaded in preference to packages pip installed into /var/lang/lib/python3.x. Since the AWS SDK for Python (boto3/botocore) was installed into /var/runtime, this made it harder for customers to upgrade the SDK version.
With the Python 3.11 runtime, the AWS SDK and its dependencies are now pre-installed into the /var/lang/lib/python3.11 directory, and the search path has been modified so this directory has precedence over /var/runtime. Customers can override the SDK by pip installing a newer version. This change also enables pip to verify and track that the pre-installed SDK and its dependencies are compatible with any customer-installed packages.
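One way to check which copy of a package a given image would actually load is to ask the import system where it finds the module and compare that with the installed metadata. This is a minimal sketch using the standard library importlib.util and importlib.metadata (Python 3.8+) rather than the pkg_resources calls used elsewhere in this thread; the function name locate is hypothetical, and json is used as the example only because it is always importable.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def locate(module_name: str, dist_name: Optional[str] = None) -> dict:
    """Report where a top-level module would be imported from.

    `dist_name` is the distribution name to look up in the installed
    metadata; it defaults to the module name, which is an assumption that
    does not hold for every package.
    """
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    try:
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"module": module_name, "origin": origin, "metadata_version": version}


info = locate("json")
missing = locate("surely_not_an_installed_module_xyz")
print(info)
print(missing)
```

Inside a Lambda image, calling locate("boto3") from the handler and looking at whether the origin is under /var/runtime or under /var/lang/lib should show which copy takes precedence.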
Will this fix also be backported to the older lambda python base images that are still supported? I think it should be backported to all the image versions listed here: https://docs.aws.amazon.com/lambda/latest/dg/python-image.html#python-image-base
@ryancausey We don't currently plan to back-port this change to the Lambda images for earlier Python versions. It's a (potentially) breaking change and we don't want to break existing customer configurations.
Image used:
amazon/aws-lambda-python:3.7
Dockerfile:
requirements.txt
lambda_function.py
What I would expect: Dependencies are installed correctly, the lambda_handler imports them and executes properly on Lambda.
What is the case: Log from Lambda:
The snowflake-connector-python imports a different version of its dependency than pip has installed during Docker build. It then fails due to the fact that the idna 3.1 library does not match its requirements: idna<3,>=2.5.
Why it tries to import a different version is suggested by the logging I have added in the Dockerfile and the lambda_function.
Dockerfile:
RUN python -c 'import sys; print(sys.path)'
lambda_function.lambda_handler:
print(sys.path)
Dockerfile:
RUN pip3 freeze | grep idna
idna==2.10
lambda_function.lambda_handler:
print(pkg_resources.working_set.by_key['idna'])
idna 3.1
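To spell out why idna 2.10 is accepted and idna 3.1 is rejected by the requirement idna<3,>=2.5, here is a deliberately simplified check. It is a sketch that only handles plain dotted numeric versions and is not a PEP 440 implementation; the function names are hypothetical, and real tooling such as pip uses a far more complete parser.

```python
import operator
from typing import Tuple

# Two-character operators come first so "<=" is not mistaken for "<".
_OPERATORS = [
    ("<=", operator.le),
    (">=", operator.ge),
    ("==", operator.eq),
    ("!=", operator.ne),
    ("<", operator.lt),
    (">", operator.gt),
]


def _as_tuple(version: str, width: int) -> Tuple[int, ...]:
    """Turn '2.10' into (2, 10), zero-padded to `width` components."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, specifier: str) -> bool:
    """Return True if `version` meets every comma-separated clause."""
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                bound = clause[len(symbol):].strip()
                width = max(version.count("."), bound.count(".")) + 1
                if not compare(_as_tuple(version, width), _as_tuple(bound, width)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("2.10", "<3,>=2.5"))  # True
print(satisfies("3.1", "<3,>=2.5"))   # False
```

Note that 2.10 is newer than 2.5 because the components are compared as integers, not as decimal fractions.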
What happens is that Lambda Runtime sets the /var/runtime directory in front of /var/lang/lib/python3.7 and populates the pkg_resources.WorkingSet with the distributions installed there (mostly boto3 + deps). This is being carried over to the lambda handler, which is executed with the "overridden" libraries. Seeing how sys.path looks at the moment when the handler executes, I presume that it has been manually modified to place not the runtime path at the beginning, but the user-provided libraries in /var/task. Why is /opt/python/lib/python3.7/site-packages second if it is not the pip site-packages directory?
The outcome is really confusing: using Docker, I expect to be able to handle my runtime and (at least) my dependencies. I definitely expect the Lambda Runtime to be transparent and its dependencies not to impact my workload. Especially if the bug is this opaque and undocumented.
I was able to work my way around with the following hack.
It may be dangerous if the manually imported libraries in turn conflict with the downstream code in the runtime. I think, however, that something of this kind can be implemented in the runtime itself so that the handler uses only the environment libraries.