Closed jamesoff closed 1 year ago
same issue here...
I think the issue may be with the -L
option here. (Would need to test to confirm).
https://github.com/aws/aws-cdk/blob/e5be10049047d29e9e687f5f4f39037275d51d38/packages/@aws-cdk/aws-lambda-python/lib/bundling.ts#L96
Since we are running this in a docker container and we have only entry directory mounted inside the container, it probably doesn't make sense to try to copy the symlink.
@corymhall thanks for updating this issue. The main reason we use the .venv folder inside the source directory is because it makes it very easy to set the interpreter for debugging python in visual studio code.
Just chiming in to say I also have an issue with the build process including the .venv
. The way it manifests for my project is slightly different- it doubles my synthesized asset bundle size by copying .venv
(and subsequently all installed packages). I believe the same root problem is causing both issues, so I didn't create a new Issue however.
My setup is the same, using poetry config virtualenvs.in-project true
so I can use VSCode with any environment controlled by the repo itself (only the lambda codebase runs 3.9, other projects run 3.10+, etc).
├── asset.f259b5b5d475e70bd0aba226d3d5af04c09dea9e8728664e7a17437581997a1a
│ └── python
│ ├── .DS_Store
│ ├── .dockerignore
│ ├── .gitignore
│ ├── .venv # <-- very large wasted space, 2x asset bundle size
│ ├── README.md
│ ├── __pycache__ # <-- also probably wasted space, see _Note_ at the end of this comment
│ ├── anyio
│ ├── anyio-3.6.2.dist-info
I believe a "most user friendly" solution that would solve both of these two issues would be to provide an exclusion list at the Construct level. The only problem with that is cp
doesn't have a clean means to do that, it would have to switch to rsync
to do it "correctly". With that list of excludes we can just append --exclude foo
to rsync
.
I noticed that pip already does vaguely similar stuff inline but when I tried to do that with Poetry it actually just removed my source .venv
🤦
https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-lambda-python/lib/packaging.ts#L54
exportCommand: `PIPENV_VENV_IN_PROJECT=1 pipenv lock -r > ${DependenciesFile.PIP} && rm -rf .venv`,
and I was able to "fix" (re: hack) my problem by altering the locally installed node_modules/@aws-cdk/aws-lambda-python-alpha/lib/bundling.js
with a manual .venv
exclusion using rsync
as follows:
// from: node_modules/@aws-cdk/aws-lambda-python-alpha/lib/bundling.js#createBundlingCommand
// use rsync instead of cp to provide exclusion list
console.log(`<!!Manual hack!!>: "rsync -av --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}"`);
bundlingCommands.push(`rsync -v --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}`);
// bundlingCommands.push(`cp -rTL ${options.inputDir}/ ${options.outputDir}`);
I'd be happy to throw a PR together if that helps, I just haven't contributed yet so it'll be a bit before I get set up.
Proposed fix: Changing from cp
to rsync
allows us to provide a list of exclusions, and we can put .venv
in there to circumvent both of these issues entirely.
Note Separately, I realized that if bundling
is provided to the Code.fromAsset
, it ignores the exclude
parameter entirely (per comments in core/lib/fs/options.ts so I'm seeing lots of *.pyc
files copied over from __pyache__
etc as well, in my output
The issue still persists in 2.50.0. Any updates? :thinking:
@ryanandonian - nice approach
// from: node_modules/@aws-cdk/aws-lambda-python-alpha/lib/bundling.js#createBundlingCommand // use rsync instead of cp to provide exclusion list console.log(`<!!Manual hack!!>: "rsync -av --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}"`); bundlingCommands.push(`rsync -v --exclude '.venv' -r ${options.inputDir}/ ${options.outputDir}`); // bundlingCommands.push(`cp -rTL ${options.inputDir}/ ${options.outputDir}`);
So best way would be an additional property to exclude directories, right ?
So best way would be an additional property to exclude directories, right ?
To me that feels like a reasonable solution that keeps existing behavior as to not break clients, but also lets users customize it without too much mental overhead/complexity. I believe this will also fix @jamesoff 's original issue, since the bundler will just completely disregard .venv
entirely, there'll be no symlinks to miss ;)
The only two minor concerns that I have with this as a long term are:
rsync
will be available to the images moving forward since it's not part of the default executables in some images-a
) for this very reason, but I cannot assert it won't be an issue in a production environment when using an NTFS source volume.Besides that, I have already "tested" this flow locally on OSX and it worked fairly cleanly with deployments to real production environments. (note: in the snippet I pasted, I didn't use -a
but in practice I did use -a
)
I found a workaround for this issue by adjusting the directory structure.
Pointing entry to a folder without a venv fixes the problem. So I just placed the poetry package one folder deeper into the folder structure, while keeping the venv on the original folder level. Hereby you retain the use of a local venv for IDE functionality. Only downside is that you have to navigate through one more folder in your IDE.
The directory structure that is giving me the error:
.
└── test_lambda
├── .venv
├── README.md
├── index.py
├── poetry.lock
└── pyproject.toml
The directory structure that solved the error:
.
└── test_lambda
├── .venv
test_lambda
├── README.md
├── index.py
├── poetry.lock
└── pyproject.toml
Hey - any update on this topic ?
Hey @pharindoko , I finally got around to getting some time to work on submitting what I feel is a proper fix for our issues. I opened #23670 , so hopefully I was able to align with the contributing guidelines and can get this fix approved by the team.
Thanks @ryanandonian Have seen your PR and like this approach.
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Describe the bug
If a
.venv
directory is present in the function source directory (i.e. the value ofentry
toPythonFunction
), the asset bundling fails after apparently trying to find the Python binary inside it. This presumably occurs because in my virtualenv,bin/python
is a symlink to the original Python used as part of the virtualenv creation.I often create a virtualenv in my function's directory to allow for code completion etc while working on the code.
I don't believe this used to happen; this project was successfully deployed on 2.35.0a0 and fails with 2.{46,47}.0a0. I have not bisected the library versions to verify.
Expected Behavior
cdk synth
would succeed.Current Behavior
Reproduction Steps
CDK code:
In the
lambda/
directory,Then
synth
.Possible Solution
I was hoping there would be an option to exclude files/directories from the build but couldn't see one in the docs.
Or, CDK should not attempt to use the existing virtualenv during the build, or emit a warning if one is found, which would have aided my troubleshooting.
Additional Information/Context
No response
CDK CLI Version
2.46.0 (build 5a0595e)
Framework Version
No response
Node.js Version
v18.10.0
OS
macOS 12.6
Language
Python
Language Version
Python 3.10.4
Other information
No response