aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.51k stars 3.86k forks source link

Integ-runner: Breaks the CWD and path when using Python language #29196

Open tomharvey opened 6 months ago

tomharvey commented 6 months ago

Describe the bug

After test file discovery integ-runner changes the working directory from the users's cwd (usually the project root) to the test file's parent directory and then runs the cdk command, which (when using Python) means that the cwd and sys.path are in an unexpected state.

This means that we cannot import Stacks or Constructs from directories in the project which are above or adjacent to the test files. Since this is the expected location of Stacks and Constructs - that makes integ-runner unusable in python.

Given a CDK project of file layout like the below:

my-cdk-app/
├─ my_cdk_app/
│  ├─ my_stack.py
│  ├─ lambda/
│  │ ├─ index.py
├─ test/
│  ├─ integ_.my_stack.py
├─ app.py
├─ cdk.json

While in app.py it's expected that we would from my_cdk_app.my_stack import MyStack, within inter_.my_stack.py we are unable to do so.

Since the working directory has changed before we spawn the cdk and python subprocesses the sys.path doesn't contain the root of the project and we get a ModuleNotFoundError

Secondly, if my_stack.py contains a relative path definition, like for a Lambda function Code.from_asset(f"{os.get_cwd()} /my_cdk_app/lambda"). Then the relative path would break as you're now working from a base path of /test instead of the root path / of the project. And, so the asset cannot be found at /test/my_cdk_app/lambda since it's at /my_cdk_app/lambda.

Expected Behavior

I expect the cdk command and the integ-runner commands to run in the same way - whereby the current working directory is the directory that I run the command from.

It's very common to expect that the cwd is the directory that the user initiates a script from. And the script is expected, by user, to be integ-runner, not integ_.my_stack.py.

Current Behavior

The integ-runner command runs the cdk command after changing directories. Therefore the cwd is inconsistent and unexpected and results in ModuleNotFoundErrors and missing assets for lambda, Dockerfiles, S3Deployments and other stacks which use a path to define the location of assets.

Reproduction Steps

Create a CDK project:

mkdir my_cdk_app
cd my_cdk_app
cdk init --language python

Add an integ-runner test:

mkdir test
touch test/integ_.my_cdk_app.py

And populate that test file with:

# test/integ_.my_cdk_app.py

import aws_cdk as cdk
from aws_cdk import (
    integ_tests_alpha,
)

from my_cdk_app.my_cdk_app_stack import MyCdkAppStack

app = cdk.App()
stack = MyCdkAppStack(app, "TestStack")

integration_test = integ_tests_alpha.IntegTest(app, "Integ", test_cases=[stack])

synth = app.synth()

Then, when you run integ-runner you will get the ModuleNotFoundError:

Verifying integration test snapshots...

Traceback (most recent call last):
  File "/workspaces/my_cdk_app/test/integ_.my_cdk_app.py", line 6, in <module>
    from my_cdk_app.my_cdk_app_stack import MyCdkAppStack
ModuleNotFoundError: No module named 'my_cdk_app'

While cdk synth will successfully import from my_cdk_app.my_cdk_app_stack import MyCdkAppStack and synthesis the stack.

Possible Solution

To fix the behaviour in a hacky way, we can edit the test file, adding this to the top:

import sys
sys.path.insert(0, project_root_path)

Where project_root_path is the absolute path within your developer machine to the project root - in my case "/workspaces/my_cdk_app".

After this, it will successfully run.

When I need to fix the cwd to make Code.from_asset work as expected I need to also add:

import os
os.chdir(project_root_path)

This is very hacky, and as the project might live in different locations between different developers, it's very brittle.

So, I made https://pypi.org/project/cdk-integ-runner-cwd-fix/ to provide a less brittle, but still hacky fix.

Using that I need to add to the top of each test file:

from cdk_integ_runner_cwd_fix import fix_cwd
fix_cwd()

And then run inter-runner while providing the project root path as an environment variable:

CDK_INTEG_RUNNER_CWD=$(pwd) integ-runner

Doing so, shows no (so far) problems with the operation of integ-runner, so it would be much preferred if integ-runner didn't introduce this unexpected change in working directory in the first place, and we didn't need to hack the path and CWD at all.

You could introduce a --cwd option to integ-runner which would incorporate a similar approach. Or, untangle all of the chdir that happens between test file discovery and test file being run.

Additional Information/Context

integ-runner --version
>>> 2.128.0-alpha.0

CDK CLI Version

2.122.0 (build 7e77e02)

Framework Version

No response

Node.js Version

v20.11.0

OS

debian bullseye

Language

Python

Language Version

Python 3.12.0

Other information

No response

tomharvey commented 6 months ago

I spoke with @mrgrain on the CDK Slack channel about this - https://cdk-dev.slack.com/archives/C018XT6REKT/p1708393917643359

Here is the conversation for posterity 🏴󠁧󠁢󠁳󠁣󠁴󠁿


Tom Harvey 1 day ago I seem to have an outstanding problem when I try to import my Stack from another python file. ModuleNotFoundError: No module named 'python_aws_cdk_integration_test_playground' I can have a poke around the path and see. But if you know anything on that, it would be great

mrgrain 1 day ago Try running the test directly, might make debugging easier: python integ_hello.py

Tom Harvey 1 day ago Yes. yes… integ-runner is doing all kinds of strange to the path I can hack the path into the test file, but now the reference to Code.from_asset("./lambda") is scoped to within the /test dir … continues to dig.

Tom Harvey 1 day ago integ-runner is making the cwd the /test directory. Which isn’t fun, and resulting in a lot of ugly path manipulation in my code. Is that expected behaviour, or something that I can/should feedback on (without starting that old conversation about feedback going into a black hole)

Tom Harvey 1 day ago Haha - hello - I grew up in Glasgow!

mrgrain 1 day ago Ha! Small world :smile:

Tom Harvey 1 day ago I feel like adding

os.chdir(package_path)
sys.path.insert(0, package_path)

To the top of my test file and, then reference the lambda code with: Code.from_asset(f"{os.getcwd()}/lambda") Is the “least bad option” But would love feedback on that. Will blog it and ask for wider opinions.

mrgrain 1 day ago I think integ-runner is setting the CWD to the value of --directory - which defaults to test

Tom Harvey 1 day ago Yeah. I tried setting that in the config.json and with integ-runner --directory "." but it always seems to set the cwd to the directory that the test file lives in. I’m using Code.from_asset(f"{os.getcwd()}/lambda") for the asset code And, I’m not hacking the os.cwd in the test file: When the test file lives in test/integration/integ.hello.py I get Cannot find asset at /workspaces/python-aws-cdk-integration-test-playground/test/integration/lambda And when it lives in the simpler test/integ.hello.py I get Cannot find asset at /workspaces/python-aws-cdk-integration-test-playground/test/integration/lambda So, we can see that os.getcwd() is dependant on the location of the test file (not the location where I run integ-runner from or the --dir flag. Once I add os.chdir("/workspaces/python-aws-cdk-integration-test-playground") back into the integ_.hello.py file, all is green again

mrgrain 1 day ago integ-runner started out as a testing tool for reusable construct libraries. Those have probably quite different prerequisites for things like paths. In my example you might have noticed that I'm using a locally build python modules. Anyway, that's also the reason why it's still in developer preview, because we need to suss out these kind of quirks. Feedback is very much appreciated here. Just don't expect a quick act on it since this particular one will have ramifications. We tend to collect these kind of issues until product decides to move ahead with GA and then they will be addressed in a wider context.

Tom Harvey 1 day ago I don’t need quick feedback. Just to know it’s gone to the right place :slightly_smiling_face:

mrgrain 1 day ago Yeah. I tried setting that in the config.json and with integ-runner --directory "." but it always seems to set the cwd to the directory that the test file lives in. Oh so it's dirname(testfile) ? That's also possible.

Tom Harvey 1 day ago Patching the test file to set the cwd to the location I run the integ-runner script from (you know - what I’d expect it to use as CWD :slightly_smiling_face: ) doesn’t seem to break anything (so far) and does help things a lot.

mrgrain 1 day ago I’m using Code.from_asset(f"{os.getcwd()}/lambda") for the asset code Perfect example of the difference between app and lib. In a library this would be a bad idea. In an app it's totally fine to do.

Tom Harvey 1 day ago started out as a testing tool for reusable construct libraries Thanks, That’s great context to keep in mind, and I can see how that would change things,.

Tom Harvey 1 day ago :slightly_smiling_face: yip

mrgrain 1 day ago Patching the test file to set the cwd to the location I run the integ-runner script from (you know - what I’d expect it to use as CWD :slightly_smiling_face: ) doesn’t seem to break anything (so far) and does help things a lot. So maybe we need a --cwd option for integ-runner?

Tom Harvey 1 day ago I’m unsure why it wouldn’t default to the location of where I run the script from. But, yeah

Tom Harvey 1 day ago (the script being integ-runner as opposed to integ_.hello.py)

mrgrain 1 day ago I'm unsure about that either! Would have to dive into the code to see if there's a reason.

Tom Harvey 1 day ago then again, I now see now how when running deploy, is cwd actually the location of app.py or the location I run cdk from

Tom Harvey 1 day ago might need to test that … what does cdk consider “the script” app.py or cdk?

Tom Harvey 1 day ago likely the same approach for integ-runner vs integ_.hello.py

mrgrain 1 day ago Ha, excellent question!

mrgrain 1 day ago cdk synth is really just a shortcut for CDK_OUTDIR=cdk.out python app.py

Tom Harvey 1 day ago Yip - so the cwd is the location of app.py, not the place I run cdk from :+1:

Tom Harvey 1 day ago … I think… will check

mrgrain 1 day ago But what is the cwd if you run CDK_OUTDIR=cdk.out python subdir/app.py ? :thinking_face:

Tom Harvey 1 day ago python test/integration/foo.py where foo.py is: import os

print(os.getcwd()) results in /workspaces/python-aws-cdk-integration-test-playground (which is my user CWD, not the location of the foo.py :+1::skin-tone-2: 1

Tom Harvey 1 day ago But what is the cwd if you run CDK_OUTDIR=cdk.out python subdir/app.py ? It doesn’t change from that of python app.py

Tom Harvey 1 day ago it remains the location from where I run the script

Tom Harvey 1 day ago cd subdir/ python app.py

/workspaces/python-aws-cdk-integration-test-playground/subdir So, moving into subdir does what I would expect, change the cwd to the subdir (I added print(os.getcwd()) to the top of app.py)

Tom Harvey 1 day ago So i guess integ-runner runs a little different

mrgrain 1 day ago :+1::skin-tone-2: That's definitely something we should consider aligning.

Tom Harvey 1 day ago Very curious to see how it does that. I’m guessing it’s spawing a python subprocess and doing so from the test file’s directory

mrgrain 1 day ago It's spawning a subprocess to run the cdk cli which is than spawning a subprocess to run python. From the sounds of it, the cdk subprocess is spawned with the dirname of the file as cwd.

Tom Harvey 1 day ago Would be good to know why. Was that a purposeful decision?

mrgrain 1 day ago Ha, it's actually the directory of the snapshot directory. Which happens to be co-located to the file. Or not. That's just some other cwd :thinking_face: (edited)

mrgrain 1 day ago Doesn't appear to be purposeful

Tom Harvey 1 day ago Yeah, I’ve been hunting through the codebase. Doesn’t look like a good reason in there, but slightly tangled to undo that decision.

Tom Harvey 1 day ago Thanks for your help so far. I’ll see how I can hack around it for now, with a hope that the hack can be removed sometime Probably set some ENV VAR for the cwd I want and then set that at the top of all tests

Tom Harvey 1 day ago … and that’s going to need to be a package that’s installed so it can be on my path. Gah. Messing with paths is a really big pain :slightly_smiling_face:

Tom Harvey 1 day ago Is it a good/bad idea to dump this conversation into an issue on the repo? Is there a formal process to log these things?

mrgrain 1 day ago Do it!

Tom Harvey 1 day ago Will do… Been busy :slightly_smiling_face: pip install cdk-integ-runner-cwd-fix Now lets me add this to the code, regardless of where the test file is: from cdk_integ_runner_cwd_fix import fix_cwd

fix_cwd() # noqa Then, bu setting export CDK_INTEG_RUNNER_CWD=$(pwd) I can run integ-runner all A-OK :raised_hands::skin-tone-2: 1

Tom Harvey 1 day ago https://github.com/tomharvey/cdk-integ-runner-cwd-fix

GitHubGitHub GitHub - tomharvey/cdk-integ-runner-cwd-fix: Fix the problem with CDK's integ-runner breaking the python path and current working directory Fix the problem with CDK's integ-runner breaking the python path and current working directory - tomharvey/cdk-integ-runner-cwd-fix