aws / aws-sam-cli

CLI tool to build, test, debug, and deploy Serverless applications using AWS SAM
https://aws.amazon.com/serverless/sam/
Apache License 2.0

Bug: SAM CLI from DevContainer fails either on Windows or Mac #5922

Closed ffMathy closed 11 months ago

ffMathy commented 1 year ago

Description:

When I run sam local start-api inside a DevContainer, I encounter different failures depending on which platform my host machine runs and which configuration I use.

Steps to reproduce:

There are two variants (or scenarios) of the configuration. One fails consistently on Windows, and the other fails consistently on Mac. I have not found a configuration that works on both.

Scenario 1

Repro: https://github.com/ffMathy/aws-sam-cli-repro/tree/main (main branch)

Scenario 2

Repro: https://github.com/ffMathy/aws-sam-cli-repro/tree/windows_fix (windows_fix branch)

Diff from Scenario 1: https://github.com/ffMathy/aws-sam-cli-repro/compare/main...windows_fix

Observed result:

Scenario 1 failure (Windows)

[Screenshot: MicrosoftTeams-image (6)]

Scenario 2 failure (Mac)

2023-09-12 09:40:25,608 | Config file location: /workspace/samconfig.toml                                                                                                              
2023-09-12 09:40:25,672 | Loading configuration values from [default.['local', 'start-api'].parameters] (env.command_name.section) in config file at '/workspace/samconfig.toml'...    
2023-09-12 09:40:25,673 | Configuration values successfully loaded.                                                                                                                    
2023-09-12 09:40:25,673 | Configuration values are: {'stack_name': 'lambda-nodejs18.x', 'warm_containers': 'EAGER'}                                                                    
2023-09-12 09:40:25,843 | Using config file: samconfig.toml, config environment: default                                                                                               
2023-09-12 09:40:25,843 | Expand command line arguments to:                                                                                                                            
2023-09-12 09:40:25,844 | --template_file=/workspace/template.yaml --host=0.0.0.0 --container_host=host.docker.internal --container_host_interface=0.0.0.0                             
--docker_volume_basedir=/Users/dkMaLyLo/Documents/src/aws-sam-cli-repro --port=3000 --static_dir=public --layer_cache_basedir=/root/.aws-sam/layers-pkg --warm_containers=EAGER        
2023-09-12 09:40:25,946 | local start-api command is called                                                                                                                            
2023-09-12 09:40:25,965 | No Parameters detected in the template                                                                                                                       
2023-09-12 09:40:25,974 | There is no customer defined id or cdk path defined for resource HelloWorldFunction, so we will use the resource logical id as the resource id               
2023-09-12 09:40:25,974 | There is no customer defined id or cdk path defined for resource ServerlessRestApi, so we will use the resource logical id as the resource id                
2023-09-12 09:40:25,975 | 0 stacks found in the template                                                                                                                               
2023-09-12 09:40:25,975 | No Parameters detected in the template                                                                                                                       
2023-09-12 09:40:25,981 | There is no customer defined id or cdk path defined for resource HelloWorldFunction, so we will use the resource logical id as the resource id               
2023-09-12 09:40:25,982 | There is no customer defined id or cdk path defined for resource ServerlessRestApi, so we will use the resource logical id as the resource id                
2023-09-12 09:40:25,982 | 2 resources found in the stack                                                                                                                               
2023-09-12 09:40:25,982 | Found Serverless function with name='HelloWorldFunction' and CodeUri='/workspace/hello-world/'                                                               
2023-09-12 09:40:26,028 | watch resource /workspace/template.yaml                                                                                                                      
2023-09-12 09:40:26,029 | Create Observer for resource /workspace/template.yaml with recursive True                                                                                    
2023-09-12 09:40:26,030 | watch resource /workspace/template.yaml's parent /workspace                                                                                                  
2023-09-12 09:40:26,030 | Create Observer for resource /workspace with recursive False                                                                                                 
2023-09-12 09:40:26,092 | Initializing the lambda functions containers.                                                                                                                
2023-09-12 09:40:26,093 | Async execution started                                                                                                                                      
2023-09-12 09:40:26,093 | Invoking function functools.partial(<function InvokeContext._initialize_all_functions_containers.<locals>.initialize_function_container at 0xffffa8047ba0>,  
Function(function_id='HelloWorldFunction', name='HelloWorldFunction', functionname='HelloWorldFunction', runtime='nodejs18.x', memory=None, timeout=3, handler='app.lambdaHandler',    
imageuri=None, packagetype='Zip', imageconfig=None, codeuri='/workspace/hello-world/', environment=None, rolearn=None, layers=[], events={'HelloWorld': {'Type': 'Api', 'Properties':  
{'Path': '/hello', 'Method': 'get', 'RestApiId': 'ServerlessRestApi'}}}, metadata={'SamResourceId': 'HelloWorldFunction'}, inlinecode=None, codesign_config_arn=None,                  
architectures=['x86_64'], function_url_config=None, function_build_info=<FunctionBuildInfo.BuildableZip: ('BuildableZip', 'Regular ZIP function which can be build with SAM CLI')>,    
stack_path='', runtime_management_config=None))                                                                                                                                        
2023-09-12 09:40:26,096 | Waiting for async results                                                                                                                                    
2023-09-12 09:40:26,100 | No environment variables found for function 'HelloWorldFunction'                                                                                             
2023-09-12 09:40:26,101 | Loading AWS credentials from session with profile 'None'                                                                                                     
2023-09-12 09:40:26,276 | Resolving code path. Cwd=/Users/dkMaLyLo/Documents/src/aws-sam-cli-repro, CodeUri=/workspace/hello-world/                                                    
2023-09-12 09:40:26,277 | Resolved absolute path to code is /workspace/hello-world/                                                                                                    
2023-09-12 09:40:26,393 | watch resource /workspace/hello-world/                                                                                                                       
2023-09-12 09:40:26,393 | Create Observer for resource /workspace/hello-world/ with recursive True                                                                                     
2023-09-12 09:40:26,459 | watch resource /workspace/hello-world/'s parent /workspace                                                                                                   
2023-09-12 09:40:26,460 | Code /workspace/hello-world/ is not a zip/jar file                                                                                                           
2023-09-12 09:40:29,986 | Local image is out of date and will be updated to the latest runtime. To skip this, pass in the parameter --skip-pull-image                                  
Building image...........................................................................................................................................................................................................................................................................................................................................................................................................................
2023-09-12 09:41:10,808 | Using local image: public.ecr.aws/lambda/nodejs:18-rapid-x86_64.                                                                                             

2023-09-12 09:41:10,810 | Mounting /workspace/hello-world/ as /var/task:ro,delegated, inside runtime container                                                                         
2023-09-12 09:41:11,120 | Exception raised during the execution                                                                                                                        
2023-09-12 09:41:11,121 | Lambda functions containers initialization failed because of 500 Server Error for                                                                            
http+docker://localhost/v1.35/containers/d55e542097167385c2c5d52cd3651592b12a3ccff5ac62ea4586d39f1cbce068/start: Internal Server Error ("error while creating mount source path        
'/host_mnt/workspace/hello-world': mkdir /host_mnt/workspace: read-only file system")                                                                                                  
2023-09-12 09:41:11,123 | Terminating all running warm containers                                                                                                                      
2023-09-12 09:41:11,123 | Terminate running warm container for Lambda Function 'HelloWorldFunction'                                                                                    
2023-09-12 09:41:11,273 | Cleaning all decompressed code dirs                                                                                                                          
2023-09-12 09:41:11,280 | Telemetry endpoint configured to be https://aws-serverless-tools-telemetry.us-west-2.amazonaws.com/metrics                                                   
2023-09-12 09:41:11,857 | Telemetry endpoint configured to be https://aws-serverless-tools-telemetry.us-west-2.amazonaws.com/metrics                                                   
2023-09-12 09:41:11,857 | Sending Telemetry: {'metrics': [{'commandRun': {'requestId': '228a789b-748b-4bc2-84b8-c7eb8680e85b', 'installationId':                                       
'ac71eba1-d337-48e5-960e-714e9eda4f7e', 'sessionId': 'd9be9397-3c79-42a9-9394-490ec4bc799f', 'executionEnvironment': 'CLI', 'ci': False, 'pyversion': '3.11.2', 'samcliVersion':       
'1.97.0', 'awsProfileProvided': False, 'debugFlagProvided': True, 'region': '', 'commandName': 'sam local start-api', 'metricSpecificAttributes': {'projectType': 'CFN', 'gitOrigin':  
None, 'projectName': '21a3230e03772a58aff1b3709a9e232850916337e1fba95c434076b6668c6e08', 'initialCommit': None}, 'duration': 45436, 'exitReason': 'ContainersInitializationException', 
'exitCode': 1}}]}                                                                                                                                                                      
2023-09-12 09:41:11,858 | Unable to find Click Context for getting session_id.                                                                                                         
2023-09-12 09:41:11,860 | Sending Telemetry: {'metrics': [{'events': {'requestId': '5d1e658a-a542-4ab2-9d04-71573b0bcec7', 'installationId': 'ac71eba1-d337-48e5-960e-714e9eda4f7e',   
'sessionId': 'd9be9397-3c79-42a9-9394-490ec4bc799f', 'executionEnvironment': 'CLI', 'ci': False, 'pyversion': '3.11.2', 'samcliVersion': '1.97.0', 'commandName': 'sam local           
start-api', 'metricSpecificAttributes': {'events': [{'event_name': 'SamConfigFileExtension', 'event_value': '.toml', 'thread_id': '045bb4ff0e3646108ff1d543c4af58c8', 'time_stamp':    
'2023-09-12 09:40:25.607', 'exception_name': None}]}}}]}                                                                                                                               
2023-09-12 09:41:12,616 | HTTPSConnectionPool(host='aws-serverless-tools-telemetry.us-west-2.amazonaws.com', port=443): Read timed out. (read timeout=0.1)                             
2023-09-12 09:41:12,617 | HTTPSConnectionPool(host='aws-serverless-tools-telemetry.us-west-2.amazonaws.com', port=443): Read timed out. (read timeout=0.1)                             
Error: Lambda functions containers initialization failed

Expected result:

I expected the SAM CLI to be launchable from inside Docker. Docker is heavily used by most developers, especially in dev containers.

Additional environment details (Ex: Windows, Mac, Amazon Linux etc)

  1. OS: Windows or Mac
  2. sam --version: SAM CLI, version 1.97.0
  3. AWS region: eu-west-1
{
  "version": "1.97.0",
  "system": {
    "python": "3.11.2",
    "os": "Linux-5.15.49-linuxkit-pr-aarch64-with-glibc2.36"
  },
  "additional_dependencies": {
    "docker_engine": "24.0.5",
    "aws_cdk": "Not available",
    "terraform": "Not available"
  },
  "available_beta_feature_env_vars": [
    "SAM_CLI_BETA_FEATURES",
    "SAM_CLI_BETA_BUILD_PERFORMANCE",
    "SAM_CLI_BETA_TERRAFORM_SUPPORT",
    "SAM_CLI_BETA_RUST_CARGO_LAMBDA"
  ]
}

sriram-mv commented 1 year ago

> error while creating mount source path '/host_mnt/workspace/hello-world': mkdir /host_mnt/workspace: read-only file system

This seems to be the culprit. Digging deeper to understand: is this about trying to run Docker in Docker?
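
A possible reading of the debug log, as a minimal sketch rather than SAM CLI's actual code: the log shows `Cwd=/Users/dkMaLyLo/Documents/src/aws-sam-cli-repro` and `CodeUri=/workspace/hello-world/` resolving to `/workspace/hello-world/`. That result is consistent with ordinary POSIX path joining, where an absolute second component discards the base directory. The helper name `resolve_code_path` and the relative `hello-world/` value are assumptions made for illustration.

```python
import posixpath

# Values copied from the debug log in the issue description.
docker_volume_basedir = "/Users/dkMaLyLo/Documents/src/aws-sam-cli-repro"
absolute_code_uri = "/workspace/hello-world/"

# Hypothetical relative CodeUri, used only for comparison.
relative_code_uri = "hello-world/"


def resolve_code_path(cwd: str, code_uri: str) -> str:
    """Hypothetical helper: join code_uri onto cwd with POSIX semantics.

    posixpath.join drops every earlier component once it meets an
    absolute one, so an absolute code_uri ignores cwd entirely.
    """
    return posixpath.join(cwd, code_uri)


# The absolute CodeUri wins, so the host-side base directory is lost.
print(resolve_code_path(docker_volume_basedir, absolute_code_uri))

# A relative CodeUri stays under the host-side base directory.
print(resolve_code_path(docker_volume_basedir, relative_code_uri))
```

If this reading is right, the Docker daemon on the host is asked to mount `/workspace/hello-world`, a path that exists only inside the dev container.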

ffMathy commented 1 year ago

Well, essentially we just want to use the SAM CLI in a devcontainer in some way.

But no matter what we do, something (the SAM CLI, the devcontainer, or VS Code) crashes or disconnects in some weird manner, one way or another.

We find it almost impossible to use, and we've spent many weeks trying out many different options.

I think it's really important that this becomes officially supported and well-tested by the SAM team. Perhaps even documented with an example.

ffMathy commented 1 year ago

I forgot to mention that to repro, you must:

jysheng123 commented 1 year ago

Hi, thanks for specifying the reproduction steps. I was able to run sam local start-api in the dev container using the repo you provided in scenario 2, and I had no issues starting it or invoking it. I also made sure that my local files were the ones being used, by verifying that my edits showed up after restarting and invoking. Could you tell me more about how you reproduced scenario 2 and, more specifically, attach the output of sam --info from inside the dev container? I have some theories about what may have happened, but I need more information about reproducing it to verify them. Thanks

ffMathy commented 1 year ago

Scenario 2 only fails on Mac. Scenario 1 only fails on Windows when Docker is used via WSL2.

Did you use the right OS?

We verified with 2 Windows machines and 2 Mac machines.

ffMathy commented 1 year ago

Also note that the repro links for scenarios 1 and 2 are not the same. They link to two separate branches.

jysheng123 commented 1 year ago

I have verified using the correct scenario 2 link, and I can still run my local server fine with ./start.sh. Just to verify: you are failing at that script, and not when running execute.sh, right? Also, can you please send the sam --info output for the dev container regardless? I am still having issues reproducing it.

ffMathy commented 1 year ago

I see. I'll provide the SAM information soon when I'm at my computer.

Start.sh is not enough 🙂 You need to also execute. It's only once it executes that the container disconnects.

ffMathy commented 1 year ago

Here is my sam --info output. I ran it inside the dev-container on my Mac machine:

{
  "version": "1.97.0",
  "system": {
    "python": "3.11.2",
    "os": "Linux-5.15.49-linuxkit-pr-aarch64-with-glibc2.36"
  },
  "additional_dependencies": {
    "docker_engine": "24.0.5",
    "aws_cdk": "Not available",
    "terraform": "Not available"
  },
  "available_beta_feature_env_vars": [
    "SAM_CLI_BETA_FEATURES",
    "SAM_CLI_BETA_BUILD_PERFORMANCE",
    "SAM_CLI_BETA_TERRAFORM_SUPPORT",
    "SAM_CLI_BETA_RUST_CARGO_LAMBDA"
  ]
}

ffMathy commented 1 year ago

My Mac is an M1 Mac by the way, so ARM architecture. Not sure if it matters.

jysheng123 commented 1 year ago

Hi, our team is still having issues reproducing your error in scenario 2 with the windows_fix branch. The host_mnt/workspace path is interesting because nothing in our codebase creates paths like this. Combined with the fact that we can't replicate it in the same environment, this points towards an issue with Docker. A couple of Google searches for similar issues indicate that it may be caused by not installing and updating Docker through the officially supported sources (https://github.com/moby/moby/issues/34427 and https://stackoverflow.com/questions/45764477/docker-compose-error-while-creating-mount-source-path). Could you try uninstalling and reinstalling Docker through the official channels and let us know what happens? How is Docker installed on your machine?

ffMathy commented 1 year ago

I'll investigate tomorrow.

As for scenario 1 on Mac, is that possible to replicate for you?

jysheng123 commented 1 year ago

Scenario 1 on Mac works properly for me. For Windows, I will get back to you; I am having some issues with my virtual machine for now.

ffMathy commented 1 year ago

Ah yes I meant scenario 1 on Windows.

Sounds good!

jysheng123 commented 1 year ago

For Windows, Docker in Docker is not working properly for me: Docker is not installed in the dev container, so I cannot verify any of the scenarios on the Windows machine. Was Docker set up correctly inside the dev container on your Windows machine?

stefanalexandru02 commented 1 year ago

Docker was set up on the Windows machine itself and exposed through WSL2.

In the mac branch, the container connection was somehow broken, while in the windows branch it was fine.

Docker itself was set up correctly, as other containers that do not use SAM CLI are working fine.

jysheng123 commented 1 year ago

Great, did the result for scenario 2 on Mac change after fixing Docker?

jysheng123 commented 1 year ago

Hey, just pinging for an update: do you have a fix for both scenarios after updating Docker? If so, I can close the ticket :).

ffMathy commented 1 year ago

> Hey, just pinging to inquire about an update if you have a fix for both of the scenarios after updating the docker. If so, I can close the ticket :).

We will get back with a response tomorrow. Please don't close yet.

Cc @stefanalexandru02 🙏

stefanalexandru02 commented 1 year ago

No, it's still failing with the latest docker version. Same exact devcontainer crash on Windows

lucashuy commented 1 year ago

Hi, sorry about the delay in response. I can reproduce this issue inside of WSL2 (instead of on Windows). We'll need to investigate a bit deeper as to why this happens on Windows, and why the provided fix works on one OS and not the other, before we can provide a definitive fix.

The provided project does work in Windows though (not WSL2), so this could be a potential workaround for now while we investigate.

mildaniel commented 12 months ago

I did some investigating here and came to a similar conclusion to @lucashuy.

On Mac, I was able to run commands successfully both with binding the host socket and without. Without the bind it worked out-of-the-box. With the bind, I had to update the file sharing settings on the host machine that were adopted by the dev container.

On Windows, I am able to get SAM CLI working on the dev container when starting VS Code from Windows itself, not WSL. With WSL, I am seeing the same issue being reported. My question is why use WSL as an intermediary here? It seems to me as though the end goal of using the dev container is the same whether using Windows or WSL.

Is there a use-case we're missing that requires you to run dev container with WSL? If not, I think we can resolve this issue.

stefanalexandru02 commented 12 months ago

Main reason is performance. When running Docker with the WSL backend, starting it from Windows is much slower than starting it from WSL, especially for larger projects.

ffMathy commented 12 months ago

Yes, and WSL is also default for everyone. It's what most people use nowadays.

mildaniel commented 12 months ago

Did some more digging today, and we were able to get it to work with WSL. There were a few issues that we needed to resolve:

  1. Since we bind the host Docker service, subsequent containers started by the dev container are run as side-car containers on WSL. This means that the CodeUri property should match the path corresponding to the WSL filesystem, not the dev container filesystem.
  2. When setting "remoteUser": "root", in the .devcontainer.json, this causes an issue with sam build since it will create an .aws-sam directory owned by root, and subsequent commands won't have sufficient permissions. You can either update the user (recommended) or change the permissions of the .aws-sam directory after build.
  3. I'm not entirely sure why, but Docker kept asking for credentials. I needed to remove the existing credential store config located in ~/.docker/config and then login to Docker with docker login.
  4. Run local emulation commands with the --container-host host.docker.internal flag like you have in your example.

There are a lot of levels of virtualization here that complicate things so let us know if these steps help!
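
Step 1 above can be sketched as a path translation: because the Lambda containers are started by the host (WSL) Docker daemon, a path seen inside the dev container has to be rewritten to the matching path on the WSL filesystem before it is used as a mount source. This is a minimal sketch under assumptions. The function `to_host_path` and the example directories are hypothetical and are not part of SAM CLI.

```python
from pathlib import PurePosixPath


def to_host_path(container_path: str, container_root: str, host_root: str) -> str:
    """Rewrite a path under container_root so it points under host_root.

    Raises ValueError if container_path is not inside container_root,
    because such a path has no counterpart on the host side of the mount.
    """
    relative = PurePosixPath(container_path).relative_to(container_root)
    return str(PurePosixPath(host_root) / relative)


# Hypothetical values: /workspace is the dev container mount point and
# /home/me/src/aws-sam-cli-repro is the same folder on the WSL filesystem.
print(to_host_path(
    "/workspace/hello-world",
    "/workspace",
    "/home/me/src/aws-sam-cli-repro",
))
```

The same mapping is what the `--docker_volume_basedir` value in the debug log is meant to express: a host-side directory that stands in for the workspace directory inside the dev container.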

ffMathy commented 12 months ago

Thank you for looking into it. Have not yet verified that.

It would be great if there were some documentation on an approach to SAM in WSL, that was verified working on WSL + Mac. It's a maze right now, and it's really hard to get working for both at the same time.

mildaniel commented 11 months ago

We can look into adding some more documentation about development environments and some of the nuances with WSL and dev container.

I am going to remove the bug tag since this is a system configuration issue and there is nothing we can really do on the SAM CLI side.

mildaniel commented 11 months ago

Resolving for now. Please create a new issue if anything else comes up!

github-actions[bot] commented 11 months ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

ffMathy commented 11 months ago

I must say I am quite disappointed in this. I wouldn't recommend the SAM CLI to anyone in its current state. I hope things will improve.

mancinifm commented 8 months ago

> 3. I'm not entirely sure why, but Docker kept asking for credentials. I needed to remove the existing credential store config located in `~/.docker/config` and then login to Docker with `docker login`.

This resolved my issue. Thanks heaps for your investigation, @mildaniel !

So just to confirm: WSL + DevContainer + sam local start-lambda is working here.
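
For anyone repeating the credential-store step, here is a minimal sketch of what "removing the existing credential store config" could look like. It assumes the file meant is the Docker CLI config at `~/.docker/config.json` and that the relevant entry is its `credsStore` key. The function name `drop_credential_store` and the sample value are hypothetical. The function only transforms JSON text, so nothing on disk is touched until you choose to write the result back.

```python
import json


def drop_credential_store(config_text: str) -> str:
    """Return Docker CLI config JSON with the credsStore entry removed.

    All other settings are kept as they are. A missing credsStore key
    is not an error; the config is returned unchanged in that case.
    """
    config = json.loads(config_text)
    config.pop("credsStore", None)
    return json.dumps(config, indent=2)


# Hypothetical config content, for illustration only.
sample = '{"auths": {}, "credsStore": "desktop"}'
print(drop_credential_store(sample))

# To apply it for real (assumption: the file lives at ~/.docker/config.json),
# back the file up first, then:
#   from pathlib import Path
#   path = Path.home() / ".docker" / "config.json"
#   path.write_text(drop_credential_store(path.read_text()))
```

After that, run `docker login` again as described in the quoted step.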