Open yskkin opened 2 years ago
@yskkin Not sure I follow what you are proposing here. The output states "skipping upload", so I'm not sure what is indicating unnecessary uploads.
https://github.com/aws/aws-sam-cli/blob/bfbcc166709dba330310088e5a5e520da1ef2444/samcli/lib/package/s3_uploader.py#L87-L89
This `file_exists` check takes a few seconds for each lambda and slows down deployment when there are more than ~10 deployed deduped lambdas. Since we know the build artifacts are identical, I thought even the `file_exists` check could be skipped.
Currently facing the same issue, SAM stack is taking sooooo long to get deployed, and I'm seeing a bunch of these messages being triggered.
Maybe these checks could be done in parallel or something similar?
So currently we check against the source (S3) to ensure the artifacts are indeed there. So as it is today, we only know the file already exists by calling `s3.head_object` (which shouldn't take much time at all): https://github.com/aws/aws-sam-cli/blob/bfbcc166709dba330310088e5a5e520da1ef2444/samcli/lib/package/s3_uploader.py#L196.
We could probably optimize this further somehow by caching some additional info.
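For illustration, the per-artifact existence check amounts to a `head_object` call. This is a minimal, hypothetical sketch of the logic in `samcli/lib/package/s3_uploader.py`, not the actual implementation; `client` stands in for a boto3 S3 client, and the real code catches botocore's `ClientError` rather than a bare `Exception`:

```python
# Sketch of the per-artifact existence check (hypothetical simplification).
# `client` is any object exposing a boto3-style
# head_object(Bucket=..., Key=...) method; boto3 raises ClientError (404)
# when the key is absent.
def file_exists(client, bucket, key):
    """Return True when the object is already in S3, so the upload can be skipped."""
    try:
        client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception:
        return False
```

Each such call is a network round trip, which is why doing one per function adds up even though any single call is fast.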
> Maybe these checks could be done in parallel or something similar?
Parallel probably isn't the best idea, as you can run into race conditions in uploading and checking if the file exists.
Will update the labels. We can look at a PR if anyone is interested here, but this is not something currently on our roadmap. If you are interested in submitting a fix, we will want to make sure that however this is cached, it is reproducible and avoids skipping an upload we should have made.
Thanks for looking into this guys. This is a real blocker for us, as it's taking nearly 50 minutes to deploy the stack at times... Any way this can be prioritized?
We'll have to switch over to CDK otherwise I'm afraid, as the release times are just too long.
Thanks again
@konarskis If this is blocking you, you are free to submit a PR to patch.
I started looking into this issue because I was experiencing the same thing @konarskis was. It turns out it's not the `head_object` check that's taking so long; it's the fact that `sam build` puts each function into its own directory within `.aws-sam/build`, and then `sam deploy` zips up each of those directories in turn and checks whether the resulting file already exists in S3.
So for two functions, `functionA` and `functionB`, with the same `CodeUri` and `Runtime`, it has to:

1. Zip up `.aws-sam/build/functionA` and get the md5 of the result
2. Check whether that artifact already exists in S3
3. Upload it if it doesn't
4. Zip up `.aws-sam/build/functionB` and get the md5 of the result
5. See that the hash matches the already-uploaded artifact (from `functionA`), so skip it

It's Step 4 that's the time-consuming one for each function, not Step 5.
The solution would seem to be a more optimized build process where the target directory is `.aws-sam/[artifact_hash]`, where `artifact_hash` is the md5 hash of the function's `CodeUri` + `Runtime`. That would allow some deduping of the build as well as the upload. But it's a pretty major change to how things work, and I don't know the codebase well enough to predict what kind of knock-on effects it might produce.
@jfuss, if you think this approach is worth pursuing, I'd be willing to take a run at it, but I didn't want to forge ahead without some idea that such a refactor would be welcome/considered.
I guess a less invasive version of that would be to create a dict that maps each function's resource ID to a hash of its `CodeUri` + `Runtime` and use that to decide which ones to upload. Maybe I'll give that a go and submit a PR.
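That less invasive idea might look something like the sketch below. The dict shape and the function name are hypothetical, not aws-sam-cli's actual data model:

```python
import hashlib

def plan_uploads(functions):
    """functions maps resource ID -> {"CodeUri": ..., "Runtime": ...} (hypothetical shape).

    Returns (to_upload, skipped): the first function seen for each
    CodeUri+Runtime hash gets zipped and uploaded; later duplicates are
    skipped without ever being zipped.
    """
    seen = {}          # artifact_hash -> resource ID that will be uploaded
    to_upload = []
    skipped = []       # (duplicate ID, ID it duplicates)
    for resource_id, props in functions.items():
        artifact_hash = hashlib.md5(
            (props["CodeUri"] + props["Runtime"]).encode("utf-8")
        ).hexdigest()
        if artifact_hash in seen:
            skipped.append((resource_id, seen[artifact_hash]))
        else:
            seen[artifact_hash] = resource_id
            to_upload.append(resource_id)
    return to_upload, skipped
```

The key property is that duplicates are detected from template inputs alone, so the per-function zip step (the slow part identified above) runs once per unique `CodeUri` + `Runtime` pair rather than once per function.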
Hey all, just checking in to see if there's been any progress or workarounds for this. As our code grows, we've been experiencing 25min waits for these checks and deployment. Thank you very much!
I started working on the idea I posted about two comments up and never got anywhere with it. (I can't remember now if it was a technical issue or just me having to reprioritize.)
This dominates our deploy time (30 minutes+ at present), would be a big help to improve this time.
Describe your idea/feature/enhancement

I `sam deploy`-ed 17 lambdas which share `CodeUri` and `Runtime` (thus deduped build is enabled) and got the following output. This output implies `sam deploy` unnecessarily uploads identical artifacts.

Proposal

Ignore identical uploads. It takes a few seconds for each skipped upload, so ignoring identical uploads can improve the performance of `sam deploy`.

Additional Details