Closed garretwilson closed 1 year ago
@garretwilson Can you provide a workable example. The details you have describe what is happening but without something I can actually validate with, it's hard to repro (effectively just guessing at your template and setup).
Can you provide a workable example.
I can give you basically everything I have. Here is basically what my template is:
AWSTemplateFormatVersion: '2010-09-09'
Transform:
- AWS::LanguageExtensions
- AWS::Serverless-2016-10-31
Description:
Example functions.
Parameters:
Ver:
Description: The version of the service, such as "1.2.3".
Type: String
AllowedPattern: '[\w.+-]+'
ConstraintDescription: The service version must use only word characters, dots, plus signs, and dashes.
Globals:
Function:
CodeUri: !Sub "s3://my-sam-bucket/foo-bar-functions-${Ver}.jar"
Runtime: java17
Architectures: [x86_64]
MemorySize: 512
Timeout: 300
Resources:
ExampleFunction:
Type: AWS::Serverless::Function
Properties:
Handler: com.example.FooBar::foo
Here is my samconfig.toml
:
version = 0.1
[default]
[default.global.parameters]
stack_name = "placeholder" # will be specified from the CLI
[default.build.parameters]
cached = true
parallel = true
[default.validate.parameters]
lint = true
[default.deploy.parameters]
capabilities = "CAPABILITY_IAM"
confirm_changeset = true
[default.package.parameters]
[default.sync.parameters]
watch = true
[default.local_start_api.parameters]
warm_containers = "EAGER"
[default.local_start_lambda.parameters]
warm_containers = "EAGER"
I'm afraid I don't have time to create some turnkey Zip package. I'm already delayed in this project; I haven't even got to the point of deploying actual functionality, as I spent so much time just getting the deployment working (#5249). 😅 Really there isn't much here in my configuration and code; I can't imagine it would be hard to reproduce.
If it's not happening for you, the first place I would look would be the !Ref
and AWS::LanguageExtensions
interactions. It sounds like something to do with creating a hash before or after variable substitution.
You might check into line endings as well. I'm on a Windows machine using CRLF in the templates. If your hashing function converts to LF when doing the upload and then compares with that in the local directory without normalizing line endings, then that could lead to mismatched hashes as well. Just giving you ideas to look into.
@garretwilson Thanks that is enough. I don't need a full example, I can fill in the details but with nothing to base on you are leaving us guessing, which leads to "works on my machine can't repro". Providing clear and concise issues with examples is the key to making interactions faster with the team.
Sure, I understand. My configuration is so simple I would guess it will be super-easy to reproduce. But if you can't, just let me know and I'll think more about what further could be different between our environments. A step at a time I can put in more work for turn-key reproducibility if needed, but at this point I'm guessing it won't be needed. Good luck and have a good week.
I am not able to repro this on 1.85.0, using various methods. The likely place this could fail is https://github.com/aws/aws-sam-cli/blob/develop/samcli/commands/delete/delete_context.py#L273 as we try to compute the hash from the downloaded template but without being able to repro, it's hard to say for certain.
OK. I'll put together something even more turnkey, although it might be a few days. (Fortunately this particular is nowhere near a blocker; I'm just reporting it to help tidy up loose ends.)
In the meantime, before I put a lot of time into this, can you just confirm that you have tested it from the command line on Windows 10?
I am not on Windows. It's possible this is Windows specific related but don't have direct access to a Windows machine to verify on you exact setup.
Can you at least confirm that you've tested this with a template using CRLF line endings? You should be able to do that on any platform. VS Code can even do the conversion for you if you ask it to.
This is odd:
sam-template.yaml
CodeUri: !Sub "s3://my-sam-bucket/foo-bar-functions-${Ver}.jar"
to CodeUri: "s3://my-sam-bucket/foo-bar-functions-1.2.3.jar"
to remove any substitutions.AWS::LanguageExtensions
, leaving only Transform: AWS::Serverless-2016-10-31
Still SAM CLI tries to delete a different hash-named template than the one it uploaded, brazenly showing me the hashes that have nothing to do with each other.
(In addition it appears that if I remove the !Sub
so that the CodeUri
is a literal URI to the JAR that I uploaded, SAM CLI takes it upon itself to delete that JAR when it tears down the CloudFormation stack! So SAM URI can't delete the template that it uploaded, but turns around and deletes the JAR that I uploaded. Fortunately using !Sub
with interpolation seems to be enough to confuse SAM CLI so that it leaves my JAR alone.)
I am running out of ideas for why you can't reproduce this; I haven't found a way to not reproduce it on my end.
The only thing I could even possible think of is that I am invoking SAM from a Bash script in Git Bash for Windows. But that has a very low chance of being related.
I'll think of more ways I can narrow this down and provide you something to reproduce this, but it's not going to be very soon. I spent half of yesterday tracking down why Log4J (which the AWS libraries force me to use) was producing a warning. (Spoiler: it's because Log4J uses a multi-release POM, and AWS lambda doesn't know how to deal with multi-release POMs, so Lambda was using old pre-Java 9 classes. It may impact sam build
, which I don't use so I don't know. If you want more info, ask me or read aws/aws-lambda-java-libs#204 .)
@garretwilson Are you using sam deploy
or using some other way to deploy the template initially.
Leaving what I was testing below to not loose context in the future and could not repro even with subs and transforms.
template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform:
- AWS::LanguageExtensions
- AWS::Serverless-2016-10-31
Description:
Example functions.
Parameters:
Ver:
Description: The version of the service, such as "1.2.3".
Type: String
ConstraintDescription: The service version must use only word characters, dots, plus signs, and dashes.
Globals:
Function:
CodeUri: !Sub "s3://<my hardcoded bucket>/testing-5254-7/${Ver}"
Runtime: python3.9
Architectures: [x86_64]
MemorySize: 512
Timeout: 300
Resources:
HelloWorldFunction:
Type: AWS::Serverless::Function
Properties:
Handler: app.lambda_handler
Environment:
Variables:
Version: !Ref Ver
commands:
sam deploy --stack-name testing-5254-8 --guided --region us-west-2 --parameter-overrides "Ver=4d4e1a715ca32be7dcbbf7751dfb8e8a"
sam delete --stack-name testing-5254-8 --region us-west-2 --s3-bucket <my hardcoded bucket>
Notes:
I ran sam deploy --stack-name testing-5254-7 --guided --region us-west-2
as a cheap way to upload an artifact and then used that artifact in the template above. This replicates a direct S3 upload.
Are you using
sam deploy
or using some other way to deploy the template initially.
I am using sam deploy
from inside a Bash
script inside Git Bash for Windows on Windows 10. See the description of this ticket for the specific format of the command I'm using.
… could not repro even with subs and transforms.
It happens for me even when I take out the !Sub
and the AWS::LanguageExtensions
.
… as a cheap way to upload an artifact and then used that artifact in the template above. This replicates a direct S3 upload.
OK. Or you can just drag and drop the JAR into the S3 console, and hard-code the s3://…
CodeUri
in the SAM template.
Just to make sure we're on the same page (sorry if you understood this, but it wasn't clear in your last comment), the problem is not with sam deploy
, but that sam delete
tries to delete a different template than what it uploaded.
Thanks for looking at this, but it's not my priority at the moment as I can live with an extra orphaned template scattered around on the server. Right now I'm writing a bug report for Eclipse 2023-06 RC1. It seems that ever 15 minutes some other tool breaks. You can imagine how much real work I've managed to do in the past few days with sam deploy
not recognizing interpolation, AWS Lambda not working with multi-release JARs, Eclipse 2023-06 breaking with maven.compiler.release
set to 8
… arg!!
Wait … is today tequila Tuesday? (Or was that "taco Tuesday" …)
Hey @garretwilson
I've tried steps you mentioned here, and wasn't able to re-produce the problem.
One thing that caught my attention though. Are you using a JAR file which is already uploaded to S3, or are you using a local file and let SAM CLI to upload it with deploy flow? Because your example contains a JAR file in S3, but you mention that it uploads the JAR file during deployment process.
As expected SAM CLI renames the JAR file to a hash 11111111111111111111111111111111 and uploads it to the S3 bucket.
Could there be additional steps that you are doing before or after deploy? Just trying to see where this is coming from.
Thanks!
@mndeveci thank you for looking into this.
Are you using a JAR file which is already uploaded to S3, or are you using a local file and let SAM CLI to upload it with deploy flow?
I'm using a JAR file that my own script uploads to S3. Then I reference that S3 blob path from my SAM template, but using a !Ref
. This is because SAM breaks if I tried to include a !Ref
in the actual CodeUri
using a so-called relative URI for SAM to upload the JAR. (And I want to use a !Ref
because I pass in the version number via a parameter.) This is all explained in aws/aws-sam-cli#5249.
In fact my actual solution is described in painstaking detail in a comment to that ticket. And it's working great. (Since I'm doing all the work to upload the JAR file myself, though, I'm seeing less and less of a need for SAM.)
So basically there will be a !Ref
in the CodeUri
, but since it's an absolute URI SAM will ignore it and let CloudFormation resolve the !Ref
later in the pipeline, finding the JAR I already uploaded. And then when tearing down, SAM won't even try to delete the JAR (that's good, because I didn't ask it to delete the JAR and I don't want it to), because it can't understand the !Ref
anyway.
I see your point. Unfortunately intrinsic support in SAM CLI is very limited since there is no library or API endpoint that we can use to resolve them inside the template.
I am trying to see any workarounds, you might look at custom builds (where you need to provide a Makefile for your function's build process) you might try following options (both of them will be creating a template file with CodeUri
property without any intrinsic function;
sam build
and sam package
/sam deploy
sam build
and sam package
/sam deploy
as the next steps.Going back to your question for this issue, I've checked the sam delete
flow, and I find this where we try to estimate the S3 location of the deployed template (CFN doesn't have an API to return the template location), which basically downloads your deployed template and calculates hash and estimates S3 location;
https://github.com/aws/aws-sam-cli/blob/develop/samcli/lib/package/local_files_utils.py#L26-L66
So if you are deploying through SAM CLI, and then making some changes to your template and re-deploy again by another tool, it is possible that we can't estimate the location of the template file therefore that step fails during sam delete
.
I am trying to see any workarounds, …
Thank you, but this ticket was not to request workarounds to aws/aws-sam-cli#5249. I have given up on SAM in this area, and created my own workaround, which is working splendidly. This ticket was opened for a different reason.
And then again you can try using
sam build
…
I am not using sam build
at all. My Maven project builds the project just fine; I don't need SAM for that. (In fact using the Maven Shade plugin for creating uber JARs is a horrible approach for many reasons, such as conflicting files and a complete breaking of multirelease JAR dependencies — not because Shade can't create multirelease JARs, but because AWS Lambda doesn't understand multirelease uber JARs, and breaks Java 9+ functionality, etc.) My Maven build is creating a nice ZIP file. I just needed SAM to upload the ZIP file with a parameter as part of its path. It can't do that in aws/aws-sam-cli#5249, and it gets confused with deleting the template in this ticket.
Going back to your question for this issue …
👍
… I find this where we try to estimate the S3 location of the deployed template (CFN doesn't have an API to return the template location), which basically downloads your deployed template and calculates hash and estimates S3 location …
I've indicated previously in this ticket that my gut instinct is that SAM is somehow getting confused with the hash of the template before and after variable substitution.
The whole CloudFormation template thing SAM is built on is very brittle to begin with, as illustrated by aws/aws-sam-cli#5249, and in fact the very need to have the AWS::LanguageExtensions
hack in the first place, which itself doesn't even work all the time; see aws-cloudformation/cfn-language-discussion#127 .
So if you are deploying through SAM CLI, and then making some changes to your template and re-deploy again by another tool, …
I am not doing this. I am deploying the template via SAM CLI, making zero changes to the template via any means, and then using SAM CLI to delete the stack.
Again this ticket has low priority; I really don't care so much at this point. There are much bigger usability concerns such as aws-cloudformation/cfn-language-discussion#127 .
But thanks again for reading it and thinking about it.
which basically downloads your deployed template and calculates hash and estimates S3 location
But where does the content of the "deployed template" come from? Here's a snippet from the source code you indicated:
def get_uploaded_s3_object_name(
precomputed_md5: Optional[str] = None,
file_content: Optional[str] = None,
file_path: Optional[str] = None,
extension: Optional[str] = None,
) -> str:
Where are you getting that file_content
from? Is the "deployed template" an identical, byte-for-byte copy of what SAM uploaded to the S3 bucket before processing? I can't see how that could be the case. When I open a CloudFormation stack in the console and look under "Template", what I see is not identical to what I have locally. It appears the object tree has been stored somewhere, and what I see in the console is a reserialization of that, which is equivalent to what I have to the original template source file, but not identical. It certainly wouldn't generate the same hash, and I wouldn't expect it to.
So where is this "deployed template" that is byte-for-byte identical to what is in the S3 bucket, that you would expect to generate a hash of and have it match that of the original template source code?
Is the "deployed template" an identical, byte-for-byte copy of what SAM uploaded to the S3 bucket before processing?
Unfortunately we can't use what was deployed before, since you might deploy your stack in one place and might try to delete from another. So that is why we are getting the template from CFN, and then trying to estimate file location by getting its hash.
When I open a CloudFormation stack in the console and look under "Template", what I see is not identical to what I have locally
I think what we are getting is what you are seeing in CFN. We are using get_template API with TemplateStage=Original
which should return un-transformed (aka original) template.
So where is this "deployed template" that is byte-for-byte identical to what is in the S3 bucket, that you would expect to generate a hash of and have it match that of the original template source code?
CFN doesn't have an API to return deployed template location, but it looks like, it might be recorded in CloudTrail (see). Can you check those events to see where the template is stored? It might also give you some clue about why it is different from the local one.
So that is why we are getting the template from CFN, and then trying to estimate file location by getting its hash.
Why go to all that trouble in such a roundabout way? Why not just add the hash as a tag to the stack? Or better yet, the bucket name and path? Sure, the user could delete the tag. They can go into the S3 bucket and rename the staged, template too. The user can do anything. But there's no reason for them to muck with this tag, and if they know not to, then they probably won't, just like they don't rename the staged template in the SAM bucket. So why not do it the easy, direct way, instead of trying to work into it backwards by recalculating the hash and then looking up the filename?
Another question: why do you have to stage the SAM template to begin with? I can deploy a CF template directly without staging the template as far as I know. Why do we have to stage it in an S3 bucket with SAM? I'm honestly curious.
Why go to all that trouble in such a roundabout way? Why not just add the hash as a tag to the stack? Or better yet, the bucket name and path?
It is doable, but sam deploy
was already there before we released sam delete
, that is why we are doing an estimate for template's location. I will take this to team's attention about adding a tag for template's location and considering that as well while deleting all the resources.
Another question: why do you have to stage the SAM template to begin with? I can deploy a CF template directly without staging the template as far as I know. Why do we have to stage it in an S3 bucket with SAM? I'm honestly curious.
With sam deploy
we are using create_change_set
API with TemplateURL
property. That property requires an S3 location or systems manager document.
With
sam deploy
we are using create_change_set API withTemplateURL
property. That property requires an S3 location or systems manager document.
I see. Thanks for explaining. That's good to know.
I'm looking at the API and I see that there is a TemplateBody
option (I don't know if Boto supports it—I don't use it), although that's limited to ~50K in size.
I still intend to put together a reproducible test (which shouldn't be hard because I only get this behavior), but as I think about this day-to-day, there are a few doubts I still have:
I still intend to put together a reproducible test …
Since most of my day has been wasted trying to do simple things and running into aws/serverless-application-model#3265 (trival), aws/serverless-application-model#3264 (huge), aws/aws-sam-cli#5533 (irritating and time-wasting), and aws/serverless-application-model#3261 (pales in comparison to the others), I decided I might as well spend the rest of the day getting you a reproducible test case.
I started paring down my code to see what triggers this, and I ran into something else. I removed the -s3-bucket $s3BucketName
from the scripts for both sam deploy
and for sam delete
. Surprisingly, SAM went and head and deployed the template, even without a staging bucket. And it didn't create one! I haven't used the default SAM staging bucket in months. I deleted it long ago, and I don't remember what it was named. Now I use separate staging buckets for each environment.
But then when I tried to do sam-delete
without -s3-bucket
, SAM failed:
Warning: Cannot resolve s3 bucket information from command options , local config file or cloudformation template. Please use --s3-bucket next time and delete s3 files manually if required.
Odd—no warning when deploying with no staging bucket name, but a warning when deleting without a staging bucket name.
(I had thought it would create a default staging bucket; I was trying to pare down the test case to as small as possible. I guess I am not remembering correctly. I'll use a hard-coded bucket name.)
Anyway, this is probably a separate bug, but I'm pretty tired from all the bug reporting today already. For now I'll just leave it in this comment. I don't want that to sidetrack us from the main issue for this ticket.
OK, @jfuss and others, I've created a minimal test case case to reproduce this issue. It puzzles me greatly that you cannot reproduce it, because I have been unable to not reproduce this. (I have a fear that, after all the time I've spent, someone is finally going to try it on Windows and find that, "oh, this always happens on Windows"—but let's see ….) I've provided the files in a ZIP file which I've attached and linked in the instructions below.
Before going further, let me be clear about my configuration:
Note: The scripts in the ZIP file probably don't have their executable bit set. They don't need that when when running from Git Bash on Windows. If you test them in *nix, I'm sure you know that you need to do a chmod +x
or whatever on them first.
aws-sam-cli-issue-5254.sam.yaml
: The SAM template.samconfig.toml
: The SAM configuration file.deploy-aws-sam-cli-issue-5254.sh
: The deployment script.delete-aws-sam-cli-issue-5254.sh
: The deletion script.aws-sam-cli-issue-5254-staging
on your AWS account for staging. You can use any bucket name, but you'll need to do a search+replace on the scripts if you use something else.aws-sam-cli-issue-5254
is available; otherwise you'll need to update aws-sam-cli-issue-5254.sam.yaml
with some other bucket name of your choice../deploy-aws-sam-cli-issue-5254.sh
. Along with other output, I see:
Uploading to 05ca0eabd6ebdeacd0836c6ec2826579.template 328 / 328 (100.00%)
Immediately delete the stack using ./delete-aws-sam-cli-issue-5254.sh
. I see the following output:
- Could not find and delete the S3 object with the key d03c1af37631d000624ff6fee500c2ba.template
- Deleting Cloudformation stack aws-sam-cli-issue-5254
Deleted successfully
@garretwilson thanks for working on this (even on Saturday!).
Thanks to your step-by-step example, I was able to find the root cause of it (finally). As I mentioned above, we are getting deployed template and calculate its hash to find its location in S3, this works well on Linux and MacOS but it fails on Windows. The reason is, python does some extra stuff when writing strings to a file on Windows;
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
So what we get from CFN is this;
b"\nAWSTemplateFormatVersion: '2010-09-09'\nTransform:\n- AWS::LanguageExtensions\n- AWS::Serverless-2016-10-31\nDescription: 'Demonstration of AWS SAM CLI Issue aws/aws-sam-cli#5254.'\nResources:\n TestBucket:\n Type: AWS::S3::Bucket\n Properties:\n BucketName: aws-sam-cli-issue-5254\n Metadata:\n SamResourceId: TestBucket\n"
What is written into the file is this (\n
becomes \r\n
);
b"\r\nAWSTemplateFormatVersion: '2010-09-09'\r\nTransform:\r\n- AWS::LanguageExtensions\r\n- AWS::Serverless-2016-10-31\r\nDescription: 'Demonstration of AWS SAM CLI Issue aws/aws-sam-cli#5254.'\r\nResources:\r\n TestBucket:\r\n Type: AWS::S3::Bucket\r\n Properties:\r\n BucketName: aws-sam-cli-issue-5254\r\n Metadata:\r\n SamResourceId: TestBucket\r\n"
This auto conversion is why you get a different hash on Windows and that leads into the original problem of this issue where it can't find the S3 object for deletion.
I've tried with following options and both of them worked so far, I need to check with the team and to select one or the other.
newline=""
so that it won't do the auto-conversion of line endings on windows.Note: This is the part where it is creating this problem https://github.com/aws/aws-sam-cli/blob/develop/samcli/lib/package/local_files_utils.py#L53-L57
I've tried with following options and both of them worked so far …
I'm coming in late at night and it's not wise for me to be commenting extensively on this at the moment, but I believe that the standard practices would probably be simply to normalize to LF
before doing a hash. This is not a new type of problem. But I'll comment more when I'm less "tired" and hopefully find a way not to be grumpy tomorrow about the whole affair of having mentioned this possibility at the beginning. 😅 But I'm super-happy you found the problem. Good night and thanks.
Good morning! I want to say thanks again for continuing to look at this, and it's wonderful to finally get to the bottom of it.
Since we're all friends here, I do want to point out a couple of things, though. I filed this ticket well over a month ago. I described it concisely, precisely, and thoroughly. My first guess was that you were inconsistently hashing before/after the transform, but two days later I said:
You might check into line endings as well. I'm on a Windows machine using CRLF in the templates. If your hashing function converts to LF when doing the upload and then compares with that in the local directory without normalizing line endings, then that could lead to mismatched hashes as well. Just giving you ideas to look into.
And that was "purt near" (as the cowboys say) exactly what the problem was conceptually. All this back and forth and countless hours and $X was a complete waste—that is, we gained nothing of value from it, and we could have been doing something more productive with our time. It is my understanding that just a single test on Windows for any template a month ago would have reproduced this problem.
The other thing I want to point out is that this probably affected hundreds if not thousands of Windows users. Most people don't file tickets. Most people just see a glitch and think, "it's part of this rough-and-tumble Wild West of the cloud; I'll just ignore it." I on the other hand take the time to file tickets.
What I'm trying to get to is … if I ever sound grumpy when I get pushback on tickets I take the time to create, please remember this ticket, cut me a little slack, and don't hold it against me. 😁❤️
Again my sincere thanks for continuing to follow this and look into it.
Patch is released in v1.94.0. Closing
I'm using SAM 1.84.0 on Windows 10.
I build a JAR file with Maven and then directly deploy it using
sam deploy
like this:As expected SAM CLI renames the JAR file to a hash
11111111111111111111111111111111
and uploads it to the S3 bucket. Then it does something to the template, renames it to a hash22222222222222222222222222222222.template
, and uploads that to the S3 bucket as well. (These hashes are just contrived example values of course.)And indeed if I look in the S3 bucket, those are the exact two files that were uploaded!
Then I immediately delete the stack using
sam delete
like this:Oddly SAM asks me:
Wait, what is
33333333333333333333333333333333.template
? There is no33333333333333333333333333333333.template
on S3. SAM never uploaded a33333333333333333333333333333333.template
file. SAM never said it uploaded a33333333333333333333333333333333.template
. SAM told me (truthfully) that it uploaded a22222222222222222222222222222222.template
file.So I say, "Sure, SAM, go ahead and delete whatever you want," and SAM goes away and then comes back to say:
It's not surprising SAM couldn't find
33333333333333333333333333333333.template
, because it never uploaded such a file. And SAM left22222222222222222222222222222222.template
on S3.Why did SAM get confused and try to delete the non-existent file
33333333333333333333333333333333.template
, when it really had uploaded22222222222222222222222222222222.template
?(I don't know if it's relevant, but I'm using
AWS::LanguageExtensions
as one of the transforms. I'll just make a wild guess here with nothing to base it on, but is it possible that SAM creates the first hash based upon some sort of transformation/interpolation, but then creates the second hash based upon the raw template source file? I'm just brainstorming in case it helps you track this down.)