Open CumpsD opened 2 years ago
It seems something fishy is going on with the buffer size for reading stdout and stderr. For some reason, if the block size is too big, it will not print any output (I guess because f.read_nonblock
is waiting till the block_size is reached)
It looks like you ran into it, because you wrote this code:
def suppress_newline(line)
# Regular prompt
line.include?("Enter a value:") ||
# Edge case: When buffer is very large buffer.split("\n") only gives 8192 chars at a time
line.size == 8192 && line[-1] != "\n" ||
# Edge case: "value:" chopped off "Enter a" and "value" prompt
# Very hard to reproduce. Happens 1/5 times on terraspace up autoscaling example.
# Sometimes lines come in as:
# [...," Only 'yes' will be accepted to approve.", "", " \e[1mEnter a"]
# [" value:\e[0m \e[0m"]
line.match(/Enter a$/) || line.match(/^ value:/) # chopped off prompt
# line.include?(" value:") && lines.last.match(/Enter a$/) # chopped off prompt
end
Which is exactly what happens with this block_size behaviour
I wonder if there is a better way to capture stdout and stderr in ruby?
A bit of a hack, but this makes it work for me:
def popen3(env)
Open3.popen2e(env, @command, chdir: @mod.cache_dir) do |stdin, stdout_and_stderr, wait_thread|
handle_streams(stdin, stdout_and_stderr)
status = wait_thread.value.exitstatus
exit_status(status)
end
end
BLOCK_SIZE = Integer(ENV['TS_BUFFER_BLOCK_SIZE'] || 102400)
BUFFER_TIMEOUT = Integer(ENV['TS_BUFFER_TIMEOUT'] || 3600) # 3600s = 1h
def handle_streams(stdin, stdout_and_stderr)
# note: t=0 and t=nil means no timeout. See: https://bit.ly/2PURlCX
t = BUFFER_TIMEOUT.to_i unless BUFFER_TIMEOUT.nil?
Timeout::timeout(t) do
files = [stdout_and_stderr]
until all_eof?(files) do
ready = IO.select(files)
next unless ready
readable = ready[0]
readable.each do |f|
buffer = f.read_nonblock(BLOCK_SIZE, exception: false)
next unless buffer
lines = buffer.split("\n")
lines.each do |line|
# if f.fileno == stdout.fileno
handle_stdout(line, newline: !suppress_newline(line))
handle_input(stdin, line)
# else
# handle_stderr(line)
# end
end
end
end
end
end
Using Open3.popen2e
to get stdout and stderr together gives me the TF errors
RE: not printing errors
Think you’ve figured it out 👍 Think the terraform error output is so large that it overflows the buffer that popen3 uses to print to stderr. It’s a bummer.
Ran into similar issue before https://github.com/boltops-tools/terraspace/issues/97 This is why Terraspace sets the buffer size block size larger https://github.com/boltops-tools/terraspace/blob/37fdf43d02ccae63e60e6cc3a61afbe4b62bfb5f/lib/terraspace/shell.rb#L43
Wondering if Windows uses even larger buffer block sizes. Maybe, to test the theory, also try setting env var TS_BUFFER_BLOCK_SIZE
to something huge.
RE: popen2e vs popen3
Maybe popen2e is worth a revisit. Though, using popen3 to allow piping to tools like jq. IE:
terraspace show demo --json | jq
Only stdout gets piped to jq and it works. Stderr with info messages is not passed through the pipe. Unsure how to handle things and keep piping to jq working. Some thoughts:
TS_BUFFER_BLOCK_SIZE
setting. It’s not an ideal user dev workflow. And it would still require increasing the block size or popen2e combing stderr and stdout.
Sadly, unsure how to report the actual error. Couldn’t figure out how to flush and print out what’s in the buffer in the case of an overflow. Could just be lack of knowledge around popen. Haven’t had the time to dig in what’s happening with popen in the C code itself.
Or maybe it might be more pragmatic again to use a different mode with Windows. IE: Use popen2e for Windows and popen3 for Linux. Lol. It looks like it’s the nature of the beast when dealing with differences in OS. 🤣 Just tossing out some thoughts for now.
I'll have a look tomorrow at using a bigger buffer block size, not sure if it'll work. From playing with it, it looks like the buffer size is too big and not being filled up completely when TF crashes, resulting in nothing being printed. On the other hand.... I would expect TF to return an error code when it crashes, maybe I can play with that :)
I'll update you when I play with it later :)
Tested with BLOCK_SIZE 102400000 -> Same result, error is not printed, and no exit code is returned either, I guess it is just stuck somewhere :p
To test my assumption I added a silly print after a buffer is read:
buffer = f.read_nonblock(BLOCK_SIZE, exception: false)
puts "buffer!"
next unless buffer
This is the result:
terraspace up backend
Building .terraspace-cache/eu-central-1/dev/stacks/backend
Built in .terraspace-cache/eu-central-1/dev/stacks/backend
Current directory: .terraspace-cache/eu-central-1/dev/stacks/backend
=> terraform apply -input=false
buffer!
random_pet.this: Refreshing state... [id=quiet-hawk]
buffer!
aws_cloudwatch_log_group.logs: Refreshing state... [id=quiet-hawk]
buffer!
module.lambda_hello2.aws_cloudwatch_log_group.lambda[0]: Refreshing state... [id=/aws/lambda/hello2]
module.api_gateway.aws_apigatewayv2_api.this[0]: Refreshing state... [id=gys1whnq72]
module.lambda_hello1.aws_cloudwatch_log_group.lambda[0]: Refreshing state... [id=/aws/lambda/hello1]
module.lambda_hello1.aws_iam_role.lambda[0]: Refreshing state... [id=hello1]
module.lambda_hello2.aws_iam_role.lambda[0]: Refreshing state... [id=hello2]
buffer!
module.lambda_hello1.aws_iam_policy.logs[0]: Refreshing state... [id=arn:aws:iam::xxx:policy/hello1-logs]
buffer!
module.api_gateway.aws_apigatewayv2_stage.default[0]: Refreshing state... [id=$default]
buffer!
module.lambda_hello2.aws_lambda_function.this[0]: Refreshing state... [id=hello2]
buffer!
module.lambda_hello1.aws_iam_role_policy_attachment.logs[0]: Refreshing state... [id=hello1-20220218215412109400000001]
module.lambda_hello1.aws_lambda_function.this[0]: Refreshing state... [id=hello1]
buffer!
module.lambda_hello1.aws_lambda_permission.unqualified_alias_triggers["AllowExecutionFromAPIGateway"]: Refreshing state... [id=AllowExecutionFromAPIGateway]
buffer!
module.lambda_hello1.aws_lambda_permission.current_version_triggers["AllowExecutionFromAPIGateway"]: Refreshing state... [id=AllowExecutionFromAPIGateway]
module.api_gateway.aws_apigatewayv2_integration.this["GET /hello"]: Refreshing state... [id=y61golh]
module.api_gateway.aws_apigatewayv2_integration.this["GET /wave"]: Refreshing state... [id=w4zvekf]
buffer!
module.api_gateway.aws_apigatewayv2_route.this["GET /hello"]: Refreshing state... [id=6a1um5p]
buffer!
module.api_gateway.aws_apigatewayv2_route.this["GET /wave"]: Refreshing state... [id=j6939ni]
buffer!
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
~ update in-place
+/- create replacement and then destroy
<= read (data resources)
buffer!
Terraform will perform the following actions:
# module.api_gateway.aws_apigatewayv2_integration.this["GET /wave"] must be replaced
+/- resource "aws_apigatewayv2_integration" "this" {
+ connection_id = (known after apply)
~ id = "w4zvekf" -> (known after apply)
+ integration_response_selection_expression = (known after apply)
~ integration_type = "AWS_PROXY" -> (known after apply) # forces replacement
~ integration_uri = "arn:aws:lambda:eu-central-1:xxx:function:hello2" -> (known after apply)
- request_parameters = {} -> null
- request_templates = {} -> null
# (5 unchanged attributes hidden)
}
# module.api_gateway.aws_apigatewayv2_route.this["GET /wave"] will be updated in-place
~ resource "aws_apigatewayv2_route" "this" {
id = "j6939ni"
~ target = "integrations/w4zvekf" -> (known after apply)
# (6 unchanged attributes hidden)
}
# module.lambda_hello2.data.aws_iam_policy_document.logs[0] will be read during apply
# (config refers to values not yet known)
<= data "aws_iam_policy_document" "logs" {
+ id = (known after apply)
+ json = (known after apply)
+ statement {
+ actions = [
+ "logs:CreateLogGroup",
+ "logs:CreateLogStream",
+ "logs:PutLogEvents",
]
+ effect = "Allow"
+ resources = (known after apply)
}
}
# module.lambda_hello2.aws_cloudwatch_log_group.lambda[0] must be replaced
+/- resource "aws_cloudwatch_log_group" "lambda" {
~ arn = "arn:aws:logs:eu-central-1:xxx:log-group:/aws/lambda/hello2" -> (known after apply)
~ id = "/aws/lambda/hello2" -> (known after apply)
~ name = "/aws/lambda/hello2" -> "/aws/lambda/hello1" # forces replacement
tags = {
"Environment" = "Development"
}
# (2 unchanged attributes hidden)
}
# module.lambda_hello2.aws_iam_policy.logs[0] will be created
+ resource "aws_iam_policy" "logs" {
+ arn = (known after apply)
+ id = (known after apply)
+ name = "hello1-logs"
+ path = "/"
+ policy = (known after apply)
+ policy_id = (known after apply)
+ tags = {
+ "Environment" = "Development"
}
+ tags_all = {
+ "Environment" = "Development"
}
}
# module.lambda_hello2.aws_iam_role.lambda[0] must be replaced
+/- resource "aws_iam_role" "lambda" {
~ arn = "arn:aws:iam::xxx:role/hello2" -> (known after apply)
~ assume_role_policy = jsonencode( # whitespace changes
{
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
Sid = ""
},
]
Version = "2012-10-17"
}
)
~ create_date = "2022-02-18T21:33:01Z" -> (known after apply)
~ id = "hello2" -> (known after apply)
~ managed_policy_arns = [] -> (known after apply)
~ name = "hello2" -> "hello1" # forces replacement
+ name_prefix = (known after apply)
tags = {
"Environment" = "Development"
}
~ unique_id = "AROATS32B5M2Q5NDAOXQM" -> (known after apply)
# (4 unchanged attributes hidden)
- inline_policy {}
+ inline_policy {
+ name = (known after apply)
+ policy = (known after apply)
}
}
# module.lambda_hello2.aws_iam_role_policy_attachment.logs[0] will be created
+ resource "aws_iam_role_policy_attachment" "logs" {
+ id = (known after apply)
+ policy_arn = (known after apply)
+ role = "hello1"
}
# module.lambda_hello2.aws_lambda_function.this[0] must be replaced
+/- resource "aws_lambda_function" "this" {
~ arn = "arn:aws:lambda:eu-central-1:xxx:function:hello2" -> (known after apply)
~ function_name = "hello2" -> "hello1" # forces replacement
~ id = "hello2" -> (known after apply)
~ invoke_arn = "arn:aws:apigateway:eu-central-1:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-central-1:xxx:function:hello2/invocations" -> (known after apply)
~ last_modified = "2022-02-19T00:20:03.445+0000" -> (known after apply)
- layers = [] -> null
~ qualified_arn = "arn:aws:lambda:eu-central-1:xxx:function:hello2:1" -> (known after apply)
~ role = "arn:aws:iam::xxx:role/hello2" -> (known after apply)
+ signing_job_arn = (known after apply)
+ signing_profile_version_arn = (known after apply)
~ source_code_size = 761249 -> (known after apply)
tags = {
"Environment" = "Development"
}
~ version = "1" -> (known after apply)
# (11 unchanged attributes hidden)
~ tracing_config {
~ mode = "PassThrough" -> (known after apply)
}
}
# module.lambda_hello2.aws_lambda_permission.current_version_triggers["AllowExecutionFromAPIGateway"] will be created
+ resource "aws_lambda_permission" "current_version_triggers" {
+ action = "lambda:InvokeFunction"
+ function_name = "hello1"
+ id = (known after apply)
+ principal = "apigateway.amazonaws.com"
+ qualifier = (known after apply)
+ source_arn = "arn:aws:execute-api:eu-central-1:xxx:gys1whnq72/*/*"
+ statement_id = "AllowExecutionFromAPIGateway"
}
# module.lambda_hello2.aws_lambda_permission.unqualified_alias_triggers["AllowExecutionFromAPIGateway"] will be created
+ resource "aws_lambda_permission" "unqualified_alias_triggers" {
+ action = "lambda:InvokeFunction"
+ function_name = "hello1"
+ id = (known after apply)
+ principal = "apigateway.amazonaws.com"
+ source_arn = "arn:aws:execute-api:eu-central-1:xxx:gys1whnq72/*/*"
+ statement_id = "AllowExecutionFromAPIGateway"
}
Plan: 8 to add, 1 to change, 4 to destroy.
╷
│ Warning: Argument is deprecated
│
│ with module.lambda_hello2.aws_s3_bucket_object.lambda_package,
│ on ..\..\modules\lambda\main.tf line 124, in resource "aws_s3_bucket_object" "lambda_package":
│ 124: bucket = var.s3_bucket
│
│ Use the aws_s3_object resource instead
│
│ (and 3 more similar warnings elsewhere)
╵
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
buffer!
buffer!
module.lambda_hello2.aws_cloudwatch_log_group.lambda[0]: Creating...
buffer!
module.lambda_hello2.aws_iam_role.lambda[0]: Creating...
As you can see, after the last line, there is no more "buffer!" printed, so the error is probably stuck inside buffer = f.read_nonblock(BLOCK_SIZE, exception: false)
where the code waits for more characters.
When you get a chance, wondering if you can try
buffer = f.read_nonblock(BLOCK_SIZE, exception: true)
Think am using exception: false
because an Exception signifies the end of the stream. Unsure which exception or exceptions are the ones we want in this case though. There might be some exceptions that indicate a buffer overflow. 🧐
Note: It’ll probably be a decent amount of digging ⛏ On the road the last 2 days and been on my phone 🤳🏽😆 Also, don’t have Windows set up yet. Thanks for the help. No sweat either way of course. 👍
Also, am leaning on a switch that uses popen2e for Windows for now 😂
buffer = f.read_nonblock(BLOCK_SIZE, exception: true)
--> No difference
Using popen2e
will need refactoring though, with my hack, this is broken:
Error: Module not installed
│
│ on domain.tf line 13:
│ 13: module "records" {
│
│ This module is not yet installed. Run "terraform init" to install all
│ modules required by this configuration.
No more automatic init
Bummer. The automatic init is based on a few things, whether or not the .terraform cache folder exists but will also automatically retry based on terraform command exit status. Guessing popen2e
might suppress error exit statuses and messes up automatic init.
Wondering if you can give this a try when you get a chance
config/app.rb
Terraspace.configure do |config|
logger = Logger.new($stdout)
logger.level = :debug
config.logger = logger
end
Docs: https://terraspace.cloud/docs/config/app/#custom-logger
Checklist
My Environment
Expected Behaviour
See all TF output
Current Behavior
Only positive output is shown
Step-by-step reproduction instructions
In the image you see:
On the left, terraspace 1.1.3, shell.rb with and added
puts line
on line 94On the right, some invalid TF code which crashes. As well as the logger configuration
info
Without the
puts line
, no errors are shown. With the line, the output as in the screenshot is shown.Perhaps I am really configuring my logger wrong? I copied it from the code examples. Or something is up with the logger itself.