Great details @constanca-m and thanks for the issue. Adding some thoughts:
Remove the need for the user to create its own S3 bucket and place their config.yaml file
Giving the user the option to bring their own S3 bucket with the config, instead of creating it with terraform, is a "feature". I mean we can keep this conditional, and even in option 1 allow users to provide an existing S3 bucket.
Because with this list you will need to match different scenarios, we need to check whether the list can have non-mandatory fields. Otherwise we might need to create different kinds of lists and iterate over all of them?
I checked, and there are optional object types in terraform. I noticed a while ago we are actually already using them.
We also could (or should) add validation rules to check the requirements:
Like this:
variable "inputs" {
description = "List of inputs to ESF"
type = list(object({
type = string
id = string
outputs = list(object({
type = string
args = list(object({
elasticsearch_url = optional(string)
cloud_id = optional(string)
api_key = optional(string)
username = optional(string)
password = optional(string)
es_datastream_name = string
batch_max_actions = optional(number, 500)
batch_max_bytes = optional(number, 10485760)
}))
}))
}))
default = []
validation {
condition = alltrue([
for input in var.inputs : alltrue([
for output in input.outputs : alltrue([
for args in output.args : alltrue([
args.elasticsearch_url == null && args.cloud_id == null ? false : true
])
])
])
])
error_message = "All outputs must contain elasticsearch_url or cloud_id. elasticsearch_url takes precedence if both are included."
}
}
Example of usage:
inputs = [
  {
    type = "type1"
    id   = "id1"
    outputs = [
      {
        type = "output1"
        args = [
          {
            #elasticsearch_url = "url"
            #cloud_id          = "cloudid"
            api_key            = "apikey"
            username           = "username"
            password           = "password"
            es_datastream_name = "es-default"
          }
        ]
      }
    ]
  }
]
Fails with:
╷
│ Error: Invalid value for variable
│
│ on variables.tf line 85:
│ 85: variable "inputs" {
│ ├────────────────
│ │ var.inputs is list of object with 1 element
│
│ All outputs must contain elasticsearch_url or cloud_id. elasticsearch_url takes precedence if both are included.
│
│ This was checked by the validation rule at variables.tf:106,3-13.
I mean we can keep this conditional, and even in option 1 allow users to provide an existing S3 bucket.
The issue with this is that we would then have to apply either option 2 or 3, because we need to obtain the ARNs in some way. If the file remains YAML, then it is option 3.
Sorry, I actually misunderstood. With an existing S3 bucket we can also use aws_s3_object to upload the file. Good option :+1: @gizas
Thank you!!!
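For reference, a minimal sketch of that upload with aws_s3_object, assuming a hypothetical config_bucket_name variable that names the existing bucket:
data "aws_s3_bucket" "existing" {
  bucket = var.config_bucket_name # hypothetical: name of the user's existing bucket
}

resource "aws_s3_object" "esf_config" {
  bucket  = data.aws_s3_bucket.existing.id
  key     = "config.yaml"
  content = file("${path.module}/config.yaml")
}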
In this:
type = list(object({
type = string
id = string
The id field equals the ARN. The field name is not very descriptive, but it is minor.
The issue with this is that we would then have to apply either option 2 or 3, because we need to obtain the ARNs
Indeed option 1 becomes equal to 3. I would say it is valuable to have both 1 and 3 as options for the users, with a flag or so to distinguish whether the users want to create the S3 bucket or not. What do you think?
Indeed option 1 becomes equal to 3. I would say it is valuable to have both 1 and 3 as options for the users, with a flag or so to distinguish whether the users want to create the S3 bucket or not. What do you think?
I was wrong when I wrote that. The S3 bucket can exist, and we should consider that. I updated the comment to fix my mistake @gizas
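A rough sketch of what such a flag could look like (variable names here are hypothetical, not the module's actual interface):
variable "create_config_bucket" {
  description = "Whether to create the S3 bucket for config.yaml or reuse an existing one"
  type        = bool
  default     = true
}

resource "aws_s3_bucket" "esf_config" {
  # Only created when the user opts in; otherwise an existing bucket is reused
  count  = var.create_config_bucket ? 1 : 0
  bucket = var.config_bucket_name # hypothetical: desired or existing bucket name
}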
I am having a deeper look at option 1, but maybe having that variable is too confusing.
The variable would have to be
type = list(object({
  type = string
  id   = string
  outputs = list(object({
    type = string
    args = list(object({
      elasticsearch_url      = optional(string)
      logstash_url           = optional(string)
      cloud_id               = optional(string)
      api_key                = optional(string)
      username               = optional(string)
      password               = optional(string)
      es_datastream_name     = string
      batch_max_actions      = optional(number, 500)
      batch_max_bytes        = optional(number, 10485760)
      ssl_assert_fingerprint = optional(string)
      compression_level      = optional(string)
    }))
  }))
}))
So if we have 2 inputs, this could look like this:
inputs = [
  {
    type = "cloudwatch-logs"
    id   = "id1"
    outputs = [
      {
        type = "elasticsearch"
        args = [
          {
            elasticsearch_url  = "url"
            api_key            = "apikey"
            es_datastream_name = "es-default"
          }
        ]
      }
    ]
  },
  {
    type = "cloudwatch-logs"
    id   = "id2"
    outputs = [
      {
        type = "elasticsearch"
        args = [
          {
            elasticsearch_url  = "url"
            api_key            = "apikey"
            es_datastream_name = "es-default"
          }
        ]
      }
    ]
  }
]
Would it just be easier to have the config.yaml locally along with the terraform files? This way, we can have a variable config_path and read the file as file(var.config_path). (A small sketch follows the pros and cons below.)
Pros of this approach:
my feeling is that YAML is more user-friendly for placing config than JSON
Cons:
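A minimal sketch of the config_path idea mentioned above (names are illustrative):
variable "config_path" {
  description = "Path to the local config.yaml, relative to the terraform files"
  type        = string
  default     = "config.yaml"
}

locals {
  # Raw contents of the user's config file, read from disk
  esf_config_yaml = file(var.config_path)
}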
I also like option 1, and if we can, I think it would be preferable to not require a separate config.yaml.
I also tend to agree with @gizas that if we could optionally support config.yaml, that would be great -- that would provide an escape hatch in case the user wants to do something unusual. I guess this gets complicated though, because then we would also need to optionally support specifying IAM privileges?
Just a thought: would it make sense to provide a simpler configuration for Terraform, such that you could only specify one output, but many inputs?
So you would have separate variables for elasticsearch_url, logstash_url, etc., and (say) cloudwatch_log_group_arns. That last one would be used to build the input list, and the output would be identical for every input apart from es_datastream_name (which I'm not sure off hand how we would specify).
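To make that concrete, a rough sketch of how the input list could be generated from such variables (all names here are hypothetical, and es_datastream_name is left as the open question above):
variable "cloudwatch_log_group_arns" {
  description = "Log groups to forward from; one input is generated per ARN"
  type        = list(string)
  default     = []
}

locals {
  generated_inputs = [
    for arn in var.cloudwatch_log_group_arns : {
      type = "cloudwatch-logs"
      id   = arn
      outputs = [{
        type = "elasticsearch"
        args = [{
          elasticsearch_url  = var.elasticsearch_url  # hypothetical shared output variables
          api_key            = var.api_key
          es_datastream_name = var.es_datastream_name # unclear how this would vary per input
        }]
      }]
    }
  ]
}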
I guess this gets complicated though, because then we would also need to optionally support specifying IAM privileges?
I don't think it would get complicated in that sense. We always need to grant s3:GetObject because the lambda itself needs the config.yaml file in the bucket. If it is not there, we will upload it.
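For illustration, the kind of statement this amounts to (the bucket reference is a hypothetical local, not the module's actual code):
data "aws_iam_policy_document" "esf_config_read" {
  statement {
    # The lambda always needs to read config.yaml from the bucket
    actions   = ["s3:GetObject"]
    resources = ["${local.config_bucket_arn}/config.yaml"] # hypothetical local holding the bucket ARN
  }
}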
Regarding:
I think it would be preferable to not require a separate config.yaml.
Does this mean we should add a variable @axw? I am leaning towards having a config.yaml file locally that we will upload to the S3 bucket, mainly to keep the logic, but I also like the variable.
would it make sense to provide a simpler configuration for Terraform, such that you could only specify one output, but many inputs
It could be an option. I would have to think about how that would work though... It also means the logic would be different from the one we used for config.yaml, and I don't know if it is a good idea to do that.
Maybe it would be best to tackle that in a different issue, and change the code directly to allow us to have shared outputs, instead of doing that just for terraform. What do you think?
I guess this gets complicated though, because then we would also need to optionally support specifying IAM privileges?
I don't think it would get complicated in that sense. We always need to grant s3:GetObject because the lambda itself needs the config.yaml file in the bucket. If it is not there, we will upload it.
What I meant was that if we support an arbitrary config.yaml, then we would need to work out which resources ESF is allowed to receive data from -- Kinesis data streams, SQS queues, etc. At the moment you have to specify this as a variable. So we would either have to keep that, or parse them from the config file. Right?
Looks like in https://github.com/elastic/terraform-elastic-esf/pull/7 you're parsing the local config file. I think this is a fine approach.
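For readers following along, parsing the ARNs out of a local config file could look roughly like this (illustrative names, not the actual code from that PR):
locals {
  esf_config = yamldecode(file(var.config_path))

  # Each input's id is its ARN, so the resources the lambda must be
  # allowed to read from can be collected straight from the config
  input_arns = [for input in local.esf_config.inputs : input.id]
}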
would it make sense to provide a simpler configuration for Terraform, such that you could only specify one output, but many inputs
It could be an option. I would have to think about how that would work though... It also means the logic would be different from the one we used for config.yaml, and I don't know if it is a good idea to do that.
Maybe it would be best to tackle that in a different issue, and change the code directly to allow us to have shared outputs, instead of doing that just for terraform. What do you think?
Fair enough. I was just thinking we could support only a limited set of functionality to start with, and keep the Terraform module simpler. But probably best to keep these things separate as you say.
then we would need to work out which resources ESF is allowed to receive data from -- Kinesis data streams, SQS queues, etc
Yes, the variable also has a validation rule for which input types it accepts. But I believe the ESF code checks that as well.
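Such a rule could look roughly like this inside the inputs variable (the allowed types here are ESF's documented inputs, but treat the snippet as illustrative):
validation {
  condition = alltrue([
    for input in var.inputs :
    contains(["s3-sqs", "sqs", "kinesis-data-stream", "cloudwatch-logs"], input.type)
  ])
  error_message = "Input type must be one of: s3-sqs, sqs, kinesis-data-stream, cloudwatch-logs."
}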
Currently, to use these terraform configuration files we need to specify the inputs and then use the same ARNs presented there in the variables of this repository.
Example: we have a config.yaml with a cloudwatch-logs input whose id is arn:aws:logs:eu-central-1:627286350134:log-group:constanca-benchmark-test-cloudwatch-lg:*, so we need to declare that same ARN in our variables as well. This is obviously not ideal, especially if the user is using many inputs.
I have thought of 3 options to remove the need for duplicates.
Option 1
I believe this is the best option. In this option we would need to define the inputs variable in terraform, place it in a config.yaml file and upload it to our new S3 bucket. With this option we remove the need for the user to create their own config.yaml file, and we no longer repeat the ARNs in config.yaml and in our terraform variables.
Option 2
In this option we keep a config file, but in a format terraform can parse natively rather than YAML. With this option we no longer repeat the ARNs in the config file and in our terraform variables.
Option 3
In this option we keep the config.yaml file and parse it from terraform to obtain the ARNs. With this option we no longer repeat the ARNs in config.yaml and in our terraform variables.