Open Tony-Proum opened 4 years ago
For transparency to anyone looking, conversation is happening in PR #668 🙂
Hey, moving the conversation back to this issue instead of the PR so I can better understand the issue you're facing and the proposed solution 🙂
I think I may have misunderstood the original issue a bit, apologies for that. To confirm, you're right that changing the monitor type will require the monitor to be deleted and recreated and this is imposed at the API layer.
To restate the issue to confirm I follow, you'd like to update a monitor type, and have that new ID be referenced in the SLO definition. However, the monitor can't be deleted because it's a part of the SLO.
Given this, I was testing out the force_delete workflow and I believe I understand your note about the usability workflow. You'd need to apply a configuration with the force_delete flag set on each monitor before actually changing the monitor definition to be of another type, otherwise the monitor deletion still fails with a "still referenced in SLO" error. But this should allow you to modify the monitor type without any provider changes.
I believe your proposal is to ForceDelete the SLO if the underlying monitor_id changes. Does this solve the issue you're facing? I believe this would require the SLO to be deleted before any lifecycle changes happened to the monitor.
No worry, I'm not the better to write issues or to explain clearly things without some kind of support (images schemas and other tools like that)
Forcing the deletion of the SLO solves the issue for all my tests, I hope to have test most of existing case, and the PR contains an automated test for this workflow. The only things that I were worry about is the data that could be lost using this deletion. But as far as I could experiment this I did'nt noticed any issue with this as all data displayed into an SLO seems to be saved on the monitor resource (correct me if I'm wrong)
So, seems ok
I updated the PR with some more informations: a link exists between dashboard and SLO too. see more details here https://github.com/DataDog/terraform-provider-datadog/pull/668#issuecomment-715238587
Hey @Tony-Proum again, I apologize for my delay on this one. I just wanted to leave a note to let you know I'm still taking a look here! I think there may be a few issues being hit and I am just trying to get a complete picture.
Hey @Tony-Proum I've been playing around with this some more and wanted to follow up. As mentioned in my previous note, I think there may be multiple issues at play here, and I want to ensure we can provide a solution while avoiding introducing any other edge cases or complexities.
I do have reservations around adding "forceNew" to additional places, if its indeed the only viable option, it's something we can definitely add, but I think it should be one of the later solutions attempted. There have been edge cases in the past, where forceNew was triggered in error on the monitor resource (for example when switching between metric and query types) that caused users to lose monitor history unnecessarily.
I've been looking into this and testing with the config you provided. As you noted, there is a link both between Monitors and the SLO, and between the SLO and Dashboards. Both monitor
and slo
have a force
flag at the api layer that was introduced to allow this to be bypassed. (This is available today in the Go client, but its not yet in terraform for the SLO resource - https://github.com/DataDog/datadog-api-client-go/blob/master/api/v1/datadog/api_service_level_objectives.go#L367)
To propose an alternative, would having the force
flag available in terraform on the SLO resource also address your workflow? The steps there would be to apply the force = true
field to the monitor and slo resource prior to changes if you're comfortable with deleting a monitor/slo thats used somewhere. Let me know what you think, and I appreciate you working to get this issue resolved!
@nmuesch I would prefer not to use force = true: in case of using multiple terraform worspaces to split the configuration into small chuncks, using force delete could results in unexpected results (dashboards without slo or slo without monitor). I still "prefer" the forceNew option to securely handle changes in the infrastructure.
Also I understand that this case is not really easy. I'm also not at ease with potential use cases that could be introduced with this solution, but well, each time I had to change some underlying resources (e.g: monitor or SLOs of an existing dashboard) the best way I found to force terraform to do what I expect it to do, is to change all resource names in upper resources (in order to manually trigger a ForceNew)
I also tried to figure out where are the data stored and their seems to be stored at the lower level (in monitor resource or directly in metrics) dashboard and SLOs are just some kind of different view point around those data. I think that this hypothesis have to be verified, but If so, it may not be a big risk to use the forceNew way and I really think that this solution is the most compliant to the "terraform way" to handle this type of dependency
Hi, I'm wondering if there's any update on this? We are hitting a similar issue when Terraform tries to delete a monitor that is still associated with an SLO. In our case though this is when we go to test our terraform modules it spins up the infra and all associated pieces (including datadog monitors and SLO dashboards), then we hit the error when destroying the test resources.
The recommended approach is to add the force_delete
flag on the resources. This way we skip the server side validation happening when deleting resources.
do you mean force_delete
@therve ? https://registry.terraform.io/providers/DataDog/datadog/2.26.1/docs/resources/monitor
Yes sorry!
Hey there, I wanted to mention this also seems to include any composite monitor relationship as well.
Example:
resource "datadog_monitor" "composite_example" {
type = "composite"
name = "Example composite monitor"
query = "19443597 && 19443598"
message = "composite monitor"
}
Error: error deleting monitor: 400 Bad Request: {"errors": ["monitor [19443597,Monitor example] is referenced in composite monitors: [37050226,Example composite monitor]"]}
Terraform Version
Terraform v0.12.29
Affected Resource(s)
Terraform Configuration Files
Actual Behavior
Modifying the monitor type is not possible in this configuration without using some workaround: When modifying the type of the monitor, terraform needs to destroy the resource and to recreate it (due to an API restriction ? The documentation don't mention that but I didn't tried). The plan shows that the operation is needed, and it make sense as illustrated here:
But at runtime, terraform output displays throws this type of error :
Expected Behavior
In this particular case, datadog_service_level_objective monitor_ids attribute could forces terraform to replace the resource (seems like it the simplest solution for this problème)
Workaround
If this type of error is thrown, changing the resource name e.g:
resource "datadog_service_level_objective" "availability"
toesource "datadog_service_level_objective" "availability-v2"
Forces replacement of the resource and allows terraform to handle this case.Steps to Reproduce
Using the example code above,
terraform apply
modify the
datadog_monitor
type or thedatadog_service_level_objective
typeterraform apply
References