OctopusDeploy / StepsFeedback


Dynamic Infrastructure Feedback #8

Open rhysparry opened 2 years ago

rhysparry commented 2 years ago

We're seeking feedback for changes to how Octopus manages Dynamic Infrastructure.

Please post your feedback below.

Specifically, we want to know:

Alternatively, book time with us to discuss.

Thanks so much for your help.

Blog post: Request for Comments - Dynamic infrastructure

phillip-haydon commented 2 years ago

To discover targets, Octopus will need to authenticate with the appropriate cloud provider. You'll be able to provide this context using Octopus variables. For example, you'll be able to configure a variable named Octopus.Azure.Account to reference the Azure account to use for discovery. We'll link the account from the authentication context during discovery to the target by default.

Are you specifically targeting Azure to begin with? Or will you do AWS at the same time / first?

For AWS, I would not want to use the Accounts system and instead want to rely on AWS's built-in EC2 Roles so I don't need to generate credentials.

Edit: Ah I just read the AWS roles stuff at the bottom :)

phillip-haydon commented 2 years ago

AWS ECS clusters

This may be a silly question, but anyone deploying to containers would be deploying a new version of a container, right? Not deploying to a target in a container? Wouldn't it be better to support EC2, where the targets are likely to remain between deployments?

pessoa commented 2 years ago

We manage all our infrastructure using Terraform. When we increase the environment count, the code handles not only the required cloud provider resources but also the Octopus targets. They are available immediately after the provisioning code runs.

cdhunt commented 2 years ago

Request: octopus-tenant tag support.

NiteDesign commented 2 years ago

Hello,
First off, we're excited to see progress on this feature! We are using AWS with AutoScaling groups. When we increase our instance count, our process is:

  1. Registers the new target to Octopus (same approach, using AWS tags for environment, role, tenant, etc.)
  2. Sends a POST to the api/deployments API using the SpecificMachineIds parameter, to scope the deployment to only the new target (see the sketch below). We do this as we don't require re-deploying the entire project to all other resources, only newly launched resources. Hopefully there is a way to do something similar?
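
For reference, a minimal sketch of that call in Python, assuming the standard Octopus REST API; the server URL, API key, and resource IDs below are placeholders:

```python
# Sketch of step 2: create a deployment scoped to a single new machine.
# The URL, API key, and IDs are placeholders, not real values.
import requests

OCTOPUS_URL = "https://my.octopus.app"   # hypothetical instance URL
API_KEY = "API-XXXXXXXXXXXXXXXX"         # hypothetical API key

payload = {
    "ReleaseId": "Releases-1",
    "EnvironmentId": "Environments-1",
    # Restrict the deployment to just the newly launched target.
    "SpecificMachineIds": ["Machines-123"],
}

response = requests.post(
    f"{OCTOPUS_URL}/api/deployments",
    json=payload,
    headers={"X-Octopus-ApiKey": API_KEY},
)
response.raise_for_status()
```
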
pio-janik commented 2 years ago

We also manage all our infrastructure using Terraform and the octopusdeploy provider. What would be of more benefit to us is dynamic worker pools. Most of our deployments run on behalf of a worker; in our case, these are GCP Compute Engine instances. It would be great if Octopus could scale the number of instances based on load, say, scaling up during work hours and down afterwards. We can already build immutable VMs with all the necessary tools using Packer. This would reduce our infrastructure costs.

mcasperson commented 2 years ago

It would be very useful to be able to tag a target as disabled. This would allow administrators to retain all the existing configuration (environment, tenant, account, etc.) but still exclude the target from any deployments, just as a disabled Octopus target is excluded today. Then, once the target is ready again, the disabled tag is removed or set to false.
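
To illustrate the semantics I have in mind, a hypothetical discovery filter might look like the sketch below (the octopus-disabled tag name is an assumption, not a confirmed design):

```python
# Hypothetical filter illustrating the proposal: a target tagged
# octopus-disabled=true keeps all of its configuration but is skipped
# during deployments until the tag is removed or set to false.
def eligible_targets(discovered: list[dict]) -> list[dict]:
    return [
        target for target in discovered
        if target.get("tags", {}).get("octopus-disabled", "false").lower() != "true"
    ]
```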

mcasperson commented 2 years ago

I suspect more advanced teams would also require the ability to scope their targets to individual Octopus instances. This would provide some assurances that targets in one AWS account would not just "pop up" in multiple Octopus instances. This could be solved by using targeted AWS accounts that were limited in their ability to scan for potential targets, but if Octopus recognised tags like octopus-instance, it would be a common solution across all providers.

To give you an example, we've gotten this feedback:

This team is looking to set up an Octopus instance in AWS. Their non-prod environments exist in one AWS AZ and Production in another, with no peering allowed between the two AZs. They have this replicated in another region for DR, so two AZs in Ireland and the same in Frankfurt. They wanted to discuss options for configuring Octopus for this situation.

We've identified dozens of customers that use multiple Octopus instances for various reasons, and you can find more info in the internal document Pitch: Octopus as a target.

mcasperson commented 2 years ago

How would target-based triggers be supported with this new feature? Today, a new target being added or becoming healthy activates a deployment target trigger, which then usually deploys a release to the new target.

If I'm understanding this pitch correctly, a target is only discovered/added when it is required as part of a deployment. I'm unsure how target triggers would fit into this.

mcasperson commented 2 years ago

Will there be an option to opt out of using dynamic targets? For example, if my production environment is considered highly controlled, it would be nice to know that I have complete control over the targets that receive deployments by not enabling dynamic targets.

pio-janik commented 2 years ago

@mcasperson good point. We use target-based triggers actively. This approach allows us to upgrade all infrastructure resources without downtime for our customers or enabling a maintenance window.

jfletcher93 commented 2 years ago

Hello! First off, we're excited to see progress on this feature! We are using AWS with AutoScaling groups. When we increase our instance count, our process is:

  1. Registers the new target to Octopus (same approach, using AWS tags for environment, role, tenant, etc.)
  2. Sends a POST to the api/deployments API using the SpecificMachineIds parameter, to scope the deployment to only the new target. We do this as we don't require re-deploying the entire project to all other resources, only newly launched resources. Hopefully there is a way to do something similar?

We are doing something similar. The capacity to auto-scale and deploy to the new targets is something we have baked in ourselves using health check triggers, but tag-based discovery for EC2 instances would make things so much easier from a registration point of view.
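
For what it's worth, here's a rough sketch of what tag-based EC2 discovery could look like with boto3 (the octopus-* tag names follow the convention discussed in this thread and are an assumption, not a confirmed design):

```python
# Rough sketch of tag-based EC2 discovery using boto3. The tag names
# mirror the octopus-* convention discussed in this thread.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.describe_instances(
    Filters=[
        {"Name": "tag:octopus-environment", "Values": ["Production"]},
        {"Name": "tag:octopus-role", "Values": ["web-server"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)

# Flatten reservations into a single list of candidate targets.
instances = [
    instance
    for reservation in response["Reservations"]
    for instance in reservation["Instances"]
]
print(f"Discovered {len(instances)} candidate targets")
```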

rhysparry commented 2 years ago

@phillip-haydon

Are you specifically targeting Azure to begin with? Or will you do AWS at the same time / first?

Yes, Azure was first off the rank. We have internal prototypes discovering ECS targets using AWS accounts.

For AWS, I would not want to use the Accounts system and instead want to rely on AWS's built-in EC2 Roles so I don't need to generate credentials.

Support for EC2 instance roles and role assumption with AWS targets is currently being worked on. Right now, this work is focused on our ECS cluster target, but we're building it as a foundation for future targets.

This may be a silly question, but anyone deploying to containers would be deploying a new version of a container, right? Not deploying to a target in a container?

The ECS cluster target supports deploying and updating services. These services use a task definition to point to the container image being executed. The deployments aren't inside the containers themselves, but instead target the orchestration layer that manages them.
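
To make that concrete, a deployment at this layer boils down to API calls along the lines of the following boto3 sketch (cluster, service, and image names are placeholders; this isn't our actual implementation):

```python
# Sketch of a deployment at the ECS orchestration layer: register a new
# task definition revision pointing at the new image, then update the
# service to roll it out. All names and the image tag are placeholders.
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

task_def = ecs.register_task_definition(
    family="my-app",
    containerDefinitions=[{
        "name": "my-app",
        "image": "my-registry/my-app:1.2.3",  # the new container version
        "memory": 512,
    }],
)

ecs.update_service(
    cluster="my-cluster",
    service="my-app-service",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
)
```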

Wouldn't it be better to support EC2, where the targets are likely to remain between deployments?

We're specifically focusing on the Platform as a Service (PaaS) offerings of the major cloud providers. These platforms have the advantage that once we have discovered them, deploying and configuring them is largely a case of making the right API calls.

EC2 instances (and other virtual machines) have the disadvantage that even once discovered, we don't necessarily have enough context to reliably configure them. Currently, the best approach to setting up an EC2 instance with Octopus is to install and configure the Octopus Tentacle during instance initialization. With an appropriate machine policy, the target can be cleaned up automatically, and deployment target triggers can be used to deploy the current release.
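
As a rough illustration, that initialization step can shell out to the documented Tentacle CLI from a bootstrap script, along these lines (paths, server URL, API key, and environment/role values are placeholders):

```python
# Sketch of registering a Tentacle during instance initialization by
# shelling out to the Tentacle CLI. Paths, server URL, API key, and
# environment/role values are placeholders.
import subprocess

TENTACLE = "/opt/octopus/tentacle/Tentacle"

for args in (
    ["create-instance", "--instance", "Tentacle",
     "--config", "/etc/octopus/tentacle/tentacle.config"],
    ["new-certificate", "--instance", "Tentacle"],
    ["register-with", "--instance", "Tentacle",
     "--server", "https://my.octopus.app",
     "--apiKey", "API-XXXXXXXXXXXXXXXX",
     "--environment", "Production",
     "--role", "web-server",
     "--comms-style", "TentaclePassive"],
):
    subprocess.run([TENTACLE, *args], check=True)
```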

rhysparry commented 2 years ago

@pessoa

We manage all our infrastructure using Terraform. When we increase the environment count, the code handles not only the required cloud provider resources but also the Octopus targets. They are available immediately after the provisioning code runs.

I'd love to hear what sort of targets you are deploying to. It sounds like you might already have a solution that is a great fit for you. Are there any aspects of this process you'd like to see improved?

rhysparry commented 2 years ago

@cdhunt

Request: octopus-tenant tag support.

Yes, this is most definitely on the cards. Our expectation is that you can add as many scoping tags as necessary to ensure the target is correctly created. @mcasperson, this will include octopus-instance.

rhysparry commented 2 years ago

@NiteDesign

We are using AWS with AutoScaling groups. When we increase our instance count, our process is:

  1. Registers the new target to Octopus (same approach, using AWS tags for environment, role, tenant, etc.)
  2. Sends a POST to the api/deployments API using the SpecificMachineIds parameter, to scope the deployment to only the new target.

We do this as we don't require re-deploying the entire project to all other resources, only newly launched resources. Hopefully there is a way to do something similar?

It sounds like deployment target triggers might be a good fit for your use case as they are designed to deploy to specific targets when the target is registered in Octopus.

rhysparry commented 2 years ago

@pio-janik

We also manage all our infrastructure using Terraform and the octopusdeploy provider. What would be of more benefit to us is dynamic worker pools. Most of our deployments run on behalf of a worker; in our case, these are GCP Compute Engine instances. It would be great if Octopus could scale the number of instances based on load, say, scaling up during work hours and down afterwards. We can already build immutable VMs with all the necessary tools using Packer. This would reduce our infrastructure costs.

Improving the experience behind worker pools is definitely something we are discussing currently. We don't have any firm plans at this stage, but we'll be sure to let you know when that changes.

AndrewMcLachlan commented 2 years ago

We have already tagged all our resources with their environment. It would be useful to be able to customise the name of the tag Octopus looks at.

n360speed commented 2 years ago

As much as the tags would be helpful, it would be better if we could provision the agents and workers via Octo.

Here are good examples:

https://github.com/JetBrains/teamcity-azure-agent
https://github.com/JetBrains/teamcity-google-agent

Support for Auto Scaling / VMSS would be ideal.

rhysparry commented 2 years ago

@AndrewMcLachlan

We have already tagged all our resources with their environment. It would be useful to be able to customise the name of the tag Octopus looks at.

We discussed custom tagging early on as I found the idea of looking at two different tags, both with the name of the environment, to be a little jarring. However, custom tagging introduced additional complexities we felt outweighed the benefits.

Should custom tags be configured per project, space or instance? How will a user know what the custom tags will be? How does this complicate our documentation? Will it increase the back and forth with our support team?

These were the sorts of questions for which we lacked satisfactory answers.

In the end, we considered that the burden of an additional tag was likely a small price to pay to eliminate the extra complexity.

That said, we'd love to know more about your scenario and understand the intricacies you might need to work through to add these tags.

rhysparry commented 2 years ago

@n360speed

As much as the tags would be helpful, it would be better if we could provision the agents and workers via Octo.

Support for Auto Scaling / VMSS would be ideal.

We've had some internal discussions about how we can help customers leverage their existing cloud resources to configure worker pools. I've been particularly interested in how Octopus can use the credentials pre-configured on the worker. Kubernetes and container orchestration platforms like ECS may also provide new ways to manage scale and ensure your workers have the necessary tooling. We'll be looking for more customer feedback as these plans evolve, so keep an eye on our blog.

IMOlbrody commented 2 years ago

Right now, we do something similar: as part of our provisioning system, we install an Octopus Tentacle on our EC2 instances during instance provisioning. Tags on the instance direct the provisioning scripts as to which Octopus environment to add the EC2 instance to. Deployment triggers then kick off the deployment to the instance. I am trying to understand the benefits of moving from what we are doing to this new system. How are new releases handled? Can we control whether a release deploys to existing instances or only new ones, to allow for deployment patterns like blue-green?

We have multiple AWS accounts across multiple regions, with hundreds of machines in each account. If the system is looking for tagged machines, is there a performance concern that this will slow down deployments, especially for deployments that don't use this feature?

rhysparry commented 2 years ago

@IMOlbrody

Right now, we do something similar: as part of our provisioning system, we install an Octopus Tentacle on our EC2 instances during instance provisioning. Tags on the instance direct the provisioning scripts as to which Octopus environment to add the EC2 instance to. Deployment triggers then kick off the deployment to the instance. I am trying to understand the benefits of moving from what we are doing to this new system. How are new releases handled? Can we control whether a release deploys to existing instances or only new ones, to allow for deployment patterns like blue-green?

This change isn't intended to replace the existing deployment target trigger functionality, so that's likely to still be your best option.

We have multiple AWS accounts across multiple regions, with hundreds of machines in each account. If the system is looking for tagged machines, is there a performance concern that this will slow down deployments, especially for deployments that don't use this feature?

We've taken performance considerations into account, and Octopus will only run target discovery if it is configured to do so. These are the strategies we are considering to minimise the number of potential targets returned when we query the cloud provider:

This pattern works best with PaaS targets that perform the scaling for you, thus limiting the surface area for discovery.