dapr / dapr

Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.
https://dapr.io
Apache License 2.0
23.59k stars 1.86k forks source link

Hot reloading of components #1172

Open amanbha opened 4 years ago

amanbha commented 4 years ago

Currently in k8s, Dapr needs a pod restart to pick new configuration after they have been applied. This will cause inconsistency in scenarios in which few pods of an app are restarted by platform. Restarted pods will use new config whereas pods which are not restarted will continue to use old config.

Tasks:

msfussell commented 4 years ago

There is a discussion here on the overlap between dynamic updating of components/config by dapr and when the K8s operator manages the rollout of a new application. We need to get options from both sides for this proposal.

amanbha commented 3 years ago

Adding notes from maintainers meeting: Pros and cons of dynamic updating the components::

  1. dynamic updating of components will solve a "split-brain" problem due to app scale out, failover.
  2. dynamic updating of components can also lead to issues as application code might not be ready for the change.

Suggested approaches:

  1. Keep the current behavior and document best practise that after applying the new component yaml, user should do upgrade of the application (or do pod restarts) to pick up updated components.
  2. Do dynamic upload but give user an option to opt out as an annotation.
  3. Add a dynamic upload property to each component, dapr sidecar will do "hot-reload" of the component when it receives the notification from operator only when this property is set to true.
AaronCrawfis commented 3 years ago

Proposal

Given the above what it we:

  1. Keep current behavior as default (pods do not pickup changes to components until restarted) to address Pro/Con #1
  2. Write documentation to address Approach #1
  3. Add metadata to components and configuration yaml files to opt-in to hot reloading for all pods/applications to pick up this component instantly.
    • This will allow customers to rollout hot-patches or errors to applications without restarting entire application
  4. (optional) Add ability for applications/pod specs to be annotated/flagged to always pick up components instantly upon change, no matter what above metadata says

This will allow 3 cases:

Default Behavior

  1. Application running locally or kubernetes
  2. New configuration applied
  3. Application doesn't pick up new configuration until restarted/scaled

Apply Hot Reload to Component/Configuration

  1. Application running locally or kubernetes
  2. New configuration/component applied with metadata set to opt into hot reload
  3. All applications pick up new configuration/component

Apply Hot Reload to Application/Pod

  1. Application running locally or kubernetes with flag or annotation set to hot reload
  2. New configuration/component applied (with or without metadata for hot reload, doesn't matter)
  3. Application picks up new configuration/component

We could brand this as a new feature that users opt into if they want to.

RadoslavGatev commented 3 years ago

It seems the code for the hot reload is already in place. According to the operator logs, sidecars are successfully establishing connections to it and component changes are being successfully detected. However, for some reason sidecars are not getting notified about the modified component.

yaron2 commented 3 years ago

@RadoslavGatev The operator side is mostly in place, however the sidecar part is currently (intentionally) ignoring updates.

We (maintainers) have spoken about this issue today and I just triaged this into the 0.12.0 milestone.

sebader commented 3 years ago

just came across the same issue. This really need to be solved. How else do you propose people can be sure config changes are applied?

kamesh1677 commented 3 years ago

Any fix planned for this issue in the coming releases ?

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

spasumalaithevan commented 3 years ago

Any plans to fix this issue.

engeluke commented 1 year ago

How do you enable the hot reload mentioned here ?

NickLarsenNZ commented 1 year ago

@engeluke I don't think there is a hot reload yet, this appears to be an open proposal.

NickLarsenNZ commented 1 year ago

In Kubernetes, would like the operator to do a rollout on the Deployment so that if the new component configuration breaks things, the rollout doesn't proceed.

Of course, the rolled back pods would now have the new broken configuration and also be down, but this is the same outcome as when pods are restarted (K8s node down, or rescheduled) or the Deployment is scaled up so I think that concern can be somewhat handled independently.

yaron2 commented 1 year ago

In Kubernetes, would like the operator to do a rollout on the Deployment so that if the new component configuration breaks things, the rollout doesn't proceed.

Of course, the rolled back pods would now have the new broken configuration and also be down, but this is the same outcome as when pods are restarted (K8s node down, or rescheduled) or the Deployment is scaled up so I think that concern can be somewhat handled independently.

We're looking into doing the hot reloading in such a way that the pods do not need rolling out.

NickLarsenNZ commented 1 year ago

We're looking into doing the hot reloading in such a way that the pods do not need rolling out.

Nice. Will that include rolling back to previous values if it fails (might be unsafe), or at least stopping the reload?

yaron2 commented 1 year ago

We're looking into doing the hot reloading in such a way that the pods do not need rolling out.

Nice. Will that include rolling back to previous values if it fails (might be unsafe), or at least stopping the reload?

The implementation should include replacing an existing component only if the new one inits successfully, yes.

JoshVanL commented 11 months ago

To move the hot reloading behavior out of feature preview- we need to have conformance/e2e tests to ensure that all components release all their resources after calling Close().

philliphoff commented 11 months ago

@JoshVanL Is there a document that describes the changes required from the perspective of the components? Is there new API (e.g. Close() mentioned above)? I'm curious as to whether there needs be any corresponding changes to the Pluggable Components APIs to allow for reloading.

JoshVanL commented 11 months ago

@philliphoff components have always (at least as long as I've worked on the project!) optionally supported a Close() method which runtime calls if it is available. Components which spin up long running processes should implement a Close() error method to ensure that all resources are released and all go routines have exited before Close returns.

philliphoff commented 11 months ago

@JoshVanL Thanks. It doesn't appear that such "close" calls are currently (directly) propagated to pluggable components. It may make sense for the runtime to do that (both in general, as well as to allow for "hot reload" capabilities). While I don't think it's a blocker for "hot reload" itself, it could do with some more investigation on the SDK side to determine what that might look like (and I've created an issue for that on the SDK side).

mukundansundar commented 5 months ago

@JoshVanL @yaron2 @artursouza Moving this issue to 1.14 for tracking actor state after adding all related closed PRs to 1.13 for tracking.