cloudfoundry / cf-networking-release

Container Networking for CloudFoundry
Apache License 2.0

When a Space is deleted, the network policies of the apps in this Space are not deleted immediately #26

Closed. jessehu closed this issue 6 years ago.

jessehu commented 6 years ago

I'm using CF CLI 6.31.0+b35df905d.2017-09-15, Ops Mgr 1.12.0, ERT 1.12.0, and cf-networking 1.6.0.

Here are the steps to reproduce:
1) Create 2 orgs, with 1 space in each org.
2) Push 2 apps to each space.
3) Use 'cf add-network-policy' to add 2 policies in each space.
4) 'cf network-policies' returns 2 policies for each space, and the Network Policy API returns all 4 policies:

diego_database/0e3f6506-1741-41c7-b5d8-598c922833d3:~$ curl -sk --cacert $BBS_CA_CERT_FILE --cert $BBS_CERT_FILE --key $BBS_KEY_FILE https://network-policy-server.service.cf.internal:4003/networking/v0/internal/policies

5) Delete one of the spaces.
6) 'cf network-policies' returns 2 policies for the remaining space, but the Network Policy API still returns 4 policies, where only 2 are expected.
7) After several minutes (around 15 minutes or more), the Network Policy API returns the 2 expected policies.
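The curl call in step 4 can be reproduced from Python to count policies before and after deleting the space. This is a minimal sketch, assuming the same BBS_* certificate environment variables are set and that the internal v0 response looks like {"policies": [{"source": {"id": ...}, "destination": {"id": ...}}, ...]}; verify the shape against your cf-networking version. Unlike curl's -k flag, this keeps server certificate verification on.

```python
import json
import os
import ssl
import urllib.request

# Internal policy-server endpoint from step 4 of the report.
INTERNAL_POLICIES_URL = (
    "https://network-policy-server.service.cf.internal:4003"
    "/networking/v0/internal/policies"
)


def fetch_internal_policies(url: str = INTERNAL_POLICIES_URL) -> dict:
    """Fetch all policies over mutual TLS, using the same files as the curl call."""
    ctx = ssl.create_default_context(cafile=os.environ["BBS_CA_CERT_FILE"])
    ctx.load_cert_chain(
        certfile=os.environ["BBS_CERT_FILE"],
        keyfile=os.environ["BBS_KEY_FILE"],
    )
    with urllib.request.urlopen(url, context=ctx) as resp:
        return json.load(resp)


def summarize(body: dict) -> list[tuple[str, str]]:
    """Return (source app GUID, destination app GUID) for each policy.

    Assumes the response shape described in the lead-in.
    """
    return [
        (p["source"]["id"], p["destination"]["id"])
        for p in body.get("policies", [])
    ]
```

Running len(summarize(fetch_internal_policies())) before and after step 5 should show 4 both times until the periodic cleanup runs.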

Please help take a look. Thanks.

cf-gitbot commented 6 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/152272249

The labels on this github issue will be updated when the story is started.

rusha19 commented 6 years ago

This is expected behavior. Stale policies are cleaned up periodically, at an interval set by a BOSH property on the policy-server job. The interval defaults to 1 hour, so stale policies are deleted within an hour. The minimum configurable cleanup interval is 1 minute.
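How long a stale policy lingers depends on where the deletion falls in the cleanup cycle. This small Python sketch assumes cleanup runs on a fixed schedule (at first_run, first_run + interval, ...) and that each run removes every policy whose app is already gone; the function names are illustrative only, not part of policy-server.

```python
import math


def removal_time(deleted_at: float, first_run: float, interval: float) -> float:
    """Time (in minutes) of the first cleanup run at or after deleted_at."""
    if deleted_at <= first_run:
        return first_run
    # Number of whole intervals needed to reach or pass the deletion time.
    runs_elapsed = math.ceil((deleted_at - first_run) / interval)
    return first_run + runs_elapsed * interval


def lingering(deleted_at: float, first_run: float, interval: float) -> float:
    """How long a stale policy stays visible in the API after the deletion."""
    return removal_time(deleted_at, first_run, interval) - deleted_at
```

With the 60-minute default, a space deleted 45 minutes into the cycle leaves its policies visible for 15 minutes (lingering(45, 0, 60)), which is consistent with the delay in the original report, while a deletion just after a run lingers almost the full 60 minutes. A 1-minute interval shrinks the window to under a minute, at the cost of running the validation pass more often.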

jessehu commented 6 years ago

Thanks @rusha19. Which config property controls this cleanup interval? Can it be configured in the Ops Mgr UI? I'm using the Network Policy API to retrieve all network policies, and the stale (deleted) policies cause a big issue in my case because we can't tell which policies are stale.
1) Is there a way to distinguish stale policies from valid ones? For example, a 'stale: true' tag in the API response.
2) Can the cleanup interval be set to 'immediately' instead of the 1-minute minimum?

jessehu commented 6 years ago
If the cleanup interval is set to the 1-minute minimum, is there any performance impact?

rusha19 commented 6 years ago

This property is not exposed through Ops Manager. Could you share some details on why this is causing issues? Stale policies have no effect, because the app GUIDs they reference are no longer valid.

The performance cost is that every cleanup run has to go through all the configured policies and query the Cloud Controller (CC) API to validate them.
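The validation pass described above (walk every configured policy and check its apps against the CC API) can be sketched in Python as follows. This is an illustration, not policy-server's code: apps_exist is a hypothetical callback standing in for the CC query, and the real server's batching and pagination are not shown.

```python
from typing import Callable, Iterable


def find_stale_policies(
    policies: list[dict],
    apps_exist: Callable[[Iterable[str]], set[str]],
) -> list[dict]:
    """Return policies whose source or destination app no longer exists.

    apps_exist (hypothetical) takes candidate app GUIDs and returns the
    subset that Cloud Controller still knows about.
    """
    # Deduplicate GUIDs so lookup cost grows with the number of distinct
    # apps referenced, not with the number of policies.
    guids = {p["source"]["id"] for p in policies} | {
        p["destination"]["id"] for p in policies
    }
    live = apps_exist(guids)
    return [
        p
        for p in policies
        if p["source"]["id"] not in live or p["destination"]["id"] not in live
    ]
```

Every run touches every policy and every referenced app, which is why a very short interval translates into steady extra load on the CC API.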

DennisDenuto commented 6 years ago

FYI @jessehu, you might find this useful: there is an endpoint on the external-policy-server to delete stale policies. See: https://github.com/cloudfoundry/cf-networking-release/blob/develop/src/policy-server/cmd/policy-server/main.go#L239
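A request to that cleanup endpoint could be built in Python roughly as below. The route is an assumption based on the external v0 API naming and must be checked against the routes table in the linked main.go; the token is assumed to be the "bearer ..." string printed by `cf oauth-token` for a user permitted to call it, and api_url is a placeholder for your CF API address.

```python
import urllib.request

# ASSUMPTION: verify this path against the routes in policy-server main.go
# for your cf-networking version before using it.
CLEANUP_PATH = "/networking/v0/external/policies/cleanup"


def build_cleanup_request(api_url: str, oauth_token: str) -> urllib.request.Request:
    """Build a POST to the (unpublished) stale-policy cleanup endpoint.

    api_url: e.g. "https://api.<system-domain>" (placeholder)
    oauth_token: full "bearer ..." string, as printed by `cf oauth-token`
    """
    return urllib.request.Request(
        api_url.rstrip("/") + CLEANUP_PATH,
        data=b"",  # empty body; the endpoint is triggered by the POST itself
        method="POST",
        headers={"Authorization": oauth_token},
    )
```

Sending it is then urllib.request.urlopen(build_cleanup_request(api_url, token)). As noted later in the thread, this endpoint is unpublished and was intended for tag exhaustion, so treat it as unsupported.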

jessehu commented 6 years ago

Thanks @rusha19 @DennisDenuto.
1) I'm using the Network Policy API to retrieve all network policies and create a corresponding firewall rule for each one, so I end up creating rules for the stale (deleted) policies as well. I did manage to filter out the stale policies with some local validation, though.
2) The cleanup API is useful, together with the cleanup interval setting.
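The "local validation" is not described in the thread; one plausible shape is sketched below in Python. FirewallRule is a hypothetical rule type (not a real firewall API), live_apps is assumed to be a locally maintained set of existing app GUIDs, and each policy is assumed to carry destination protocol and port fields as in the internal v0 response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    """Hypothetical rule shape for an external firewall."""
    src_app: str
    dst_app: str
    protocol: str
    port: int


def rules_for(policies: list[dict], live_apps: set[str]) -> list[FirewallRule]:
    """Map each policy to one rule, skipping policies that reference deleted apps."""
    rules = []
    for p in policies:
        src, dst = p["source"], p["destination"]
        if src["id"] not in live_apps or dst["id"] not in live_apps:
            continue  # stale: the policy server has not cleaned it up yet
        rules.append(FirewallRule(src["id"], dst["id"], dst["protocol"], dst["port"]))
    return rules
```

The trade-off is that live_apps must be kept fresh independently (for example by polling Cloud Controller), which duplicates work the policy server's own cleanup already does.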

rusha19 commented 6 years ago

Hi @jessehu, thanks for sharing the use case. Just so I understand: in addition to configuring c2c network policies, you also need to set up firewall rules? Are you configuring an external firewall? All c2c traffic is on an overlay VXLAN network, and on the underlay the traffic appears to come from the Diego cell. What rules are being configured on the firewall?

The link that @DennisDenuto sent is definitely one way to clean up the policies; however, the endpoint is not published, and its intended use was to clean up policies when an environment is at its scalability limit and cannot allocate any more tags. If there are other uses for it, we would like to understand them and publish some documentation around it.

jessehu commented 6 years ago

Yes, I'm configuring an external firewall to manage the rules for the C2C network policies. Each network policy maps to one firewall rule. So I think the Network Policy Server could detect org/space/app delete events from the CC API and then remove the stale policies. Alternatively, could the default of 60 minutes be made shorter? 60 minutes seems like a long time for stale policies to stay around.
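The event-driven proposal could look roughly like this Python sketch. It is a hypothetical design, not anything cf-networking ships: event dicts are modeled loosely on Cloud Controller v2 event fields (type and actee) with type names such as "audit.app.delete-request" and "audit.space.delete-request", how events are fetched is left out, and org deletions (which would need an org-to-space map) are omitted.

```python
from collections import defaultdict


class PolicyIndex:
    """Local cache that turns CC delete events into policies to purge."""

    def __init__(self, app_space: dict[str, str]):
        # Invert app GUID -> space GUID so a space delete finds its apps.
        self.space_apps: dict[str, set[str]] = defaultdict(set)
        for app, space in app_space.items():
            self.space_apps[space].add(app)

    def apps_removed_by(self, event: dict) -> set[str]:
        """App GUIDs that disappear because of this event."""
        etype, actee = event["type"], event["actee"]
        if etype == "audit.app.delete-request":
            return {actee}
        if etype == "audit.space.delete-request":
            return set(self.space_apps.get(actee, ()))
        return set()

    def purge(self, policies: list[dict], event: dict) -> list[dict]:
        """Keep only policies whose apps all survive the event."""
        gone = self.apps_removed_by(event)
        return [
            p
            for p in policies
            if p["source"]["id"] not in gone and p["destination"]["id"] not in gone
        ]
```

A policy between an app in the deleted space and an app in another space is purged too, since one end no longer exists. Missing an event would leave a stale policy behind, so a periodic sweep like the existing cleanup would still be needed as a backstop.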

rusha19 commented 6 years ago

Hi @jessehu, please let us know if you still have questions about this, or can we close this issue?

jessehu commented 6 years ago

Hi @rusha19, I'd propose having the Network Policy Server in cf-networking detect org/space/app delete events from the CC API and remove the stale policies immediately. If that is not accepted, setting a short cleanup interval for stale policies is a workaround.

tylerschultz commented 6 years ago

Closing due to inactivity. Re-open if this is still relevant.