FTR, the mesos plugin uses the following JSON in the args field.
{
  "args" : {
    "org.apache.mesos" : {
      "network_info" : {
        "name" : "mynet",
        "labels" : {
          "labels" : [
            { "key" : "app", "value" : "myapp" },
            { "key" : "env", "value" : "prod" }
          ]
        },
        "port_mappings" : [
          { "host_port" : 8080, "container_port" : 80 },
          { "host_port" : 8081, "container_port" : 443 }
        ]
      }
    }
  }
}
I prefer option 1. Port-forwarding is more runtime specific. Adding multiple containers to a network with/without port-forwarding should not require changes to the network config. In other words, the differences among containers should not affect network config.
IMO, option 2 is the best option. It allows for structured data and doesn't suffer the problem in 3. of being plugin specific.
Port-forwarding does not require a very complicated JSON structure to represent. Something like this is probably good enough: CNI_ARGS="PORT_MAPPING=8080:80/tcp,8443:443/tcp"
The CNI plugin implementation determines whether port-forwarding is supported. Looking at just options 2 and 3, the latter makes more sense to me, as it allows a plugin to showcase its port-forwarding capability explicitly. Plus, we can document a convention for that.
Start a new trend of documenting common args in the spec.
+1. We need to establish a convention for whichever way we pick. Or spec it.
Putting it in network config implies that it is a property of the network, rather than of the instance. I expect to find network config in a file or similar, and to be static for all containers on that network. Of course, the CNI caller (kubelet or the dockershim) COULD customize the network config by injecting per-instance stuff like this, but there's a clear precedent of CNI_* env vars carrying the per-instance stuff.
So my instinct is to stick this in a CNI_* env var on create/destroy.
As for whether this is a CNI_ARGS or not - if it is an arg, we have nothing but convention binding the various plugins to do the same thing. If we actually spec it, but make it optional, then all plugins understand that this is something to pay attention to, and the syntax and semantics of it are unambiguous. I'm soft on this, but that's my gut feeling.
@thockin @freehan it really isn't the case that network config is static and should be exactly what's on disk. I understand why you have this misconception, but I really want to get it across to you that you shouldn't be using CNI_ARGS. The default mechanism for runtimes to pass per-invocation arguments to the plugins is the args field. We maintainers have done a terrible job of communicating this to the world, but that's how it is (ping @containernetworking/cni-maintainers please correct me if I'm wrong)
The main reasons for preferring JSON args over CNI_ARGS are 1) structured data (as I mention above) and 2) versioning. The (ongoing) versioning work has effectively deprecated the CNI_ARGS field, as it is not included in the versioning changes.
I agree with @tomdee's point of view that CNI_ARGS is probably not the right thing to use. One of the key motivations for adding the args field was to give a safe place for orchestrators to add additional arguments in preference to CNI_ARGS. I'm not against spec'ing a new field meaning, but do think that CNI_ARGS isn't the right place to spec it. Or at least that was my understanding of the conclusion of all the discussions at the time the args field was originally proposed and added to the spec.
So why do we have any of the environment variables, then?
Yes, indeed! I believe there was a lot of discussion about that by the CNI maintainers at the time the args field was being considered. I think the consensus view was that, in hindsight, it was a mistake to have spec'ed CNI_ARGS and that it should be deprecated.
From my perspective, how to pass arguments to CNI during ADD is not too critical. For example, we could easily add a utility function to libcni that mutates the static configuration stored on disk.
However, the big question is DELETE (and STATUS, if we implement it): are we now forcing runtimes to persist a configuration file per-interface? That does seem unnecessarily complicated.
Perhaps it should be encoded in the spec that args or CNI_ARGS - whichever is chosen - must not be needed for DELETE? I think this makes sense - you don't need to know the bandwidth to delete the qdisc, for example. This would free the runtime from persisting per-interface parameters.
@squeed good points, but that sounds like it could be an issue with the current spec so would be best tracked as a separate issue. Could you raise one?
However, the big question is DELETE (and STATUS, if we implement it): are we now forcing runtimes to persist a configuration file per-interface? That does seem unnecessarily complicated.
True. Even with STATUS, the runtimes still need to remember what has been ADDed. Then we are talking about a LIST operation.
Perhaps it should be encoded in the spec that args or CNI_ARGS - whichever is chosen - must not be needed for DELETE? I think this makes sense - you don't need to know the bandwidth to delete the qdisc, for example. This would free the runtime from persisting per-interface parameters.
I think the easy way to do this is to only require CNI_CONTAINERID for deletion. Then CNI could avoid having a LIST operation. Each plugin only checkpoints the information necessary for DEL, with CNI_CONTAINERID serving as the key. The runtime just needs to remember the IDs.
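To make that concrete, the per-container state a plugin would need to checkpoint for DEL could be as small as a record keyed by the container ID. A purely hypothetical layout (none of these field names are spec'd):
{
  "container_id": "f81d4fae7dec",
  "ifname": "eth0",
  "port_mappings": [
    { "host_port": 8080, "container_port": 80, "protocol": "tcp" }
  ]
}
On DEL, the plugin would look this record up using CNI_CONTAINERID and tear down only what it created; the runtime never has to replay the original network config.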
I just opened PR #352 to prompt discussion on how I thought these conventions could be documented.
To follow up on the last k8s-sig-network meeting:
There are 2 problems with making port-forwarding a plugin-specific field.
- If this field is stuck into a random CNI plugin that does not expect it, the plugin may error out or just ignore it.
- It implies that the caller has to know the detailed capabilities of the plugin. This makes it NOT a "plugin".
A few options to solve this: For problem 1, a) put port-forwarding in args or CNI_ARGS, or b) allow unexpected network config fields to be passed into plugins. I think many existing plugins just ignore fields they do not care about. We just need to put this into the CNI SPEC.
For problem 2, c) @thockin suggests adding a "port-forwarding" section in the CNI response. Let the plugins return "what they have done". d) add a HELP command to allow the runtime to explore a plugin's capabilities.
I am leaning towards b) and c). This fits into the picture of plugin chaining. The runtime can just add the port-forwarding field to all plugins and examine the result to see "what has been done". The same idea could be applied to other features that CNI plugins support. This requires minimal change to existing plugins and gives better backward compatibility.
Should we make a new issue or keep going here?
My understanding of the problem here (which may be flawed) is that there is now a "standard" mechanism in the CNI specification for providing port mappings to a CNI plugin, with a well-defined behavior that if the plugin can't service the mappings then it can (and likely should) return an error to the orchestrator. This means that CNI has defined the mechanisms necessary to avoid orchestrator-specific port-mapping plugins (i.e. Mesos and Kubernetes can use the same standard plugin).
However, the bit that is missing is how the orchestrator knows to which plugin it should pass the port mappings, which feels like it is probably an orchestrator specific problem. I think this probably warrants an issue in the Kubernetes repository for further discussion.
I am not seeing at all how this is an orchestration problem. This is optional behavior which can hard-fail, but gives me no way of knowing whether it WILL hard-fail without trying. That is an atrocious API.
The only thing I can do about it is try each plugin in the chain with and without these extra args and see what sticks. Is that REALLY the intention? I have a hard time believing that.
I feel like I am missing something. How exactly do CNI maintainers expect this to be used?
@thockin @freehan I haven't been following the k8s side of this conversation, but I see your point.
A runtime should be able to express an intent for ports to be mapped a certain way, while remaining agnostic to which plugin actually implements that intent.
A runtime would also like to be able to return a meaningful error to the operator if the operator hasn't provided a plugin capable of implementing that intent. Ideally that error could be returned early.
I'd guess that port mappings won't be the last example of this either. There's probably a whole class of plugin "service-discovery" things.
Here's an idea: What if the spec included a short list of well-known names and configuration formats for these intents (services). For example a runtime could express an intent for
{ "port_forwarding": [ { "host": 80, "container": 5000 } ] }
Independently, a plugin might express a capability for port_forwarding, perhaps via its response to a specialized command, e.g. HELP (mentioned above) or OPTIONS (a la HTTP) or the like.
The CNI library layer could verify that at least one plugin in the configuration is capable of executing each of the intents that the runtime desires. It could also arrange for the details of the intent to be passed to that particular plugin when it is invoked, for example by injecting those details into the network config JSON.
Thoughts?
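As a sketch, the reply to such a probe command might be a small JSON document; the command itself and the supportedIntents field below are assumptions for discussion, not anything in the spec today:
{
  "cniVersion": "0.3.0",
  "supportedIntents": [ "port_forwarding" ]
}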
That sort of probing is plausible, but feels heavy to me. Can one rely on the capabilities not changing between runs? What if the user upgrades the plugin live? Do I have to cache mtime or inode and map that to capabilities? I guess that is possible.
I don't think we necessarily need something as heavyweight as another CNI command or capabilities reporting.
I feel like we should be able to do something relatively simple within the framework of the current spec.
As a quick straw man, what if Kubernetes interprets a field in the args which indicates that it should pass port mappings to this plugin? e.g.
{
  "type": "myplugin",
  "name": "mynet",
  "args": {
    "net.alpha.kubernetes.io/provides-hostports": true
  },
  ...
}
Any config which "requests" this through the args will have the port_mappings field populated when passed to the declared plugin.
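Under that straw man, the config the runtime would actually hand to the declared plugin on ADD might look like the following; the port_mappings field simply mirrors the Mesos example earlier in this thread and is not yet spec'd:
{
  "type": "myplugin",
  "name": "mynet",
  "args": {
    "net.alpha.kubernetes.io/provides-hostports": true
  },
  "port_mappings": [
    { "host_port": 8080, "container_port": 80 },
    { "host_port": 8443, "container_port": 443 }
  ]
}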
This feels really baroque - the user has to know whether the plugin supports a feature - why not just ask the plugin? Having a plugin probe is pretty normal stuff for plugins.
I think I would like to better understand the requirements driving this discussion before joining with a technical opinion on the solution.
I think there are potentially two user roles being discussed: 1) The application deployer who uses the Kubernetes or Mesos APIs to deploy and manage their microservices / applications. 2) The cluster deployer who is responsible for setting up and maintaining the cluster as infrastructure that the application deployer can use.
For role 1, my guess is the requirements are probably: They don’t need/want to specify anything about the CNI plugins being invoked. If multiple networks are supported (not yet in Kubernetes) they may want to specify which networks a pod should be connected to, but again they don’t want to get into the details of CNI plugins. They do want to specify whether or not a pod needs a host port. If they specify a host port they want their pods to get that host port and not to be left running if they can’t. (Currently I believe this last requirement is not well met when using CNI plugins. I’ve seen plenty of users get confused by the current behavior.)
For role 2, my guess is the requirements are probably: They want to specify exactly which CNI plugins are installed and will be invoked - because they will have chosen a specific network implementation that they believe best suits their needs. Typically they will want to use a single CNI plugin, though that plugin itself may chain on to other CNI plugins if it supports chaining and is so configured (in the same way as some plugins already do today). Depending on the CNI config options the plugin supports, they may want to customize the CNI config to suit their particular deployment requirements, or in the case of a chaining plugin, to tell it which plugins it should be chaining together in what order and for what purpose.
Is the above right, or are people hearing other or different requirements that need addressing in this increment?
Go easy on me - I’ve got a fever - but it’s an interesting enough discussion that I wanted to start taking part sooner rather than later.
@tomdee I think my guess at the requirements for role 2 is missing consideration of the new runtime-driven CNI chaining that was very recently added? Can you comment?
Alex, you're right on, except for the semantics of chaining, but that's a relatively minor point for this conversation, I think.
But there's actually a third role - the orchestrator itself. It wants to look at some metadata to learn which plugins to call for each new container. The orchestrator must be able to know what features each plugin supports, so it can know whether a plugin or chain of plugins can actually do its job.
Got you! Thanks Tim.
I know Tim suggests it's a minor point, but I'm wondering if it's the details of the chaining that are the key here. I'm now thinking that was the bit I wasn't grokking.
In the old world where the orchestrator was just calling a single CNI plugin (and that CNI plugin might chain on to other plugins), then I was thinking the orchestrator could safely just add the host port field when a pod requires one. If the CNI plugin supports host ports then it will succeed. If it doesn't then it will fail. The application deployer gets the desired behavior.
But now we have "Network Configuration Lists" defined in the CNI spec, which provide metadata for the orchestrator/runtime to consume so it can chain plugins itself. And the issue is that if the orchestrator has been supplied such a list, then the list itself does not contain enough metadata for the orchestrator to know which of the plugins it should or should not be adding the host port field to.
Dynamic discovery at runtime sounds great, but it's a decent chunk of work to design and implement across all plugins and runtimes. And I don't think dynamic discovery is required for either of user roles 1 or 2 to be happy. I'm guessing it would be a lot simpler to just add the missing metadata to the Network Configuration Lists in the CNI spec, and that should meet the requirements of the orchestrator/runtime with minimal effort, and zero effort for most CNI plugin implementations.
Pragmatically, my suggestion would be to add some very minimal templating support into Network Configuration Lists as the smallest and most easily achievable change to the CNI spec. We would probably want to update libcni too to be able to make that templating really easy for the orchestrator/runtime. Should be pretty easy in golang... he said naively.
I'm imagining this must have been debated a lot already by the CNI maintainers as the features were being added, so they may have counter views.
I dislike the "it's a lot of work, so let's do something awful" argument at this layer (though we use that all the time in Kubernetes). Whatever we decide here sticks for a long time.
I'd rather add an optional "probe" call which can either fail (assume it is a "standard" plugin) or succeed and return info about its capabilities. Since port mapping plugins are all new, it/they can implement this and the rest can just ignore or update on their own schedule.
Orchestrators can map (inode, mtime) or (hash) to capabilities structs.
"so let's do something awful" - strong words!
Probing might get interesting if multiple plugins report supporting the same capability. Possibly not a problem for host port (though could be), but my gut says at some point in the future we will want to pass more metadata in as part of the Network Configuration Lists as more capabilities are added to CNI.
Anyway, if the rest of what I said in my previous comment makes sense, then I'm comfortable that I've now got my head around the requirements and pain point, and I have suggested one approach that sounds ok to me. I'm not strongly pushing one way or the other. Just making suggestions for others to comment on.
Sorry :) It wasn't meant so strongly.
Yeah, I just need something so we can proceed. I'll live with whatever, but my preference is known
Like Tim, the main thing I want to see is getting this to the stage we can proceed. @freehan perhaps you could give an indication of when this needs to be resolved to not put your development timeline in jeopardy? I'm guessing pretty soon... This week? Next week? Yesterday?
I'd like to challenge the assumption that the orchestrator cannot modify the runtime configuration file. Indeed, one of the specific motivating examples for this plugin chaining design was that orchestrators could layer additional plugins, e.g. firewalling or port mapping, without having to care about the specifics of the network being created. The CNI_PATH variable supports this - runtimes can provide an ordered list of plugin search paths.
The libcni library already contains the tooling for "wrapping" an existing, parsed configuration file with a new execution.
Is there ever a case where Kubernetes (or any other orchestrator) would expect to have port forwarding provided by a separate plugin? Is there another reason you're not comfortable with supplying your own plugins?
I'm pretty sure we have always wanted to allow for 3rd-party network plugins that are very different from the "norm". So in that case yes we would expect port forwarding to be provided by a 3rd party plugin.
@lxpollitt This is not blocking 1.6 development for now. But we want to reach an agreement to solve the listed problems so that k8s can easily use a single CNI plugin or a chain of CNI plugins.
I think we're still not entirely in sync on what we're trying to achieve here and what the best design for it is. But let's try to resolve that!
@thockin, above you say:
This is optional behavior
I'm not sure I agree with this point, and I think @lxpollitt outlines why this shouldn't be optional in his role 1 description above - https://github.com/containernetworking/cni/issues/351#issuecomment-275973252
@thockin, also you say
But there's actually a third role
I'm not sure I really get this. Are you saying that this third role is the collective minds of the kubernetes developers wanting to choose good defaults? I'm trying to understand how it's different from Alex's role 2.
Until I really understand your third role, my current view is that somebody does need to choose the plugins that are being used, and that person does know what capabilities they support. So, @caseydavenport's suggestion sounds reasonable to me.
I'm definitely not against the idea of probing for capabilities but it feels like such a lot of work that it's not going to all come together in time for Kubernetes 1.6.
@squeed Kubernetes doesn't want to know what underlying network tech is in play, and as such it cannot assume that iptables is a correct implementation.
@tomdee
On a per-plugin basis, port-mapping is optional, obviously. Not all plugins implement it. We can argue about whether it is optional on a per-chain basis - i.e. should kubelet fail to start the pod if no plugin implements it, or just log it and keep going.
But there's actually a third role
I'm not sure I really get this. Are you saying that this third role is the collective minds of the kubernetes developers wanting to choose good defaults?
No I mean, the software that is running and executing the plugins. It wants to ingest metadata and comprehend the capabilities of the plugins. Currently the metadata is incapable of expressing "this plugin can handle hostports". The proposal above sticks that into args as a very free-form thing, which feels wrong because we went through all this effort to make hostports NOT part of args.
If we want to go down the path of decorating it explicitly, I think we can live with that, but I'd ask that it be an explicit "capabilities" map or something.
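For example, an explicit capabilities map in the network config could take a shape like the one below (key names are illustrative only and would need to be spec'd):
{
  "type": "myplugin",
  "name": "mynet",
  "capabilities": {
    "portMappings": true
  }
}
The runtime would then inject the actual mappings only into plugins that declare the capability, instead of guessing from free-form args.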
@squeed When is iptables not the solution for port forwarding? I see port forwarding as a local concern between the CNI-configured network namespace and the host network namespace, with little dependence on the network namespace configuration.
I tried to make a concrete example of how chaining would work as proposed here: #363.
When is iptables not the solution for port forwarding?
@philips One example for port forwarding without iptables is OpenVSwitch. Another example might be eBPF. We probably want the runtime to be agnostic about this choice of networking technology.
But even sticking with iptables, one plugin may wish to manage the rules itself -- for example if it already does other iptables-like things and wants to manage all the rules together. But another plugin may not want to think about iptables -- and the operator would need to chain in a specific plugin.
I think we need to support all of these use cases.
Iptables alone is insufficient (it captures traffic but does not prevent a process from binding to that port and silently failing). Iptables doesn't work with some network topologies that do not traverse the root namespace (and in fact HostPorts may not work at all in some forms of this).
I feel like the discussion is focusing on the capability probe solution. Have we rejected the idea to expand CNI result structure to include port-forwarding? (option c in https://github.com/containernetworking/cni/issues/351#issuecomment-275796455)
The current CNI result includes sections for interfaces, ips, routes and dns. Why not add a port_forwardings section? As more features are implemented by CNI plugins, shouldn't the CNI result structure expand accordingly?
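For illustration, a result carrying such a section might look like this; port_forwardings and its fields are hypothetical, while the surrounding fields follow the existing 0.3.x result layout:
{
  "cniVersion": "0.3.0",
  "interfaces": [
    { "name": "eth0", "sandbox": "/var/run/netns/example" }
  ],
  "ips": [
    { "version": "4", "address": "10.1.0.5/16", "interface": 0 }
  ],
  "port_forwardings": [
    { "host_port": 8080, "container_port": 80, "protocol": "tcp" }
  ]
}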
Putting it in the return assumes that it is OK to get all the way to the end and not have ANYONE handle mapping. I now think that is not a valid assumption. I think either direct probe (exec plugin --probe, parse result) or simply relying on the network config to say "port mapping goes to this plugin" is viable.
Should this be closed now?
Yep.
As a continuation from #46, I wanted to summarize the current position on handling host port mappings in CNI and call out the key questions.
We're close to having plugin chaining through #346 which should allow hostport mapping to be handled by a CNI plugin.
But we still need to decide how the port maps are passed to a plugin. I see three options here:
1. CNI_ARGS - The CNI_ARGS environment variable can be used to pass the information to the plugins (e.g. see https://github.com/containernetworking/cni/pull/346#issuecomment-269560403).
2. args in network config - An alternative to the CNI_ARGS environment variable is the args field in the network config JSON.
3. Plugin-specific fields - the information could be passed in a field or data structure specific to each plugin.
IMO, option 2 is the best option. It allows for structured data and doesn't suffer the problem in 3. of being plugin specific.
Mesos have already written a hostport plugin which uses option 2. But the host port mapping information is in a mesos specific data structure. We should be aiming to establish a convention to allow host port mapping plugins to be shared across multiple runtimes.
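For illustration, a shared convention under option 2 could namespace the mappings inside args so any runtime populates them the same way. The "cni" key, the field names and the hostport-plugin type below are only one hypothetical shape for discussion, not something the spec defines today:
{
  "name": "mynet",
  "type": "hostport-plugin",
  "args": {
    "cni": {
      "port_mappings": [
        { "host_port": 8080, "container_port": 80, "protocol": "tcp" }
      ]
    }
  }
}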
So the key problem is how do we establish this convention. The CNI spec does not currently contain this type of information, but maybe it should?
The only prior art that I'm aware of for establishing such a convention is the host-local plugin. It establishes the convention simply by stating it in its documentation. If we end up with an in-tree hostport plugin, then we could follow the pattern established by host-local, but it's not very clear that we will have an in-tree plugin.
So I see the following options for documenting this:
a. Update spec.md. Start a new trend of documenting common args in the spec. This is a break from the norm but feels like it matches people's expectations.
b. Create a separate doc (outside the spec) to document the common args. This creates a convention but is more lightweight than the spec and has the advantage of not being tied to whether a plugin exists in-tree or not.
c. Take an ad-hoc approach, e.g. just note it in the readme or document it in this issue.
IMO, option b is the best option.