CCI-MOC / hil

Hardware Isolation Layer, formerly Hardware as a Service
Apache License 2.0

Use Ansible networking #993

Open naved001 opened 6 years ago

naved001 commented 6 years ago

This was discussed during the Red Hat roundtable meeting. It seems like a good idea, as it will allow us to support more switches quickly, though we should investigate it in detail before moving forward.

It's kinda difficult to find the code related to Ansible networking. Here's what I found for the Dell switches: https://github.com/Dell-Networking/ansible-role-dellos-vlan but I still haven't found the code for the modules themselves.

zenhack commented 6 years ago

For posterity, the relevant bit from the summary email I got after the meeting (I was not there myself):

  • Ansible Networking/HIL merge. This seems like an obvious move and we should proceed with it ASAP. In discussion Friday afternoon I heard two interesting points from the Ansible Networking team:
    • It appears that the work to make HIL use Ansible Networking (using Ansible Runner, etc.) will be very similar to the work to make an ML2 driver for Ansible Networking, and we should therefore collaborate closely on that effort;
    • Ansible Networking lacks a thin, stateful API for operators who want to use it for ongoing management of their networks. HIL could become that API and might well gain a lot of upstream uptake if we position it correctly and make some noise about it.
    • Once HIL uses Ansible Networking, most of the development challenges for HIL disappear; it becomes a narrow, maintainable API.

I'm given to understand that not much detail was discussed. Let's start grounding the exploration with actual requirements and be clear about what we're getting out of this. Questions:

  1. Where is the code for the ansible modules we're going to be using (note: modules, not just the roles)? I'm sure it's somewhere, but let's make sure we can actually find it (seems to be not terribly well located). Otherwise this whole thing is a non-starter.
  2. Is all of the ansible networking stuff vendor specific? The above link is for dell switches. I expect the answer to this to be yes, just like it is for system package managers (yum/apt/etc).
    • If so, what do we gain by shelling out to ansible? @naved001 mentioned portability across different versions of dellos if we use the above roles, which seems potentially valuable.
  3. Just to clarify: from what I got from Naved, it sounded like we were just talking about adding an ansible-backed switch driver (or maybe a few, with common code, per (2)), to HIL; was that everyone else's understanding?

I'm going to link to this issue from the mailing list thread, so we can get the technical discussion/work rolling in the usual channels.

SahilTikale commented 6 years ago

Let me add a few points here from what I gathered by talking to Red Hat folks informally during the day.

  1. Ideally there would be one driver, the Ansible driver, which would offload all the switching complexity to Ansible. Nobody said this explicitly, but it felt like that.

  2. Ansible is an automation tool that takes a layered approach. Most of these layers try to make switch management user-friendly, human-readable, and standard across different switches. The heavy lifting (akin to our drivers) is done by the ansible-modules. Last summer, when I was exploring Ansible, my primary thought was that HIL would interface with these ansible-modules directly, but that may not be the case, as I will explain in the next point.

  3. Switch management roughly follows a three-layer approach: input in JSON --> a middle layer converts it to switch-specific inputs --> switch-specific modules act on the switch, making the necessary changes or gathering the required info --> the output is parsed by the middle layer and converted back into JSON --> output in JSON. There is another layer, called ansible-runner, that sits above these three layers. I gathered from one of the engineers that they are trying to interface Neutron (ML2) with Ansible at the level of ansible-runner.
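
For concreteness, a hypothetical sketch of that flow in Python (the field names are invented for illustration, not an agreed schema):

    # JSON-ish request handed to the middle layer
    request = {"interface": "ethernet1/3", "vlan_id": 112, "tagged": True}

    # The middle layer translates this into switch-specific input, a
    # switch-specific module applies it on the device, and the device
    # output is parsed back up into JSON, e.g.:
    result = {"changed": True, "failed": False}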

Related to comments made by @zenhack:

  1. My experience with Ansible modules has been as @zenhack described: I agree that the modules are difficult to locate, and we will need the help of experts on that. But as mentioned above, their mental model suggests we will not have to deal with these modules in the first place. So let's talk more, flesh out exact requirements, and then see what code refactoring is required on both sides.

  2. In their talk they mentioned that they have more than 47 modules for a lot of platforms ("platform" meaning a switch OS, like ios (Cisco), junos (Juniper), etc.). More are in development.

  3. My impression is that it is more than just talk: MOC folks and Red Hat have found common use cases where it makes sense to move forward with this project (HIL with Ansible), since the Ansible folks are naturally inclined towards having a microservice, and HIL is exactly that. Also, they are starting work on adding Ansible support to Neutron (ML2) and thus would benefit from similar efforts done for HIL.

I hope this helps.

naved001 commented 6 years ago

Ideally there would be one driver, the Ansible driver, which would offload all the switching complexity to Ansible.

That's how I felt too.

But as mentioned above, their mental model suggests we will not have to deal with these modules in the first place.

We should still have visibility into, and an understanding of, those modules if we'll be relying on Ansible Networking drivers for network isolation.

In their talk they mentioned that they have more than 47 modules for lot of platforms. Platform means switch OS like ios(Cisco), junos(Juniper) etc. More are in development.

To add to this, they also have platform agnostic modules.

Other than the points @SahilTikale mentioned, we also get to build a community around HIL.

I think the main thing that's blocking us right now is finding the roles and modules for some switch; then we can start working with it.

zenhack commented 6 years ago

Ok. I'm not necessarily opposed to replacing the driver infrastructure wholesale, but I have a couple of concerns:

  1. I worry a little bit about performance; my experience with Ansible has been that it's pretty slow -- very much a "hit enter, go get coffee" thing. The fact-collection step usually takes longer than the entirety of a call to apply_networking() if there are only a couple of actions in the queue. Right now most of the time in HIL is dominated by a completely arbitrary call to sleep(), which wouldn't be terribly difficult to remove. It may not be bad enough to be an issue, but I worry.
  2. The architecture being proposed feels a little upside down to me. My understanding is that we're talking about a call graph like HIL driver --> ansible-playbook --> python library for talking to switches (the library being the module, since those are all written in Python). It's not clear to me what Ansible is doing in this picture besides being an arbitrary middleman.

What I think definitely makes sense is to re-use/collaborate on the python code for the switches. But I wonder if an architecture like this makes more sense:

HIL driver ----+                                +----> Cisco backend
               |                                |
ML2 driver ----+----> Common python library ----+----> Dell backend
               |                                |
Ansible role --+                                +----> Brocade backend
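
For what it's worth, a rough sketch of what that common Python library's interface might look like (all names here are invented for illustration, not an existing API):

    from abc import ABC, abstractmethod

    class SwitchBackend(ABC):
        """One implementation per vendor (Cisco, Dell, Brocade, ...)."""

        @abstractmethod
        def set_native_vlan(self, port: str, vlan_id: int) -> None:
            """Set the untagged VLAN on a switch port."""

        @abstractmethod
        def add_trunk_vlan(self, port: str, vlan_id: int) -> None:
            """Allow a tagged VLAN on a switch port."""

    # HIL, an ML2 driver, or an Ansible module could each call into the
    # same SwitchBackend implementations instead of reimplementing them.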

I'm actually really not a huge fan of how our driver infrastructure is set up right now, and wouldn't mind swapping it out for something else. But I'm not convinced going through ansible proper actually makes technical sense.

A lot of this depends on what the existing modules look like. I'm going to ping the email thread about this, in case the folks who know where the code is aren't reading this.

zenhack commented 6 years ago

Actually, I think I found them:

https://github.com/ansible/ansible/tree/devel/lib/ansible/modules/network

zenhack commented 6 years ago

And, for reference, here is the network module documentation:

http://docs.ansible.com/ansible/latest/modules/list_of_network_modules.html

Picking through it may provide some more insight into the interfaces we're looking at.

privateip commented 6 years ago

@zenhack I will try to fill in some of the blanks here. One of the advantages of using Ansible is that we can completely decouple the orchestration interface from the low-level driver. Ansible affords us the opportunity to define the API contract between HIL and Ansible, and then Ansible can invoke the appropriate device-specific implementation. Because the device-specific implementation is essentially Ansible tasks, we are not restricted to building device drivers that have to conform to a specific superclass. In fact, device "drivers" don't even have to be written in Python, since Ansible can execute modules written in any language. (In all fairness, though, Python is by far the most common and quickest way to get started.)

Additionally, Ansible is able to map the input data structure to any output, meaning that the connection to the device can easily be changed out without having to modify anything upstream. So moving from CLI to NETCONF to gRPC (or whatever path may be chosen) becomes a routine exercise. And since we are invoking tasks through the modules, "drivers" can be developed and updated independently, and can easily be provided by vendors that may not ever have an understanding of the particular application they are being used for.
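
To make that concrete, here is a minimal sketch of how an inventory for ansible-runner might be laid out (paths and host details are made up): the transport is just a per-host variable, so nothing upstream changes when it does.

    import os

    # ansible-runner reads inventory from <private_data_dir>/inventory/.
    private_data_dir = "/var/lib/hil/ansible"   # hypothetical location
    os.makedirs(os.path.join(private_data_dir, "inventory"), exist_ok=True)
    with open(os.path.join(private_data_dir, "inventory", "hosts"), "w") as f:
        f.write(
            "[switches]\n"
            "tor1 ansible_host=10.0.0.2 "
            "ansible_network_os=dellos9 "
            "ansible_connection=network_cli\n"   # later: netconf, etc.
        )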

With regards to fact collection, there is a significant amount of flexibility that can be used to optimize the implementation. Fact collection runs with a sane set of defaults, but there is quite a bit of room for optimizing (or even eliminating) that process if needed.
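
For example, the default collection can simply be shut off per play, which is the cheapest option for short imperative runs (play contents here are illustrative):

    # A play expressed as the Python structure ansible-runner accepts;
    # gather_facts is the standard per-play switch for fact collection.
    play = {
        "hosts": "switches",
        "gather_facts": False,   # skip the default fact-collection step
        "tasks": [{"debug": {"msg": "no facts were gathered"}}],
    }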

I would amend your call graph slightly to follow this paradigm:

HIL driver --> ansible-playbook --> ansible-role --> ansible-tasks --> python module

I think the key is the role usage and the benefits it brings. All that said, I think the right starting point is not looking at the modules available, but starting with what functions Ansible needs to provide and what data structures will be provided as input to perform those functions. From there, the right next steps can be determined.
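
A sketch of what that might look like (role and variable names are hypothetical): the playbook layer stays thin and only routes the input data to a role.

    playbook = [{
        "hosts": "switches",
        "gather_facts": False,
        "roles": [{
            "role": "vlan_port",   # hypothetical role holding the logic
            "vars": {"port": "ethernet1/3", "vlan_id": 112},
        }],
    }]
    # The role's tasks then invoke the appropriate python module(s)
    # for the target platform.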

privateip commented 6 years ago

From questions posed by @zenhack:

  1. Where is the code for the ansible modules we're going to be using (note: modules, not just the roles)? I'm sure it's somewhere, but let's make sure we can actually find it (seems to be not terribly well located). Otherwise this whole thing is a non-starter.

Getting back to this to answer the more specific questions. All modules that ship with Ansible are available in the modules/ tree. But in this case, I think roles are the much better approach: use the functional modules provided by Ansible to interface with the network devices, and build roles that focus on translating platform-agnostic requests into platform-specific implementations.
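
A sketch of how such a role might do that translation (file layout is hypothetical; include_tasks and the ansible_network_os variable are standard Ansible features):

    # roles/vlan_port/tasks/main.yml, shown as a YAML string: the
    # platform-agnostic entry point dispatches to a platform-specific
    # task file based on the host's declared network OS.
    ROLE_MAIN_TASKS = """
    - name: dispatch to the platform-specific implementation
      include_tasks: "{{ ansible_network_os }}.yml"  # dellos9.yml, eos.yml, ...
    """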

  2. Is all of the ansible networking stuff vendor specific? The above link is for dell switches. I expect the answer to this to be yes, just like it is for system package managers (yum/apt/etc). If so, what do we gain by shelling out to ansible? @naved001 mentioned portability across different versions of dellos if we use the above roles, which seems potentially valuable.

This is a bit of an "it depends" answer. There is currently a general slant towards modules being platform- (or vendor-, if you will) specific. However, as development moved from 2.5 to the current set of sprints, much of the platform-specific code has been pushed "down the stack" into plugins that allow for building platform-agnostic instantiations of modules.

To put that in a more real-world use case: instead of having things like ios_config, eos_config, junos_config, etc., Ansible now supports an architecture that can use a platform-agnostic module such as cli_config and let the plugin system handle the platform-specific encoding.
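
For example, a task like the following (shown as a YAML string; the VLAN content is made up) runs over whatever connection plugin the host is configured with. Note that the config text itself is still written in the device's own syntax; what goes away is the per-platform module:

    PLATFORM_AGNOSTIC_TASK = """
    - name: push VLAN config without a platform-specific module
      cli_config:
        config: |
          vlan 112
          name hil-example
    """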

  3. Just to clarify: from what I got from Naved, it sounded like we were just talking about adding an ansible-backed switch driver (or maybe a few, with common code, per (2)), to HIL; was that everyone else's understanding?

I cannot offer much here yet as I'm still digging into the HIL architecture and code base to understand it better. I will have more thoughts to offer once I finish going through the architecture in more detail.

zenhack commented 6 years ago

This was briefly raised on the email thread, but we didn't really end up following up on it this week: @privateip suggested a more synchronous discussion (maybe via IRC; we could use #moc on freenode) might make sense, and I agree. Do we want to pin down a time for this? Mornings are generally bad for me and Tuesday this week is booked, but I can block off time most afternoons given a bit of notice.

naved001 commented 6 years ago

Yeah, afternoons work for me too, given a bit of notice.

privateip commented 6 years ago

Agreed, let's pin something down. I will toss out Thursday 2pm EDT as an option.

zenhack commented 6 years ago

Works for me.

naved001 commented 6 years ago

Works for me too.

cc: @knikolla @SahilTikale @okrieg @pjd-nu @Izhmash

privateip commented 6 years ago

Confirmed then, Thurs 2pm EDT #moc on Freenode... see you there

zenhack commented 6 years ago

Cool, see you all there.

Quoting Peter Sprygada (2018-04-22 23:29:56)

Confirmed then, Thurs 2pm EDT #moc on Freenode... see you there

zenhack commented 6 years ago

Channel logs from the meeting, for posterity:

(14:01:47) privateip: hi everyone
(14:04:01) naved1: Hi
(14:04:19) naved1: how's everyone doing?
(14:04:30) isd: hello
(14:04:43) isd: (for those unaware: I'm zenhack on github)
(14:05:44) isd: So we were going to talk about the ansible networking stuff.
(14:05:52) naved1: So where do we wanna start from? I had some questions about it
(14:06:06) privateip: yeah that sounds good
(14:06:15) privateip: perhaps just start with your questions
(14:06:16) isd: Are folks ok with me posting the logs to the github issue afterwards?
(14:06:18) privateip: and we can take it from there
(14:06:24) privateip: +1
(14:06:27) naved1: +1
(14:06:37) isd: cool
(14:06:54) naved1: so one of the things I noticed is that ansible API is not meant for general use, and it's not stable
(14:07:00) naved1: is there a plan to make it stable
(14:07:09) naved1: or do we just write wrappers around the ansible cli?
(14:07:14) privateip: yes but not in the way you are probably thinking
(14:07:37) privateip: the ansible project has always maintained that the API in ansible is by and large internal
(14:07:43) privateip: so it gets changed frequently
(14:07:45) privateip: however
(14:07:56) privateip: we have started a new project to address this
(14:07:58) privateip: https://github.com/ansible/ansible-runner
(14:08:20) privateip: ansible-runner will be a stable wrapper around the ansible executables to provide a consistent, programmable api
(14:08:43) naved1: okay
(14:09:18) privateip: not only does it solve the api problem, but it also solves the need to have things gpl to integrate with ansible core
(14:09:53) privateip: the project is obviously early on and community driven so we have maximum flexibility to influence its direction
(14:10:59) privateip: its the same project we are using to integrate ansible with ironic in the BMaaS work
(14:11:38) naved1: okay, ill see how that works
(14:11:46) naved1: @isd, what do you think?
(14:12:03) isd: ansible-runner sounds fine.
(14:12:13) privateip: i can post some examples in my repo and link them back to the issue to help get it jump started
(14:12:30) naved1: that would be great
(14:12:31) naved1: thanks!
(14:12:58) isd: So, my thinking on how to go about this is to start by writing a switch driver for HIL that leverages ansible to actually manage the switch. There's talk about just replacing stuff whole-sale, but I don't think jumping straight to that would actually simplify anything vs. just writing a driver.
(14:13:18) privateip: yeah i would agree
(14:13:24) naved1: yeah, I am gonna start working on it right away and use ansible-runner
(14:13:25) isd: That will provide a much more clear picture of how this would actually work.
(14:13:41) isd: sounds good to me.
(14:13:47) privateip: i wonder could it be so easy as to consider ansible as a switch?
(14:13:58) privateip: in that we write a HIL driver for ansible
(14:14:00) isd: That was more or less my notion
(14:14:03) naved1: How about I rewrite one of our drivers so that we can compare the performance too.
(14:14:10) isd: That would be useful
(14:15:18) naved1: cool. I'll get started working on the code for this.
(14:15:38) privateip: how does that hand off look today?  from orchestration to driver
(14:15:57) isd: privateip: https://github.com/CCI-MOC/hil/blob/master/hil/model.py#L199
(14:15:59) privateip: is there a standard message(s) that gets passed or is it unique to each driver implementation?
(14:16:06) isd: A new switch driver would be a subclass of that.
(14:16:14) privateip: ahhhh
(14:16:20) privateip: thats what i was looking for
(14:16:52) isd: There's some shared code for a few of our drivers that have similar implementations, e.g. `hil/ext/switches/_console.py` manages some common logic for drivers that talk to the console.
(14:17:07) isd: But that's the interface expected of switch drivers.
(14:19:05) isd: There's also this notion of a "channel," which was an abstraction we designed around the idea that at some point we'd want to be able to do the isolation with technologies other than vlans, e.g. vxlans or infiniband stuff. That hasn't actually happened yet, but you'll see references to that in various places.
(14:19:37) isd: There's another class that you can subclass to provide one of those.
(14:20:32) isd: In retrospect it might have been a bit over-engineered. But having it in place also hasn't been much of a development burden.
(14:20:34) ***isd shrugs
(14:20:56) privateip: yep been there before
(14:21:12) privateip: how does HIL authenticate?
(14:21:17) privateip: to the device that is
(14:21:37) isd: Driver specific.
(14:21:59) isd: There's a field in the call to register switches that allows them to take device-specific info; this is where those credentials go
(14:22:16) isd: docs/rest_api.md details the api.
(14:22:37) isd: hil/api.py implements it, and you can kinda follow the call graph from there.
(14:22:54) naved1: https://github.com/CCI-MOC/hil/blob/master/docs/network-drivers.md this one has details about each supported switch.
(14:24:44) privateip: those are very helpful ... thanks
(14:25:07) privateip: so i think once you see how to use ansible runner to interface with ansible ...writing a driver should be really quick
(14:25:16) naved1: yes
(14:25:20) isd: Yeah, I expect so.
(14:25:27) privateip: the bigger effort, imho, will be defining what ansible does with that information
(14:25:53) privateip: essentially you will pass a json structure to runner that represents a playbook
(14:26:18) privateip: but i envision the playbook being almost nothing except a role
(14:26:27) privateip: and all the "logic" will be written into the role
(14:26:54) isd: yeah, that was my expectation as well.
(14:26:55) privateip: this will allow ansible to take the input data, call the right device implementation and then go configure the device
(14:27:26) privateip: are you collecting ephemeral state from the network devices or simply looking for a success/fail flag on the driver today?
(14:27:37) isd: the latter.
(14:28:07) isd: there's some logic for querying info, but it's only used in the test suite.
(14:29:20) privateip: what other questions about the ansible side do you have?
(14:29:52) naved1: I was looking around the dellos9 module here http://docs.ansible.com/ansible/latest/modules/dellos9_config_module.html#dellos9-config-module
(14:30:35) naved1: where do i see what different ways I can connect to the switch?
(14:31:03) naved1: besides network_cli, I have seen switches running dellos9 have an API.
(14:31:45) naved1: so would prefer to use the API (netconf or whatever).
(14:31:53) privateip: so today we have connection plugins for network_cli, netconf and api (although only eapi and nxapi have been implemented)
(14:32:09) privateip: we are currently building grpc for ansible 2.6
(14:32:30) privateip: taking the platform specific modules away for just a moment
(14:32:48) privateip: we have generic cli config and netconf conf modules that are totally abstracted from the device implementation
(14:33:29) privateip: then we just released some new lookup plugins that allow us to convert from key/value (json) to config and back using a set of ansible-like directives
(14:33:56) privateip: this would be the better route for this implementation, imho
(14:34:21) naved1: okay, that answers my question. isd: you have any other questions?
(14:34:41) privateip: fyi---> https://github.com/ansible-network/network-engine
(14:34:56) privateip: this is where the new parsing and template stuff currently lives ^^
(14:35:52) privateip: it effectively allows us to "codify" the configuration file into ansible tasks
(14:36:12) isd: privateip: you mentioned that there was some support for tweaking how fact collection works?
(14:36:19) privateip: yes
(14:36:32) isd: I wouldn't mind a pointer to relevant documentation
(14:36:34) privateip: what specifically did you have in mind?
(14:37:15) isd: It may not even be necessary; we'll know more once we have a working driver. For now I'm just vaguely curious
(14:37:42) privateip: ah got it ... so there are two parts to this (looking for the docs link now) ....
(14:38:08) privateip: one part is having more control over the facts collection subsystem to collect facts from network devices instead of the default collection from linux based hosts
(14:38:34) privateip: basically this is calling a different module when gather_facts=true in the playbook instead of the default setup module
(14:39:01) privateip: i think the more important one though for networking will be the new parser as outlined above in network-engine
(14:39:15) privateip: this will allow us to collect any show command and transform it into ansible facts
(14:39:42) privateip: so we can build fact collection roles that collect precisely what we need
(14:40:10) privateip: and we can even normalize (to a degree) within the role if we chose to create consistency
(14:40:45) isd: We'll actually probably want to use something like that for the testing bits of the drivers: https://github.com/CCI-MOC/hil/blob/master/hil/model.py#L296
(14:41:35) privateip: ah yes, makes perfect sesnse
(14:41:39) privateip: sense*
(14:42:52) privateip: ok so sounds like once we get driver <--> runner wired up we can really start playing this out
(14:43:18) isd: The other concern was just being able to pare stuff down, as I have some vague worries about performance using this in an interactive imperative context. But it doesn't make sense to worry too much about that until we've actually seen how the driver performs -- I suspect it will be fine.
(14:43:42) privateip: yep agreed
(14:44:03) isd: Yeah.
(14:44:15) isd: I think that answers all of my questions.
(14:44:25) naved1: iballou: you have something you want to ask?
(14:44:28) privateip: i will get the runner instructions and examples posted and linked back to the GH issue hopefully today yet but for sure by tomorrow
(14:44:38) isd: naved1: you're planning on taking the lead implementing the first driver?
(14:44:39) naved1: privateip: you have any other questions for us?
(14:44:44) naved1: yes
(14:44:49) isd: sounds good.
(14:44:54) privateip: i think i have most of mine answered
(14:45:14) privateip: i will most likely have more as we move forward of course
(14:45:28) privateip: but this seems to be a good start
(14:45:47) isd: Cool. Sounds like we're done then.
(14:45:51) naved1: yeah, that sounds good. I'll probably have some more questions when I start writing the code.
(14:45:56) naved1: thanks for your time, privateip!
(14:46:08) privateip: i will be out west for the next two weeks but once we get the driver <--> runner prototype built it might make sense to follow up with hacking day in Boston
(14:46:18) privateip: thanks all.... appreciate the time
(14:46:26) isd: Likewise.
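
For reference, a minimal sketch of the driver <--> runner hand-off discussed above, using ansible-runner's documented Python entry point (the class shape, role name, and paths are hypothetical; a real HIL driver would subclass hil.model.Switch):

    import ansible_runner

    class AnsibleBackedSwitch(object):
        """Hypothetical sketch of an Ansible-backed HIL switch driver."""

        def modify_port(self, port, vlan_id):
            # Per the discussion above, the playbook handed to runner is
            # a plain data structure that does nothing but invoke a role.
            playbook = [{
                "hosts": "switches",
                "gather_facts": False,
                "roles": [{
                    "role": "vlan_port",   # hypothetical role
                    "vars": {"port": port, "vlan_id": vlan_id},
                }],
            }]
            result = ansible_runner.run(
                private_data_dir="/var/lib/hil/ansible",  # inventory, env, etc.
                playbook=playbook,
            )
            if result.status != "successful":
                raise RuntimeError("ansible run failed: %s" % result.status)
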
privateip commented 6 years ago

@naved001 @zenhack An example implementation of using ansible-runner as an API is posted here.

radez commented 6 years ago

We are still POCing the code for the ML2 driver. Here's where we've gotten to: https://github.com/radez/networking-ansible. FYI, we're working on moving it to https://github.com/openstack/networking-ansible, so if the link under my GitHub account is broken, get it from the openstack org.

This is the Ironic BMaaS stuff referenced above.