Hierarchy on the file system.

word commented 8 years ago

In Hiera it's possible to specify the configuration in a hierarchy of directories on the file system, like so:

common.yaml
locations/dc1/common.yaml
locations/dc1/hosts/hosta.yaml
locations/dc1/hosts/hostb.yaml
locations/dc2/common.yaml
locations/dc2/hosts/hostc.yaml
locations/dc2/hosts/hostd.yaml

So far in pepa the closest I can arrive at is:

default/default.yaml
locations/dc1.yaml
locations/dc2.yaml
hosts/hosta.yaml
hosts/hostb.yaml
hosts/hostc.yaml
hosts/hostd.yaml

However, it's not immediately obvious from the layout what location a given host belongs to. This gets more complex when you have a lot of hosts and more steps in the sequence (environments, etc.).

Is a similar approach to the first example possible with pepa or is there another, pepa specific, way to handle this?

ake-persson commented 8 years ago

What is it you're trying to map?

As far as I can tell from your example, you want to map:

Default / Location / Host

Right?

So please provide some context and exactly what you're trying to map, and I can provide you with an example.

But it will be later today or tomorrow.

ake-persson commented 8 years ago

@smithjm can also provide feedback on this.

ake-persson commented 8 years ago

Here is a stripped down version of something we actually use:

ext_pillar:
  - pepa:
      resource: host
      sequence:
        - default:
        - hostname:
            name: input
            base_only: True
        - location:
        - environment:
        - roles:
        - hostname:
            name: override
            base_only: True
      subkey: True

pepa_delimiter: ..
pepa_roots:
  base: /srv/pepa/base
  dev: /srv/pepa/dev
  qa: /srv/pepa/qa
  prod: /srv/pepa/prod

ake-persson commented 8 years ago

Here is an example hierarchy on disk:

host/
├── location
│   ├── emea-nl-1.yaml
│   ├── amer-us-1.yaml
├── default
│   ├── default.yaml
├── input
│   ├── myhost_example_com.yaml
├── override
│   ├── myhost_example_com.yaml
├── roles
│   ├── apache_server.yaml

ake-persson commented 8 years ago

Hope this helps?

ake-persson commented 8 years ago

This will be resolved in a hierarchy based on your sequence. Also, any entry with base_only won't care about the environment and will only fetch from base, which is useful for config you don't want to stage.

ake-persson commented 8 years ago

Either you provide input for Pepa from an external source like a database and chain the ext. pillars, which is what we do, or you define defaults for a host in Pepa in the first input layer.

ake-persson commented 8 years ago

Assuming that hosta has location b, then when the location step runs it will import b.yaml. The sequence is executed in order and takes into account any info defined in previous templates.
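To make the chaining concrete, here is a minimal Python sketch of the idea, not Pepa's actual code. The `TEMPLATES` dict is a hypothetical in-memory stand-in for the YAML files, `resolve` is a made-up helper name, and the nameserver values are purely illustrative.

```python
# Hypothetical stand-in for YAML templates on disk, keyed by (category, name).
# This is NOT Pepa's real API, only an illustration of sequential lookup.
TEMPLATES = {
    ("default", "default"): {"nameserver": "8.8.8.8"},
    ("hostname", "hosta"): {"location": "b"},
    ("location", "b"): {"nameserver": "4.4.4.4"},
}


def resolve(sequence, seed):
    """Walk the sequence in order, feeding earlier results into later lookups."""
    data = dict(seed)
    for category in sequence:
        # "default" always loads the default template; every other category
        # loads the template named by the current value of that key.
        name = "default" if category == "default" else data.get(category)
        data.update(TEMPLATES.get((category, name), {}))  # later templates win
    return data


result = resolve(["default", "hostname", "location"], {"hostname": "hosta"})
print(result)
```

Here the host template sets `location: b`, so the later location step picks up the `b` template, whose nameserver replaces the default one.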

ake-persson commented 8 years ago

If you could give me some inputs for the hosts and locations I could make an example for you, that would map to the problem you're trying to solve.

word commented 8 years ago

Thanks for the quick response, it's much appreciated.

I think a lot of my confusion stems from expecting Pepa to behave similarly to Hiera and it's clearly quite different.

I was planning to not use a sub-key and have all config stored in a single hierarchy. So a simplified example would look like this:

infrastructure defaults

default/default.yaml

nameserver: 8.8.8.8

location defaults

locations/dc1.yaml

nameserver: 4.4.4.4

locations/dc2.yaml

nameserver: 3.3.3.3

environment defaults

envs/staging.yaml

apache_vhost: staging.example.com

envs/prod.yaml

apache_vhost: prod.example.com

hosts

hosts/hosta_example_com.yaml

ip_addr: 10.10.10.11

hosts/hostb_example_com.yaml

ip_addr: 10.10.20.1

hosts/hostc_example_com.yaml

ip_addr: 10.10.10.12

pepa sequence

sequence:
   - default:
   - location:
   - env:
   - host:

Let's say hosta.example.com is in the 'dc1' location and 'staging' env; its final config would look like this:

nameserver: 4.4.4.4
apache_vhost:  staging.example.com
ip_addr: 10.10.10.11

hostb.example.com in the same location and 'prod' env:

nameserver: 4.4.4.4
apache_vhost: prod.example.com
ip_addr: 10.10.20.1

So far so good, but there are a couple of issues.

First, just by looking at the config it's hard to see what the final configuration of a host would be. You have to look up the location and env grains for a particular host to find out where it belongs.

Then, what if I want to have 'staging' and 'prod' environments under the 'dc2' location as well (with different apache_vhost keys, for instance)? It won't work because 'dc2' hosts would pick up the same env config as 'dc1' hosts. I could create a unique env config like prod-dc2.yaml, but it doesn't feel quite right.

Just to clarify, by 'env' I don't mean an environment in Salt terminology (a separate state tree). In my case the state tree is the same; it's just the configuration data that varies. env is a grain assigned to the node, same as location.

Also I'm having trouble understanding this part of your example:

├── input
│   ├── myhost_example_com.yaml
├── override
│   ├── myhost_example_com.yaml

What's the reason for specifying the host configuration and then overriding it later?

Many thanks.

smithjm commented 8 years ago

I can answer this:

Sometimes you want host info to define what is done in default, by region, etc. (e.g. giving a host a role or attribute that triggers a particular template), in which case you would use input, which is applied first. Other times you want to override defaults with something different for a particular host, in which case you would use "override", which is applied last.

I have hosts which have some values set in input, and one or two "overrided" values set in override.


word commented 8 years ago

@smithjm thanks for the reply. So 'input' basically classifies the node similar to what usually happens in top.sls. Have I understood that right?

smithjm commented 8 years ago

More or less. We classify nodes in top.sls by environment, which corresponds to stage in the CICD pipeline (e.g. dev/qa/prod), but use pepa inputs to define the type of host (ceph node, openstack compute node, server type xxxx, etc.). We go a step further, and map 'role' pillars to state names directly, and 'attribute' pillars to cluster definitions or host groups.

For example, a ceph node would have a pepa defined role of 'ceph.node', and an attribute of 'chicago_ceph_cluster'. The former makes sure the state ceph.node is called, while the latter groups all the pillar definitions specific to the cluster into one pepa template (attributes/chicago_ceph_cluster.yaml).

host/input/mycephnode.yaml:

attributes:
  - chicago_ceph_cluster
roles..merge():
  - ceph.node

Our top file makes sure any role's state will be run (but this means you have to define a Salt formula/state for any role you use; otherwise use attributes to group pillar values instead).

states/top.sls:

dev:
  'I@environment:dev':
    - match: compound
{%- for role in salt['pillar.get']('roles', []) %}
    - {{ role }}
{%- endfor %}
qa: "the same but for environment == qa"
prod: "the same but for environment == prod"

NOTE that we use Pepa defined PILLARS for environment, etc. rather than grains, hence I@environment, but you could use grains (G@environment) or something else instead.
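As a rough illustration of the role/attribute convention described above, the following Python sketch shows how one host's pillar could be turned into a list of states and a list of attribute templates. `plan_for_host` is a hypothetical helper name, not part of Salt or Pepa, and the `attributes/<name>.yaml` path pattern is taken from the example path mentioned earlier.

```python
def plan_for_host(pillar):
    """Return (states to run, attribute template paths) for one host's pillar."""
    # Each role is used directly as a state name, e.g. 'ceph.node'.
    states = list(pillar.get("roles", []))
    # Each attribute names one template that groups the cluster's pillar values.
    templates = [f"attributes/{name}.yaml" for name in pillar.get("attributes", [])]
    return states, templates


states, templates = plan_for_host(
    {"roles": ["ceph.node"], "attributes": ["chicago_ceph_cluster"]}
)
print(states)
print(templates)
```

The point of the split is that roles need a matching state to exist, while attributes only need a template file.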

You don't have to do it this way, it is simply what we found worked the best in our environment.

Pepa is so flexible you can really use whatever substitution hierarchy you want, and structure your host groupings and definitions however you like.

Regards,

Jean


word commented 8 years ago

Thanks Jean, that makes sense and it's great to see an example pattern.

Going back to the original question: given that pepa had some influence from Hiera, is there a reason you chose a flat layout on the file system rather than a nested hierarchy? If not, is this something you'd consider as a potential feature?

The configuration I described earlier could be laid out like this in a hierarchy:

base/default/default.yaml
base/location/dc1/default.yaml
base/location/dc1/environments/prod/hosta_example_com.yaml
base/location/dc1/environments/stag/hostb_example_com.yaml
base/location/dc2/default.yaml
base/location/dc2/environments/prod/hosta_example_com.yaml
base/location/dc2/environments/stag/hostb_example_com.yaml

It makes it quite easy to understand the data structure (as most folk understand file system hierarchies) and solves the problem of having 'prod' and 'stag' environments in both locations.

The sequence/hierarchy could be defined like this:

ext_pillar:
  - pepa:
    - sequence:
      - "default"
      - "{{hostname}}"
      - "environments/{{env}}/default"
      - "environments/{{env}}/{{hostname}}"
      - "locations/{{location}}/default"
      - "locations/{{location}}/{{hostname}}"
      - "locations/environments/{{env}}/default"
      - "locations/environments/{{env}}/{{hostname}}"

The values nearer the bottom override the values defined on top.

ake-persson commented 8 years ago

I understand the reasoning here; however, it doesn't map cleanly to a Git checkout and using Git for staging. That is the main reason why each environment comes first in the structure.

ake-persson commented 8 years ago

A hierarchical file structure is more confusing here since it implies a relationship between different categories in the sequence where there is none.

The only implicit relationship between categories is the order in which they are evaluated for compiling the templating and substitution.

So for example:

/host/input/a.yaml
---
location: b

/location/b.yaml
---
tenant: c

/tenant/c.yaml
---

So each level can redefine which successive template gets compiled.

Please correct me if I misunderstood your example.

Hiera was not the inspiration for Pepa. Pepa is a continuation of Distill, a similar product for Puppet, which was inspired by some in-house tools that follow the pattern of using substitution.

There is also a clear distinction between Pepa and Salt-Stack: Pepa templates each file in order of sequence and retains the values through each iteration as input to the next, whereas Salt-Stack compiles all templates in one go.

Another clear distinction is that we separate configuration from code and test it separately. This means we can minimize code changes, which normally require more rigorous testing.

ake-persson commented 8 years ago

Also, I think the pattern of substitution in the configuration file won't work, since it implies the server already knows your host's environment and location.

ext_pillar:
  - pepa:
    - sequence:
      - "default"
      - "{{hostname}}"
      - "environments/{{env}}/default"
      - "environments/{{env}}/{{hostname}}"
      - "locations/{{location}}/default"
      - "locations/{{location}}/{{hostname}}"
      - "locations/environments/{{env}}/default"
      - "locations/environments/{{env}}/{{hostname}}"

ake-persson commented 8 years ago

I'm willing to extend the Code if you can make a good argument as to why and how it would work.

word commented 8 years ago

I must clarify that by 'env' in my examples I mean something different from the Salt notion of environment. I should probably come up with a better term to avoid confusion. For example, when I deploy infrastructure for, say, a web app, it'll usually have multiple environments (from the app developer's point of view). However, as far as Salt is concerned, it's all running within the same 'base' environment, using the same states. The only difference is the configuration. For example, the web app in dev will generally run under a different sub-domain, have caching disabled, etc., but that's all just data.

Perhaps this example is a bit clearer (without environments):

default
└── default.yaml
projects
├── facebook
│   ├── common.yaml
│   ├── eu-west-1
│   │   ├── appserver_facebook_com.yaml
│   │   ├── common.yaml
│   │   ├── dbserver_facebook_com.yaml
│   │   └── logserver_facebook_com.yaml
│   └── us-east-1
│       ├── appserver_facebook_com.yaml
│       ├── common.yaml
│       ├── dbserver_facebook_com.yaml
│       └── logserver_facebook_com.yaml
└── google
    ├── common.yaml
    ├── eu-west-1
    │   ├── appserver_google_com.yaml
    │   ├── common.yaml
    │   ├── dbserver_google_com.yaml
    │   └── logserver_google_com.yaml
    └── us-east-1
        ├── appserver_google_com.yaml
        ├── common.yaml
        ├── dbserver_google_com.yaml
        └── logserver_google_com.yaml
regions
├── eu-west-1.yaml
└── us-east-1.yaml

the hierarchy config:

      - "default"
      - "regions/{{aws_region}}"
      - "projects/{{projectname}}/common"
      - "projects/{{projectname}}/{{aws_region}}/common"
      - "projects/{{projectname}}/{{aws_region}}/{{hostname}}"

So:

- There are infrastructure-wide defaults at the top of the hierarchy.
- We have common settings for each AWS region (such as NTP servers). Those will be shared by both projects, unless an individual project overrides them further down the hierarchy.
- Each project (google, facebook) has its common settings (such as top-level domain).
  - Then there are common settings for each AWS region specific to the project. For example: facebook in eu-west-1 might want to use different package repositories than google in the same region (but different in us-east-1).
  - Finally we have the hosts, which define host-unique stuff like IP addresses and inherit/merge/override data from above (if defined).

I generally classify nodes using grains so I don't have the need for the 'input' directory like in your infrastructure. For example the grains I'd set for the facebook app server running in eu-west-1 would be:

grains: 
  aws_region: eu-west-1
  projectname: facebook
  roles:
    - appserver

The grains are created automatically during the node bootstrap process. In case of AWS that would be a user-data script that does the pre-salt config and runs salt at the end.

In the above example both facebook and google can have machines in the 'eu-west-1' region, both can have common regional settings specific to their project, but also inherit common regional settings that apply to both. I hope this makes sense. For example:

regions/eu-west-1.yaml

ntp_server: 1.1.1.1

projects/facebook/eu-west-1/common.yaml

domain: eu-west-1.facebook.com

projects/facebook/eu-west-1/appserver.yaml

ipaddr: 10.20.30.40

projects/facebook/eu-west-1/dbserver.yaml

ipaddr: 10.20.30.50

given the grains for both servers are set as follows:

grains: 
  aws_region: eu-west-1
  projectname: facebook

the final config for appserver would be:

ntp_server: 1.1.1.1
domain: eu-west-1.facebook.com
ipaddr: 10.20.30.40

the final config for dbserver would be:

ntp_server: 1.1.1.1
domain: eu-west-1.facebook.com
ipaddr: 10.20.30.50
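The lookup I have in mind could be sketched in Python roughly like this. It is only an illustration of the proposal, not something pepa does today: `FILES` is a hypothetical stand-in for the YAML files listed above, `compile_config` is a made-up name, single-brace `str.format` placeholders stand in for the `{{...}}` substitution, and I'm assuming a `hostname` value of `appserver` / `dbserver` to match the file names in this example.

```python
# Hypothetical in-memory stand-in for the YAML files in the example above.
FILES = {
    "regions/eu-west-1": {"ntp_server": "1.1.1.1"},
    # Value as given in projects/facebook/eu-west-1/common.yaml.
    "projects/facebook/eu-west-1/common": {"domain": "eu-west-1.facebook.com"},
    "projects/facebook/eu-west-1/appserver": {"ipaddr": "10.20.30.40"},
    "projects/facebook/eu-west-1/dbserver": {"ipaddr": "10.20.30.50"},
}

# Most general first; entries nearer the bottom override the ones above.
HIERARCHY = [
    "default",
    "regions/{aws_region}",
    "projects/{projectname}/common",
    "projects/{projectname}/{aws_region}/common",
    "projects/{projectname}/{aws_region}/{hostname}",
]


def compile_config(grains):
    """Expand each hierarchy entry with the grains and merge what exists."""
    config = {}
    for pattern in HIERARCHY:
        path = pattern.format(**grains)
        config.update(FILES.get(path, {}))  # missing files are simply skipped
    return config


grains = {"aws_region": "eu-west-1", "projectname": "facebook"}
appserver = compile_config({**grains, "hostname": "appserver"})
dbserver = compile_config({**grains, "hostname": "dbserver"})
print(appserver)
print(dbserver)
```

Both hosts share the regional and project-level values and differ only in the host-level `ipaddr`.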

It keeps complex config DRY. Another advantage of hierarchical layout is that it avoids namespace collisions.

This is basically what Hiera implements and it quickly became the killer app for puppet when it came out (it's part of puppet now and most folk use it). Working as a consultant I have modeled many different infrastructures this way and I'm yet to find one that doesn't fit.

There is a hiera ext_pillar for salt but apparently it doesn't quite work. Also, there's salthiera ext_pillar:

https://github.com/gtmtechltd/salthiera

However, it doesn't look like it's very actively used and it's written in ruby which doesn't sound like a great match for salt.

Pepa works pretty well and it's almost there functionally. If it allowed a more hierarchical data layout I'm sure many people would adopt it in a heartbeat.

I'm not familiar enough with salt to suggest implementation details. If it can't do grain substitution in the salt config, perhaps it can use a separate config file? (like salthiera does?)

Cheers, Andrew.

word commented 8 years ago

In answer to your question: in my example, each template cannot redefine which successive template gets compiled because that is predetermined by grains (region, project, roles, etc.). Therefore the file system layout should be representative of how the data gets assembled.

However, there are many ways to make an omelet :-) It's a slightly different model than what you've got at the moment. I guess the question is whether it makes sense (and is possible) to support this in pepa, or whether it's a job for a different tool.

ake-persson commented 8 years ago

I will go through your suggestions in more detail later this week, haven't had time yet.

word commented 7 years ago

This is no longer needed, as the new stack ext pillar implements it:

https://github.com/bbinet/pillarstack

ake-persson / pepa

Hierarchy on the file system. #8