Motivation
Create a better and more relevant fact gathering system.
Problems
Fact gathering is slow, mostly sequential, and returns huge data sets of mostly ignorable data.
The current fact gathering module (setup.py) is huge and hard to maintain. It contains a lot of code, most of it unused, and even when the code does run, the information it retrieves often goes unused.
The 'gather_facts' action supports multiple, configurable fact gathering modules, even running them in parallel, but to be effective it needs several smaller modules instead of the existing monolithic setup.py.
The Ansible engine slows down as the amount of data per host grows, and current fact gathering maximizes this data even when it is not needed.
Solution proposal
Leverage the new gather_facts action to use multiple smaller, more targeted modules.
Split setup.py into multiple modules, each covering a logical subset of the whole (but keep setup.py for backwards compatibility).
Switch the default gathering to use the new modules and a smaller subset of facts.
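To illustrate why smaller modules help, here is a minimal sketch in plain Python that assumes nothing from Ansible: two small collectors built on the standard library's platform module run concurrently, and their results are merged into one facts dictionary. The function names (collect_platform, collect_node, gather_parallel) are hypothetical; this is only an analogy for what the gather_facts action can do with several fact modules, not its implementation.

```python
# Hypothetical sketch: small, targeted fact collectors run concurrently and
# merged, as an analogy for gather_facts running several fact modules.
# None of these names are Ansible APIs.
import platform
from concurrent.futures import ThreadPoolExecutor


def collect_platform():
    """Return a few platform facts using only the standard library."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
    }


def collect_node():
    """Return the node name and its leftmost label as 'hostname'."""
    nodename = platform.node()
    return {"nodename": nodename, "hostname": nodename.split(".")[0]}


def gather_parallel(collectors):
    """Run each collector in its own thread and merge the resulting dicts.

    Later collectors win on key conflicts, so order the list accordingly.
    """
    facts = {}
    with ThreadPoolExecutor(max_workers=max(1, len(collectors))) as pool:
        # map() yields results in input order, so the merge is deterministic.
        for partial in pool.map(lambda fn: fn(), collectors):
            facts.update(partial)
    return facts


if __name__ == "__main__":
    print(gather_parallel([collect_platform, collect_node]))
```

Because each collector only returns the keys it owns, dropping one from the list shrinks the per-host data without touching the others.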
Example new ansible_min_facts module
#!/usr/bin/python
# -*- coding: utf-8 -*-
# (c) Ansible Project
# GNU General Public License v3.0+ (see COPYING or https://www.gnu.org/licenses/gpl-3.0.txt)

from __future__ import absolute_import, division, print_function
__metaclass__ = type

DOCUMENTATION = '''
---
module: min_facts
version_added: historical
description:
  - Fact gathering module that collects the minimal set of facts you should know about a system.
  - Defining 'minimal' is tricky; right now it means the platform, distribution and lsb facts.
short_description: Gathers minimal set of facts about remote hosts
options:
  gather_subset:
    description:
      - "If supplied, only the facts for the specified subsets are gathered."
      - Possible values are 'all', 'lsb', 'distribution' and 'platform'.
    type: list
    elements: str
    choices: ['all', 'lsb', 'distribution', 'platform']
    default: "all"
extends_documentation_fragment:
  - action_common_attributes
  - action_common_attributes.facts
attributes:
  check_mode:
    support: full
  diff_mode:
    support: none
  facts:
    support: full
  platform:
    platforms: posix
'''

EXAMPLES = """
# Display facts from all hosts and store them indexed by I(hostname) at C(/tmp/facts).
# ansible all -m ansible.builtin.min_facts --tree /tmp/facts
"""

RETURN = r'''
architecture:
  description: System CPU architecture
  returned: success (platform)
  type: str
  sample: "x86_64"
distribution:
  description: Name of the OS distribution
  returned: success (distribution)
  type: str
  sample: "Gentoo"
distribution_file_parsed:
  description: Whether the expected distribution information file was parsed
  returned: success (distribution)
  type: bool
  sample: true
distribution_file_path:
  description: File path to the distribution information file
  returned: success (distribution)
  type: str
  sample: "/etc/os-release"
distribution_file_variety:
  description: Type of distribution information file
  returned: success (distribution)
  type: str
  sample: "NA"
distribution_major_version:
  description: Major version number of the OS distribution
  returned: success (distribution)
  type: str
  sample: "2"
distribution_release:
  description: OS distribution release name
  returned: success (distribution)
  type: str
  sample: "n/a"
distribution_version:
  description: Full OS distribution version
  returned: success (distribution)
  type: str
  sample: "2.8"
domain:
  description: Domain part of the fully qualified hostname
  returned: success (platform)
  type: str
  sample: "example.com"
fqdn:
  description: Fully qualified domain name
  returned: success (platform)
  type: str
  sample: "myhost.example.com"
hostname:
  description: Leftmost part of the nodename ('.' as separator)
  returned: success (platform)
  type: str
  sample: "myhost"
kernel:
  description: Kernel release
  returned: success (platform)
  type: str
  sample: "5.18.5-gentoo"
kernel_version:
  description: Full kernel version string, including the compile tag
  returned: success (platform)
  type: str
  sample: "#1 SMP PREEMPT_DYNAMIC Tue Jun 21 20:40:20 EDT 2022"
lsb:
  description: Linux Standard Base (LSB) OS distribution information
  returned: success (lsb)
  type: complex
  sample:
    codename: "n/a"
    description: "Gentoo Base System release 2.8"
    id: "Gentoo"
    major_release: "2"
    release: "2.8"
machine:
  description: Machine hardware name (as reported by Python's platform.machine())
  returned: success (platform)
  type: str
  sample: "x86_64"
machine_id:
  description: Unique machine identifier, when available
  returned: success (platform)
  type: str
  sample: "474bdfd137caca678ff2ebdf00000ed9"
nodename:
  description: Full hostname as configured
  returned: success (platform)
  type: str
  sample: "myhost"
os_family:
  description: The 'family' this OS distribution is part of
  returned: success (distribution)
  type: str
  sample: "Gentoo"
python_version:
  description: Version of Python used to execute the fact gathering module
  returned: success (platform)
  type: str
  sample: "3.9.13"
system:
  description: Operating system name
  returned: success (platform)
  type: str
  sample: "Linux"
userspace_architecture:
  description: User library architecture
  returned: success (platform)
  type: str
  sample: "x86_64"
userspace_bits:
  description: Bit width of the userspace (user library) architecture
  returned: success (platform)
  type: str
  sample: "64"
'''

from ansible.module_utils.basic import AnsibleModule
from ansible.module_utils.facts.system.distribution import DistributionFactCollector
from ansible.module_utils.facts.system.platform import PlatformFactCollector
from ansible.module_utils.facts.system.lsb import LSBFactCollector

# Valid gather_subset values; each (except 'all') matches a collector's 'name'.
SUBSETS = ['all', 'lsb', 'distribution', 'platform']


def main():
    module = AnsibleModule(
        argument_spec=dict(
            gather_subset=dict(default=['all'], required=False, type='list', elements='str',
                               choices=SUBSETS),
        ),
        supports_check_mode=True,
    )

    subset = module.params['gather_subset']
    collectors = [DistributionFactCollector, PlatformFactCollector, LSBFactCollector]

    result = {}
    for collector_class in collectors:
        # Only run the collectors requested via gather_subset ('all' runs every one).
        if 'all' in subset or collector_class.name in subset:
            result.update(collector_class().collect(module=module))

    module.exit_json(ansible_facts=result)


if __name__ == '__main__':
    main()
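The gather_subset selection can be tried without Ansible or a remote host. The sketch below assumes collector classes that expose a name attribute (as the Ansible fact collectors do) and a collect() method; the Fake* classes and the gather() helper are hypothetical stand-ins that keep only the collectors named in the subset, or all of them when 'all' is given.

```python
# Hypothetical stand-ins: each mimics a fact collector class with a 'name'
# attribute and a collect() method, so subset filtering can be exercised
# without Ansible installed.
class FakePlatform:
    name = "platform"

    def collect(self, module=None, collected_facts=None):
        return {"system": "Linux"}


class FakeDistribution:
    name = "distribution"

    def collect(self, module=None, collected_facts=None):
        return {"distribution": "Gentoo"}


class FakeLSB:
    name = "lsb"

    def collect(self, module=None, collected_facts=None):
        return {"lsb": {"id": "Gentoo"}}


def gather(subset, collectors):
    """Run only the collectors whose name is in subset, or all for 'all'."""
    facts = {}
    for collector_class in collectors:
        if "all" in subset or collector_class.name in subset:
            facts.update(collector_class().collect(collected_facts=facts))
    return facts


COLLECTORS = [FakeDistribution, FakePlatform, FakeLSB]
print(gather(["platform"], COLLECTORS))  # {'system': 'Linux'}
print(sorted(gather(["all"], COLLECTORS)))  # ['distribution', 'lsb', 'system']
```

Keeping the selection keyed on each collector's name means a new subset only needs a new collector in the list, not new branching logic.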
Documentation (optional)
This is also a good chance to add RETURN documentation to each module, a much easier task once the currently 'huge' setup.py is divided.
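Once each module is small, checking that its RETURN block covers every fact it emits becomes a simple set difference. A minimal sketch, assuming the layout used in the example module, where each top-level fact name starts at column 0 and ends with a colon; documented_keys() and undocumented_facts() are hypothetical helpers, and a real check would parse the YAML properly.

```python
import re

# Hypothetical helper: assumes top-level RETURN keys start at column 0 and
# are followed by a colon; indented (nested) keys are deliberately skipped.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_]\w*):", re.MULTILINE)


def documented_keys(return_doc):
    """Return the set of top-level fact names documented in return_doc."""
    return set(TOP_LEVEL_KEY.findall(return_doc))


def undocumented_facts(facts, return_doc):
    """Return, sorted, the fact names a module emits but RETURN omits."""
    return sorted(set(facts) - documented_keys(return_doc))


SAMPLE_RETURN = """
architecture:
  description: System CPU architecture
  type: str
system:
  description: Operating system name
  type: str
"""

emitted = {"architecture": "x86_64", "system": "Linux", "machine_id": "abc"}
print(undocumented_facts(emitted, SAMPLE_RETURN))  # ['machine_id']
```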
Proposal: splitting facts
Author: Brian Coca <@bcoca>
Date: 2018-01-01