Open ghjklw opened 1 week ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
The (small) problem is that the airflow.cfg file is not JSON; it is in "ini" format. I am not sure such a format can be validated easily. Do you know of any tools that can do it, and have you tested them with an Airflow .cfg file, @ghjklw?
Also be aware that we are planning (as part of https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components) to migrate from the ".ini" format to the ".toml" format, which is now the de facto standard for configuration in many Python projects. Will that work with it? Any tools that can do it?
Maybe it should be done as part of that move, and maybe you would like to contribute to that effort and take part in the .toml conversion and in adding validation for the toml file, @ghjklw?
BTW, I know you mentioned "even better toml", but I am asking about CLI tools: something that can be used in our pre-commits and validate the schema in CI. The big problem with IDE-only tooling is that we are not able to verify that such a schema is actually "correct", and validating config files generated automatically during testing would be a good test.
Hi @potiuk
My mistake for assuming airflow.cfg was toml and not ini 🙈
Regarding tooling for JSON schema with TOML, a fairly easy alternative that relies only on widely used, robust projects and the stdlib would be to read the toml file as a dict using tomllib.load and then validate the dict using jsonschema.validate, which actually validates a mapping/dictionary/object and not a string.
An even more powerful solution, though one that might require more work depending on how the configuration is implemented today, would be to leverage pydantic-settings. We would define the configuration as Pydantic models, and creating the JSON schema from them would be straightforward. Pydantic could itself handle parsing the TOML file through TomlConfigSettingsSource. An added benefit of that approach is that it would create an abstraction layer between the definition of the settings structure and the format they're stored in/how they're parsed. It would then be quite easy to use YAML/JSON, etc. pydantic-settings can also take care of settings defined through environment variables.
Last but not least, check-jsonschema has support for TOML. It can be used both as a CLI tool and as a pre-commit hook.
Unfortunately, I really do not have the bandwidth nor the experience with Airflow's development to offer my help with the implementation, but if anyone wants to work on it, I'd be happy to be a sparring partner/help with testing.
Marked it as "good first issue" - hopefully someone will pick it up.
Description
There is already a good structured YAML file providing metadata about all valid configuration options in airflow.cfg: airflow/config_templates/config.yml. I think publishing the same data as a JSON schema, and eventually submitting it to https://www.schemastore.org/json/, could be very useful.
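As a rough illustration of the conversion, the sketch below turns section/option metadata into a JSON schema. The CONFIG_METADATA dict is a hypothetical in-memory excerpt; its field names (options, type, description) and the type mapping are assumptions about the metadata layout, and a real implementation would load the YAML file instead.

```python
import json

# Hypothetical excerpt: section -> options -> per-option metadata.
CONFIG_METADATA = {
    "core": {
        "description": "Core settings (illustrative).",
        "options": {
            "parallelism": {"type": "integer", "description": "Max running task instances."},
            "load_examples": {"type": "boolean", "description": "Whether to load example DAGs."},
            "default_timezone": {"type": "string", "description": "Default timezone."},
        },
    },
}

# Assumed mapping from metadata type names to JSON schema types.
TYPE_MAP = {"string": "string", "integer": "integer", "boolean": "boolean", "float": "number"}


def build_schema(metadata: dict) -> dict:
    """Build a JSON schema with one object per section and one property per option."""
    sections = {}
    for section_name, section in metadata.items():
        properties = {
            option_name: {
                "type": TYPE_MAP.get(option.get("type", "string"), "string"),
                "description": option.get("description", ""),
            }
            for option_name, option in section["options"].items()
        }
        sections[section_name] = {
            "type": "object",
            "description": section.get("description", ""),
            "properties": properties,
            "additionalProperties": False,  # flag unknown option names
        }
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": sections,
        "additionalProperties": False,  # flag unknown section names
    }


schema_text = json.dumps(build_schema(CONFIG_METADATA), indent=2)
```

Setting additionalProperties to False is a design choice here: it is what lets a validator report misspelled or non-existent keys rather than silently accepting them.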
Use case/motivation
airflow.cfg
Airflow won't complain if the configuration file contains a typo or a non-existent configuration key, which makes it easy to make mistakes. A schema could also make it easier to catch invalid values earlier.
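For illustration, the kind of check that is missing today can be sketched with the standard library alone against the current ini format. The KNOWN_OPTIONS table and the find_unknown_keys helper are hypothetical; a real list of valid keys would come from the configuration metadata.

```python
import configparser

# Hypothetical table of valid sections and option names.
KNOWN_OPTIONS = {
    "core": {"parallelism", "load_examples"},
}


def find_unknown_keys(cfg_text: str, known: dict[str, set[str]]) -> list[str]:
    """Return the sections and options in the ini text that are not in `known`."""
    # Disable interpolation so literal '%' characters in values do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    problems = []
    for section in parser.sections():
        if section not in known:
            problems.append(f"[{section}]")
            continue
        # configparser lowercases option names by default.
        for option in parser.options(section):
            if option not in known[section]:
                problems.append(f"[{section}] {option}")
    return problems
```

This only catches unknown names; checking value types and ranges is where a proper schema would add more.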
Related issues
No response