Galileo-Galilei / kedro-pandera

A kedro plugin to use pandera in your kedro projects
https://kedro-pandera.readthedocs.io/en/latest/
Apache License 2.0

Generate HTML documentation from schema #25

Open Galileo-Galilei opened 1 year ago

Galileo-Galilei commented 1 year ago

Description

Schemas written in YAML or Python are very explicit, but hard to show to managers, stakeholders, or business teams. Being able to convert a schema to a prettier, better organized HTML document would definitely help documentation efforts and consistency. It would be great if kedro-pandera could generate these docs automatically.

Quoting @datajoely:

Again, dbt has had this for years and it's just a no-brainer. We could easily generate static docs describing what data is in the catalog, along with the associated metadata and tests. There is also an obvious integration point with enterprise catalogs like Alation/Collibra/Amundsen.

Context

Dataset documentation is a much-needed feature for interacting with non-technical teams.

Possible Implementation

Add a CLI command, kedro pandera doc, which would perform the conversion for all datasets that have a schema.
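To make the idea concrete, here is a minimal sketch of the work such a command might do: walk the catalog configuration, keep the entries that declare a schema, and write one HTML file per dataset. Everything here is an assumption for illustration: the function names (datasets_with_schema, write_docs, plain_page) are hypothetical, the catalog is taken to be an already-loaded dict, and the schema is assumed to live under metadata.pandera.schema in each catalog entry. The renderer is a deliberately trivial placeholder.

```python
from html import escape
from pathlib import Path
from typing import Mapping


def datasets_with_schema(catalog_config):
    """Return {dataset_name: schema} for catalog entries that declare a schema.

    Assumption: the schema sits under entry["metadata"]["pandera"]["schema"].
    """
    found = {}
    for name, entry in catalog_config.items():
        if not isinstance(entry, Mapping):
            continue  # skip non-dataset values such as YAML anchors
        metadata = entry.get("metadata") or {}
        pandera_meta = metadata.get("pandera") if isinstance(metadata, Mapping) else None
        schema = (pandera_meta or {}).get("schema")
        if schema is not None:
            found[name] = schema
    return found


def plain_page(name, schema):
    """Placeholder renderer (hypothetical): dump the schema as preformatted text."""
    return f"<h1>{escape(name)}</h1>\n<pre>{escape(repr(schema))}</pre>\n"


def write_docs(catalog_config, out_dir, render=plain_page):
    """Write one HTML file per dataset with a schema and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, schema in sorted(datasets_with_schema(catalog_config).items()):
        path = out_dir / f"{name}.html"
        path.write_text(render(name, schema), encoding="utf-8")
        written.append(path)
    return written
```

Passing the renderer as a parameter keeps the catalog walk independent of how the HTML is produced, so the rendering step could later be swapped for whatever ends up generating the real pages.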

The real question lies in the responsibility of generating the HTML from schema. This likely belongs to pandera itself.
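Wherever that responsibility ends up, the conversion itself could be fairly small. The sketch below renders a schema as an HTML table using only the standard library. It is not pandera's API: schema_to_html is a hypothetical name, and the input is assumed to be a plain dict with a "columns" mapping whose entries carry dtype, nullable, description, and checks keys, loosely modeled on what a YAML schema looks like once loaded.

```python
from html import escape


def schema_to_html(name, schema):
    """Render a schema dict (assumed shape, see above) as an HTML table."""
    headers = ("Column", "Type", "Nullable", "Description", "Checks")
    rows = []
    for column, spec in (schema.get("columns") or {}).items():
        spec = spec or {}
        checks = spec.get("checks") or {}
        if isinstance(checks, dict):
            checks_text = ", ".join(f"{k}: {v}" for k, v in checks.items())
        else:
            checks_text = str(checks)  # fall back for list-style checks
        cells = (
            column,
            spec.get("dtype") or "",
            "yes" if spec.get("nullable") else "no",
            spec.get("description") or "",
            checks_text,
        )
        # Escape every cell so free-text descriptions cannot break the markup.
        rows.append(
            "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in cells) + "</tr>"
        )
    header_row = "<tr>" + "".join(f"<th>{h}</th>" for h in headers) + "</tr>"
    body = "\n".join([header_row, *rows])
    return f"<h2>{escape(name)}</h2>\n<table>\n{body}\n</table>\n"
```

For example, calling schema_to_html("people", {"columns": {"age": {"dtype": "int64", "nullable": False}}}) returns a heading followed by a one-row table that a non-technical reader can open in a browser.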

datajoely commented 1 year ago

Perhaps this could be kedro catalog docs, built into Kedro / Kedro-Viz itself.

Galileo-Galilei commented 1 year ago

I think several documentation-related features might end up in kedro, but it would be nice to be able to iterate faster and not be tied too closely to backward compatibility and minor release schedules, at least at the beginning of development.

datajoely commented 1 year ago

If we were to go down this route, I think the libraries which generate API docs for things like Click are a decent parallel, as they also follow an introspection-to-HTML-content pattern:

https://sphinx-click.readthedocs.io/en/latest/
https://github.com/DataDog/mkdocs-click