droher / boxball

Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, Clickhouse, Drill, Parquet, and CSV.
Apache License 2.0
120 stars, 16 forks

Add markdown documentation for database schemas #75

Open jts599 opened 10 months ago

jts599 commented 10 months ago

#74: Adds a script that generates a Markdown file for the database schema.

jts599 commented 10 months ago

I only kinda know what I am doing, so if you want me to do something differently, feel free to let me know. I'm also not sure how to link the issue that I opened to this PR.

droher commented 10 months ago

This is awesome, thanks so much! With respect to style, it would probably be easier to format using a package like https://github.com/Python-Markdown/markdown instead of raw string formatting. If you want to give that a shot, feel free - but this is already a huge help and more than enough effort on your part, so only do it if you'd like to as a learning exercise.

jts599 commented 10 months ago

I'll give it a go when I have some time. What I have is not super maintainable so it would be good to do something a little more robust.

jts599 commented 10 months ago

Alright, I have updated this to generate the HTML in Python.

You can see a preview of the doc page here

droher commented 9 months ago

Amazing, thanks! I will review this within the next couple of days. It looks like there are a bunch of new Python files - are those all necessary or are there some dupes?

jts599 commented 9 months ago

Strictly speaking, there are some that are unnecessary, but there are no duplicates. Multiple files exist mostly for modularization purposes. The code that I wrote generates representations of the schema in a few different formats, and each Python file handles a different format. I included these since I think each format has a purpose, but I could totally understand not wanting all the different types in the official repo, so let me know if you want me to remove any.

Python Files:

  1. GitHubMDGenerator.py - Generates GitHub-flavored Markdown. I kept this around since it renders natively on GitHub.
  2. PyMDGenerator.py - Generates the Python-Markdown source that gets turned into HTML. Strictly necessary.
  3. generate_doc.py - Main script. Strictly necessary.
  4. generate_html.py - Converts the generated Python-Markdown source into HTML. Basically a wrapper for Python-Markdown. Strictly necessary.
  5. generate_json.py - Generates a JSON representation of the database schema, which all the other generators consume. Strictly necessary. (The JSON file is also really nice to have open for GitHub Copilot.)
  6. generate_markdown.py - Base class for the Markdown generators to inherit from. Strictly necessary.

I could certainly refactor some of this into fewer files; I just had them split for cleanliness. Let me know if you would like me to do some cleanup.
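
For readers skimming the thread, the layering described in the file list (a JSON schema consumed by a base generator class, with one subclass per output format) might look roughly like the sketch below. This is only an illustration under assumptions: the schema shape, the class names `MarkdownGenerator` and `GitHubMarkdownGenerator`, and the `heading`/`table`/`render` methods are hypothetical and are not taken from the PR's actual code. The HTML conversion step is left out so the sketch stays standard-library only.

```python
import json
from abc import ABC, abstractmethod

# Hypothetical schema shape; the JSON produced by generate_json.py may differ.
SCHEMA_JSON = json.dumps({
    "tables": [
        {
            "name": "example_table",
            "columns": [
                {"name": "id", "type": "INTEGER"},
                {"name": "label", "type": "TEXT"},
            ],
        }
    ]
})


class MarkdownGenerator(ABC):
    """Hypothetical base class: walks the schema, subclasses do the formatting."""

    def __init__(self, schema):
        self.schema = schema

    @abstractmethod
    def heading(self, text):
        """Return a section heading in the target dialect."""

    @abstractmethod
    def table(self, header, rows):
        """Return a table in the target dialect."""

    def render(self):
        # One heading plus one column table per database table.
        parts = []
        for tbl in self.schema["tables"]:
            parts.append(self.heading(tbl["name"]))
            rows = [[col["name"], col["type"]] for col in tbl["columns"]]
            parts.append(self.table(["Column", "Type"], rows))
        return "\n\n".join(parts) + "\n"


class GitHubMarkdownGenerator(MarkdownGenerator):
    """Hypothetical subclass emitting GitHub-flavored pipe tables."""

    def heading(self, text):
        return f"## {text}"

    def table(self, header, rows):
        lines = [
            "| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in rows]
        return "\n".join(lines)


doc = GitHubMarkdownGenerator(json.loads(SCHEMA_JSON)).render()
print(doc)
```

With this split, supporting another dialect would only mean adding a subclass that overrides `heading` and `table`, which is presumably why the base class is worth keeping in its own file.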