CouncilDataProject / cdp-backend

Data storage utilities and processing pipelines used by CDP instances.
https://councildataproject.org/cdp-backend
Mozilla Public License 2.0

Dynamically Generate TypeScript DB Models, Transcript Model, and other Constants #167

Open evamaxfield opened 3 years ago

evamaxfield commented 3 years ago

Idea / Feature

Dynamically generate TypeScript object definitions from cdp-backend Python DB models.

Use Case / User Story

As a developer I want to be able to add, remove, or update a database model in a single location rather than multiple.

Solution

Add a script to cdp-backend that, when run, generates a TypeScript package of the database models + extras that we can push to npm on each new version push.

Alternatives

Stakeholders

Backend maintainers, Frontend maintainers

Major Components

Dependencies

Other Notes

evamaxfield commented 3 years ago

In addition to creating the models themselves that have some interaction with the database, it would be great to generate the constants.

class FileFields:
    name = "name"
    uri = "uri"

So that we can use these constants during filtering and querying
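
A minimal sketch of how such constants might be emitted as TypeScript, assuming the constants are plain string class attributes as in the FileFields example above. The `render_ts_constants` helper and the `as const` output shape are hypothetical choices for illustration, not part of cdp-backend.

```python
class FileFields:
    name = "name"
    uri = "uri"


def render_ts_constants(cls: type) -> str:
    """Render the public string attributes of cls as a TypeScript const object."""
    # Hypothetical helper: skip dunder/private attributes and non-string values.
    fields = {
        key: value
        for key, value in vars(cls).items()
        if not key.startswith("_") and isinstance(value, str)
    }
    body = "\n".join(f'  {key}: "{value}",' for key, value in fields.items())
    return f"export const {cls.__name__} = {{\n{body}\n}} as const;"


ts_source = render_ts_constants(FileFields)
print(ts_source)
```

The frontend could then import the generated object and refer to `FileFields.name` in queries instead of repeating the string literal.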

evamaxfield commented 2 years ago

This is where the Python database models live: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/database/models.py#L26

This is where the TypeScript database models (and their unpackers) live: https://github.com/CouncilDataProject/cdp-frontend/tree/main/src/models

I think the best way to do this would potentially be to use a Jinja template. But really this is all about code generation, so whatever method works best.

liamphmurphy commented 2 years ago

I've done a bit of initial research on parsing the Python models. It's going to take a bit of puzzle-solving / trial and error, but I'm inclined to go with Python's ast module for doing the heavy lifting of parsing the Python code, and then using the grammar there as a means of generating the TypeScript with Jinja. For example, if ast returns that a line is a ClassDef, then we can get the name of the class from that tree node and pass it into the Jinja template.
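
A rough sketch of the ast idea, assuming the models are plain classes with simple attribute assignments. The sample source and the `class_names_and_attributes` helper are hypothetical; the real models.py is more involved, and the resulting mapping is the kind of data that could then be handed to a Jinja template.

```python
import ast

# Hypothetical stand-in for the contents of a models file.
PYTHON_SOURCE = '''
class File:
    name = "placeholder"
    uri = "placeholder"

class Person:
    name = "placeholder"
'''


def class_names_and_attributes(source: str) -> dict:
    """Map each top-level class name to the names assigned in its body."""
    tree = ast.parse(source)
    result = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            attrs = []
            for stmt in node.body:
                # Only simple `name = value` assignments are considered here.
                if isinstance(stmt, ast.Assign):
                    attrs.extend(
                        target.id
                        for target in stmt.targets
                        if isinstance(target, ast.Name)
                    )
            result[node.name] = attrs
    return result


parsed = class_names_and_attributes(PYTHON_SOURCE)
print(parsed)
```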

Another approach is creating a set of regexes or substring conditionals to replicate the grammar of what goes into a Python class and any expressions / statements within it, but that seems a bit more hacky and error-prone.

Or perhaps another approach that doesn't involve such heavy parsing I'm not thinking of...? Not sure what that would be though.

evamaxfield commented 2 years ago

Will give a more thorough look at your comment tomorrow but you may be interested to see how we are parsing this info in the database diagram generator: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/bin/create_cdp_database_uml.py

evamaxfield commented 2 years ago

And here is how that DATABASE_MODELS list is generated: https://github.com/CouncilDataProject/cdp-backend/blob/main/cdp_backend/database/__init__.py

Using the inspect lib
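
For illustration, a sketch of how a list of model classes could be collected with inspect. This is an assumption about the general technique, not a copy of the linked `__init__.py`; the `fake_models` module, the `Model` base class, and `collect_models` are all hypothetical.

```python
import inspect
import types

# Stand-in for a models module; the base class and model names are hypothetical.
fake_models = types.ModuleType("fake_models")
exec(
    '''
class Model:
    pass

class File(Model):
    pass

class Person(Model):
    pass

HELPER_CONSTANT = 42
''',
    fake_models.__dict__,
)


def collect_models(module, base):
    """Return the subclasses of base found in module (getmembers sorts by name)."""
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if issubclass(cls, base) and cls is not base
    ]


model_names = [
    cls.__name__ for cls in collect_models(fake_models, fake_models.Model)
]
print(model_names)
```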

liamphmurphy commented 2 years ago

Learning many things about Python's built in modules... ty for the link.

inspect seems to serve a similar purpose to ast but in a more user-friendly way, so that seems like a good way to go.

evamaxfield commented 2 years ago

Hey @liamphmurphy, wanted to bump and see how you were doing on this? I understand if you are busy and haven't worked on it haha, just trying to track various project states is all.

liamphmurphy commented 2 years ago

Hey @JacksonMaxfield, thanks for reaching out! My bad for not communicating. I've been moving the last few weeks and most of my communication fell by the wayside.

As a general update, I explored the inspect lib and think that's a great way to go. I'm planning on using a Python class to represent each TS model, just to make things tidy / hopefully more readable even if it's a bit overkill.

Needless to say if it needs to be done quicker than I am able to commit feel free to let me know and someone else can jump in, but if not, I'm looking forward to continuing on it.

evamaxfield commented 2 years ago

No worries! Good luck with the move.

As a general update, I explored the inspect lib and think that's a great way to go. I'm planning on using a Python class to represent each TS model, just to make things tidy / hopefully more readable even if it's a bit overkill.

Hmmmm does that mean we will need to update both the Python model and the TS model when we make changes? I don't really care about the structure of the class; the goal is really just "get database changes and constant additions or changes down to a single point of truth"

Needless to say if it needs to be done quicker than I am able to commit feel free to let me know and someone else can jump in, but if not, I'm looking forward to continuing on it.

No timeline crunch! Please keep working on it whenever move is wrapped up :heart:

liamphmurphy commented 2 years ago

Hmmmm does that mean we will need to update both the Python model and the TS model when we make changes? I don't really care about the structure of the class; the goal is really just "get database changes and constant additions or changes down to a single point of truth"

Good clarifying question, I just misspoke; I'm making a generic "Generator" class that'll have separate attributes, methods etc. to hopefully make it a bit easier to read what this new script is parsing in the Python models, to be used in the TS models. It won't make any assumptions about the models that currently exist.

e.g.:

class Generator:
    """
    Generator will contain metadata and support utilities for capturing the data needed
    from a single Python model to generate an equivalent TypeScript model
    """

    name = ""
    docstring = ""
    attributes = []
    references = []  # list of other models that the TS model needs to import

    source_tree = {}

    def set_attributes(self):
        ...

    def generate(self):
        ...

    ...

The actual logic for this isn't super complex, just fiddling with the inspect library (and maybe ast) to make some of the decisions needed, such as which attributes are required.
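
As a sketch of the "which attributes are required" decision, assuming each field object exposes a boolean `required` attribute. The `Field` and `MatterFile` classes below are hypothetical stand-ins, not the real ORM field types or cdp-backend models.

```python
import inspect


class Field:
    """Hypothetical stand-in for an ORM field that records a required flag."""

    def __init__(self, required: bool = False):
        self.required = required


class MatterFile:
    """Hypothetical model; the real field names and types may differ."""

    name = Field(required=True)
    uri = Field(required=True)
    external_source_id = Field()


def required_by_attribute(model: type) -> dict:
    """Map each Field attribute on model to whether it is marked required."""
    members = inspect.getmembers(model, lambda value: isinstance(value, Field))
    return {name: field.required for name, field in members}


flags = required_by_attribute(MatterFile)
print(flags)
```

A generator could use such a mapping to decide whether a TypeScript property gets the optional `?` marker.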

evamaxfield commented 2 years ago

Ahhhhh cool! Thanks! I am already learning :)

liamphmurphy commented 2 years ago

Just a quick update: still chipping away at this! I've been dealing with COVID over the last week.

I've gotten some basic TypeScript models generated; I'm now working through the edge cases that require extra logic in the TS constructors, such as ReferenceFields.

evamaxfield commented 2 years ago

Ahhhh sorry to hear COVID hit you!! Hopefully you are resting well!

Yea, ReferenceFields are always great hahaha. Please let me know if you need anything or any input or anything!

Thanks!!!

liamphmurphy commented 2 years ago

At long last (and after a lot of trial and error), I have MatterFile outputting:

import { ResponseData } from "../networking/NetworkResponse";
import { Model } from "./Model";
import { DocumentReference } from "firebase/firestore";
import Matter from "./Matter";

export default class MatterFile implements Model {
    id?: string;
    matter?: Matter;
    matter_ref: string;
    name: string;
    uri: string;
    external_source_id?: string;

    constructor(jsonData: ResponseData) {
        if (jsonData["id"]) {
            this.id = jsonData["id"];
        }

        // Unpack matter_ref into a Matter only when it is an object that is not a DocumentReference
        if (
            typeof jsonData["matter_ref"] === "object" &&
            !(jsonData["matter_ref"] instanceof DocumentReference)
        ) {
            this.matter = new Matter(jsonData["matter_ref"]);
        }

        this.matter_ref = jsonData["matter_ref"].id;
        this.name = jsonData["name"];
        this.uri = jsonData["uri"];
        if (jsonData["external_source_id"]) {
            this.external_source_id = jsonData["external_source_id"];
        }
    }
}

I'm going to prepare a PR soon, just want to check some of the other models and check for any touchups first.

The thing I know I'll change is that the id field doesn't show up as required here, as the Python models don't seem to set the required=True flag like on the other fields. I'm thinking of enforcing that any field named id is required no matter what.
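
One way the "id is always required" rule could look in the generator, as a hedged sketch. The `ts_property` helper and its signature are hypothetical and only illustrate the override described above.

```python
def ts_property(name: str, ts_type: str, required: bool) -> str:
    """Render one TypeScript property declaration for a model field."""
    # Assumption taken from the comment above: a field named "id" is
    # treated as required regardless of what the Python model reports.
    if name == "id":
        required = True
    optional_marker = "" if required else "?"
    return f"{name}{optional_marker}: {ts_type};"


lines = [
    ts_property("id", "string", required=False),
    ts_property("external_source_id", "string", required=False),
    ts_property("name", "string", required=True),
]
print("\n".join(lines))
```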

evamaxfield commented 2 years ago

I think it's because IDField types automatically assume required=True? I can check on that, maybe I am wrong.

This looks great though!