CentreForDigitalHumanities / cookiecutter-webapp-deluxe


Defining models in one place #14

Open BeritJanssen opened 4 years ago

BeritJanssen commented 4 years ago

For khatt I'm running into an issue we also have in I-Analyzer: if we've defined models in Django, how do we prevent defining them once more in TypeScript for the client? Could we create a pre-run hook that compiles the Django model definitions to TypeScript?
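To make the problem concrete, a hypothetical sketch of the duplication (the `Chapter` model and its fields are made up for illustration):

```typescript
// Suppose the Django backend defines, in Python:
//
//     class Chapter(models.Model):
//         title = models.CharField(max_length=200)
//         page_count = models.IntegerField()
//
// The client then has to repeat the same shape by hand, and the two
// definitions can silently drift apart:
export interface Chapter {
    id: number;
    title: string;
    page_count: number;
}
```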

JeltevanBoheemen commented 4 years ago

I've looked into this for SASTA but couldn't justify the time. DRF can answer an OPTIONS request for an API endpoint with a machine-readable description of its fields (https://medium.com/nepfin-engineering/use-typescript-to-synchronize-django-rest-framework-and-vue-js-d103cf416e23), and there are libraries that translate such schemas into TS declarations (several exist; https://github.com/bcherny/json-schema-to-typescript is just one example). I'm unsure how well this would work out of the box.
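Roughly, the idea would be something like this (the endpoint URL is made up, and the translation from DRF's field metadata to JSON Schema is deliberately minimal; real metadata has more detail, such as choices and nested serializers, that this ignores):

```typescript
import { compile } from 'json-schema-to-typescript';

// Map DRF field types onto JSON Schema types (deliberately incomplete).
const TYPE_MAP: Record<string, string> = {
    integer: 'integer',
    float: 'number',
    boolean: 'boolean',
    string: 'string',
};

async function declarationFromEndpoint(url: string, name: string): Promise<string> {
    // DRF answers OPTIONS with endpoint metadata; writable fields are
    // described under actions.POST.
    const response = await fetch(url, { method: 'OPTIONS' });
    const metadata = await response.json();
    const fields: Record<string, any> = metadata.actions.POST;
    // Translate the field metadata into a minimal JSON Schema...
    const schema = {
        type: 'object',
        properties: Object.fromEntries(
            Object.entries(fields).map(([key, field]) => [
                key,
                { type: TYPE_MAP[field.type] || 'string' },
            ]),
        ),
        required: Object.keys(fields).filter(key => fields[key].required),
        additionalProperties: false,
    };
    // ...and let json-schema-to-typescript turn it into a declaration.
    return compile(schema as any, name);
}

// declarationFromEndpoint('/api/chapters/', 'Chapter').then(console.log);
```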

jgonggrijp commented 4 years ago

There are multiple possible approaches and I don't think there is a single one that fits all applications. Continuing from https://github.com/UUDigitalHumanitieslab/I-analyzer/issues/147, here's what I've learned since then.

First, it's important to distinguish three related but different concepts: (1) the (abstract) data model sensu stricto, (2) the (technical) representation and (3) the interface language.

A typical web application always has a single data model and at least three representations:

- the representation of the data in the backend database (most often a relational database);
- the representation in the transport layer between server and client (usually JSON);
- the representation in the frontend (for example, instances of a TypeScript class).

There are at least two interface languages: one between the database and the transport layer at the backend (e.g. Django ORM instances) and one between the transport layer and the frontend "models" at the frontend (e.g. with ngx-resource; a hand-rolled version of the latter is sketched below). There can be more representations and interface languages on both sides: when working with Django REST framework (DRF), for example, there is the "serializer" as an intermediate step between database and transport layer.
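For illustration, a hand-rolled version of the second interface language, i.e. the mapping between the transport representation and the frontend representation (the endpoint and model are hypothetical; a library such as ngx-resource automates this step):

```typescript
// Frontend representation: a TypeScript class.
class ChapterModel {
    constructor(public id: number, public title: string) {}
}

// Interface language between the transport layer and the frontend model:
// plain JSON over HTTP, hydrated into a class instance by hand.
async function fetchChapter(id: number): Promise<ChapterModel> {
    const response = await fetch(`/api/chapters/${id}/`);
    const json = await response.json();
    return new ChapterModel(json.id, json.title);
}
```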

In a "classical" (naive) stack, the data model is ad hoc and completely implicit, and all representations and interfaces are defined manually, although frameworks often let you define a combination of representation(s) and interface(s) at the same time (as in Django "models", DRF "serializers" and Backbone "models").

A solution like Protocol Buffers lets you make this same ad hoc data model explicit: you manually define a single abstract representation, which can then be automatically compiled to backend- and frontend-specific concrete representation and interface definitions. With some luck, there are enough tools and plugins available that only the interface between frontend model and view remains to be defined manually. You then have a single source of truth about the data model, at the expense of additional technical complexity: the Protocol Buffers definition and its associated tooling add a new dimension to the system, and you still have to work with (shortcomings in) the autogenerated concrete representations and interfaces.
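To illustrate (the message definition and the generated output below are both made up, and the exact output differs per compiler plugin):

```typescript
// With Protocol Buffers, the single source of truth is a .proto file,
// for example:
//
//     message Chapter {
//         int32 id = 1;
//         string title = 2;
//         int32 page_count = 3;
//     }
//
// A TypeScript generator (e.g. ts-proto) would emit a frontend
// representation roughly like the one below, together with encode/decode
// functions for the wire format, while a Python plugin emits the backend
// equivalent from the same source:
export interface Chapter {
    id: number;
    title: string;
    pageCount: number;
}
```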

I think the solution proposed by @BeritJanssen and the one referenced by @JeltevanBoheemen sit somewhere between Protocol Buffers and the naive stack in terms of convenience and tradeoffs. However, I don't think building our own solution would be the way to go.

An alternative way to make the data model explicit in a single source of truth is to work with RDF (not to be confused with DRF). In this case, the data model tends to be less ad hoc, because it can be aligned with existing vocabularies. It also becomes, in a sense, external to the project and potentially reusable in other projects. The representations and interfaces are not autogenerated from the single source of truth, but manual definition is still mostly eliminated by the self-describing nature of RDF and the existence of ready-to-use RDF libraries. The drawbacks are that the data model itself tends to be more complicated and that RDF database solutions tend to be slow.

RDF is not an all-or-nothing affair. Naive JSON can be made RDF-compatible simply by adding an `@context` key that maps the other key names to URIs from an explicit vocabulary. So you could, for example, manually define the representations and interfaces at the backend in the naive way, then insert an `@context` in the transport layer that references a public, explicit (pre-existing) vocabulary. This enables you to work with the data in the RDF way at the frontend, so you need to manually define a representation in only one place.
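A minimal sketch of this, written as TypeScript object literals; the schema.org terms are just illustrative choices, and a real application would pick a vocabulary that actually fits its model:

```typescript
// The naive transport payload...
const naive = {
    id: 17,
    title: 'On duplication',
    page_count: 12,
};

// ...becomes RDF-compatible JSON-LD by adding an @context that maps each
// key to a URI from a public vocabulary (and an @id for the resource):
const linked = {
    '@context': {
        title: 'https://schema.org/name',
        page_count: 'https://schema.org/numberOfPages',
    },
    '@id': 'https://example.org/chapters/17',
    title: 'On duplication',
    page_count: 12,
};
```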

In all cases, you still have to write a manual, ad hoc translation between the frontend representation and the view, since the view is always unique to the application.

Since there are so many variations and tradeoffs, I don't think we can meaningfully choose a single approach that fits all applications. In simple applications, it will often be most economical to just work in the naive way. If you do need something more sophisticated, I think RDF is in principle better (i.e. more standardized and future-proof) than Protocol Buffers, but Protocol Buffers has a potential performance advantage. There is also the difference that RDF is inherently dynamically typed while Protocol Buffers is inherently statically typed, which adds even more tradeoffs to the equation. For example, you can't do much type checking on RDF data, while Protocol Buffers data can't be made self-describing, which means that any representations that can't be autogenerated still have to be defined by hand.

As for how to apply this to the cookiecutter, I think the naive way should remain the default for the time being. However, we could add an option to work the RDF way, because this can be standardized to a very high degree. It would be most time-efficient to do this after #2 and https://github.com/UUDigitalHumanitieslab/readit-interface/issues/197.