dagger / dagger

An engine to run your pipelines in containers
https://dagger.io
Apache License 2.0
10.86k stars 581 forks source link

Improve codegen reusability #5226

Open helderco opened 1 year ago

helderco commented 1 year ago

Summary

Refactor the Go based codegen tooling to make it easier to maintain the codegen feature in a growing list of SDKs.

Background

The Go SDK launched with a custom implementation of GraphQL introspection to generate the Dagger client based on the API schema. This is similar to the GraphQL Code Generator project, but uses Go code rather than Javascript.

Python needed to be released shortly after but the Go introspection wasn’t ready yet to be shared with other SDKs. As Python’s GraphQL client already provides the introspection, it was faster to do the code generation in Python which was done in a day.

With Node.js right after, we were still trying to decide if it was easier to do it with Javascript, but sharing the workload allowed another Go developer to help out by modularizing the Go introspection to serve both Go and Node.js.

Motivation

While adding support for Container.sync in the API I noticed that It makes more sense now for every SDK to use the same Go based codegen:

Converge to using the Go based codegen

At the time the SDKs were launched, codegen was simpler. However, features kept being added over time, requiring more maintenance. Now, expanding to more languages like Rust and Elixir, there’s even more different implementations of codegen, making it harder to keep feature parity in sync.

While initially doing the Python codegen in Python allowed for a faster iteration, and allowed access to some Python APIs like detecting if a name is a reserved keyword, it’s now a bigger problem to not share common Dagger specific functionality requiring unnecessary duplication.

While adding enum support requires knowledge in each language to extend the SDK’s client, there’s also decisions like knowing when to convert an ID into an object or vice versa, knowing when a field should make a request vs be lazy, and even possibly implementing custom error types. Putting these in a common abstraction layer reduces the burden on each SDK.

Improve the common layer

Even between Go and Node.js, since both use Dagger’s Go codegen, there’s still unnecessary duplication. That’s because Node.js SDK’s functions and templates were added without much change to Go SDK’s.

I saw several places where this can be improved, by moving everything that has to be in all SDKs to the common layer, as much as possible and in an agnostic and granular way. It’s the same idea as our own SDK clients, that are dagger specific abstractions on top of more general query builders. In the same way, this common layer should be a dagger specific abstraction on top of the general GraphQL introspection layer.

There’s already a few common functions but I suggest taking it to another level.

Simplify templates

I think working with the templates is a bit difficult at the moment. There’s too much logic in some, or two spread out in different files.

There’s too much noise with indentation and general whitespace, with a lot of ifs within a for, within a if... Some templating languages terminate with {% endfor %} and {% endif %}, but Go uses the same {{ end }}. At least there could be a comment on every {{ end }} that references what’s the corresponding opening statement when it’s not immediately obvious due to the distance between opening and closing.

Templates should be as dumb as possible and easy to read and maintain. Sometimes even not worth the trouble.

High-level design

There’s no formal specification for this yet, but at a high-level:

[^1]: This is not directly related to the Go codegen tooling but it helps with the codegen maintainability in all SDKs individually.

I want to migrate Python’s codegen to Go either as a part of this process or as a consequence of it, bringing all supported SDKs under the same tooling. Other SDKs to hopefully follow, especially when considering maturing out of experimental, or as a new addition to the repo.

Common abstraction layer

The Go codegen implements the visitor pattern as used in the GraphQL Code Generator project (but using Go instead of Javascript, and not caring about operations). Between the GraphQL introspection and the templates however, there’s a lot of Dagger specific decisions and data manipulation that can be abstracted away in a Dagger abstraction layer.

So rather than using the visitor pattern at the GraphQL introspection layer, do as much data processing as possible for anything that’s shared between all SDKs, and visit the processed data instead.

Templates

Some top-of-mind improvements:

Boilerplate

Add helpers for common patterns, like delegating to templates, hooking in the needed naming conventions or anything else that helps reduce the boilerplate for an SDK.

Rationale

An advantage for using Go here is that any change that requires re-generating clients comes more often from changes to the API which is implemented in Go. There’s been a lot of cases where a new feature is added to the API which requires a change in all codegen solutions. If the same developer adapts the common abstraction layer accordingly, it should be much simpler to add support in each language.

The “entrypoint” for an SDK should be defining the functions that are to be visited. Sometimes it’s much simpler to just produce the needed output in the function rather than delegating to a template. In other words, using templates is not a requirement, only use them if they add clarity and are easier to maintain. Similarly, if something else instead of Go templates or functions is easier to maintain, we could also have a helper to hook that into the visitor “life-cycle”.

The most important idea here is to centralize as much as possible the common parts between every SDK, and help make it easier to maintain in a growing list of SDKs.

matiasinsaurralde commented 1 year ago

@helderco I've been playing with the following idea:

helderco commented 1 year ago

@matiasinsaurralde, thanks for doing this! Looks good as a starting point to improve the common layer. 🙂

Good to move CommonFunctions over to BaseGenerator. 👍

matiasinsaurralde commented 1 year ago

@helderco Thank you! Will spend some time this week around it.

wingyplus commented 1 year ago

hi, may I knwo the progress of the issue? I'm interested in integrating Elixir SDK to use Go codegen and I would like to help after I make Elixir get released, which I believe is soon. ^^

matiasinsaurralde commented 1 year ago

@wingyplus I'm working on a branch with additional changes and I could wrap it up in the next 1-2 weeks. If you do any progress earlier we can figure out how to integate changes later on. Happy to discuss!

TomChv commented 7 months ago

I'm 100% aligned with the idea of sharing common dependencies between module and I think abstracting the generation logic into a common implementation and just let SDK implement the syntax make sense.

I think we should work on that once our module are stable. So it will make it easier to supports other languages modules! As Typescript is already part of the Go codegen, it might be simpler to first start by refactoring both Go and Typescript and then add other languages, so we are first sure that it's possible to do that. Python and Typescript are very similar so it shouldn't be to hard to port it, I don't know about Elixir or Rust though.

helderco commented 5 months ago

Missed this update, but I've started experimenting with implementing this using a Dagger Module: