gleam-lang / suggestions

📙 A place for ideas and feedback

Type Providers exploration #50

Open chouzar opened 4 years ago

chouzar commented 4 years ago

Type Providers are an interesting approach to consuming resources in a type-safe way; they are a feature of the F# programming language and can be used to consume external resources such as databases, web services, and structured data files.

Lectures:

Why?

This issue was spawned from a previous discussion about an SQL query builder library. It is meant as an exploration of what Type Providers could bring to the table as a useful or feasible feature for the language.

CrowdHailer commented 3 years ago

I'm very keen to reduce the amount of code I write encoding and decoding JSON and am willing to put some time into trying to work this out.

@chouzar have you thought about this any more? What would the API look like? Would ingesting something like a JSON schema file work as a type provider?

lpil commented 3 years ago

To start, a tool that takes an example piece of JSON and outputs a file of Gleam types + decoders could be useful without requiring anything from the language specifically.

Any addition to the language itself around this is likely going to take a lot of time in design and implementation.
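For illustration, here is a minimal sketch of what such a standalone, example-driven tool could do (in Python, since an external tool need not be written in Gleam). All names here are invented, and nested objects and lists are deliberately left out:

```python
import json

def gleam_type_of(value):
    """Map a JSON scalar to a Gleam type name. bool must be checked
    before int, because bool is a subclass of int in Python."""
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    raise NotImplementedError("nested objects/lists left out of this sketch")

def generate(type_name, example_json):
    """Emit a Gleam record type inferred from one example JSON object."""
    fields = json.loads(example_json)
    field_lines = ",\n".join(
        f"    {name}: {gleam_type_of(v)}" for name, v in fields.items()
    )
    return f"pub type {type_name} {{\n  {type_name}(\n{field_lines},\n  )\n}}"

print(generate("User", '{"id": 1, "name": "alice"}'))
```

A decoder could be emitted alongside the type by the same walk over the fields; the shape above is only meant to show that no compiler support is needed for this first step.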

chouzar commented 3 years ago

> @chouzar have you thought about this any more? What would the API look like? Would ingesting something like a JSON schema file work as a type provider?

I haven't really done more research on this 😞 so I'm unsure how the API would work out, but ingesting a "resource" should be enough to generate its schema.

> To start, a tool that takes an example piece of JSON and outputs a file of Gleam types + decoders could be useful without requiring anything from the language specifically.

It seems that type providers are a compile/build-time construct, so an external tool would be on point 🎯 .


Take this with a grain of salt, as a lot of this is assumption on my end ⚠️ As far as I understand there are two main mechanisms for generating types:

  1. Types generated eagerly from the resource at compile/build time.
  2. Types generated and known on request (runtime?)

The second case is there for resources that might have an indeterminate (possibly very large) number of types, where requests are made while editing code ⌨️, for which a program that makes the request, such as IntelliSense or a language server, is of great help.

The first paper is a great "light" read which introduces a lot of the concepts surrounding type providers, but the second one might actually contain more insights, since it is written in Rust and uses JSON as a case study.

CrowdHailer commented 3 years ago

> Types generated and known on request (runtime?)

I think even in this case they are generated at compile time, but lazily, so only the types used in the program are generated. This is in contrast to the tools that take a spec and generate every type that spec specifies, which I will call the simple approach.

The simple approach breaks down if you want a type provider for schema.org or linked data in websites, which are highly connected and pretty much unbounded. However, if deriving types from your API specification, then simple is fine, because you would assume that everything in your API spec should be used by your implementation of the API.

As I understand it, type providers consist of the following.

  1. Code gen that turns a specification (e.g. JSON schema) into a representation that the type checker can digest.
  2. Encode and/or decode functions.
  3. A hook linking together the compilation of a program, the code gen code, and the specification.
  4. (Optionally) a way to do 3 in a lazy fashion.
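As a rough illustration of step 1, here is a sketch that turns a tiny subset of JSON schema into a Gleam type definition. The mapping table and function names are invented, and only flat object schemas with scalar properties are handled:

```python
import json

# Invented mapping from JSON schema scalar types to Gleam types.
SCALAR = {"integer": "Int", "number": "Float", "string": "String", "boolean": "Bool"}

def schema_to_gleam(name, schema):
    """Turn a flat JSON-schema object into a Gleam record type definition."""
    props = schema["properties"]
    fields = ",\n".join(f"    {k}: {SCALAR[v['type']]}" for k, v in props.items())
    return f"pub type {name} {{\n  {name}(\n{fields},\n  )\n}}"

user_schema = json.loads("""
{"type": "object",
 "properties": {"id": {"type": "integer", "format": "int64"},
                "name": {"type": "string"}},
 "required": ["id", "name"]}
""")
print(schema_to_gleam("User", user_schema))
```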

Notes

External tool

I think @lpil is right that making a tool that generates Gleam code is the right way to start this, as it should answer most of the work involved in steps 1 and 2. Making an external tool means that the communication API between the tool and the Gleam compiler is Gleam source code files. If they were integrated more closely I guess you might just pass an AST representation around?

Internal specifications

The specification file can be anything that has a codegen tool for it. So instead of using JSON schema, a tool could instead understand Gleam types and generate encode/decode functions for them. This would more or less remove the need for step one.
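To illustrate, a sketch of a tool going that way: reading a (very constrained) one-constructor Gleam type and deriving an encode function for it. The regex and the emitted `object`/`encode_*` API are assumptions for this sketch, not any real Gleam JSON library:

```python
import re

def derive_encoder(gleam_source):
    """Parse a single-constructor Gleam record type and emit a JSON
    encode function for it. Only flat `Name(field: Type, ...)` shapes
    are supported; the emitted API is invented."""
    m = re.search(r"pub type (\w+) \{\s*\1\(([^)]*)\)", gleam_source)
    name, fields_src = m.group(1), m.group(2)
    fields = [f.strip().split(":") for f in fields_src.split(",") if f.strip()]
    lines = ",\n".join(
        f'    #("{fname.strip()}", encode_{ftype.strip().lower()}(value.{fname.strip()}))'
        for fname, ftype in fields
    )
    return (f"pub fn encode_{name.lower()}(value: {name}) {{\n"
            f"  object([\n{lines},\n  ])\n}}")

src = "pub type User {\n  User(id: Int, name: String)\n}"
print(derive_encoder(src))
```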

In a heterogeneous system, i.e. a Gleam backend with a TypeScript frontend, it might make more sense to use JSON schema than to make a tool that generates TypeScript stubs from a Gleam type. However, it would (if someone wanted to develop the tool) be possible to use either the Gleam or the TypeScript program as the data source for generating types in the other program. If you use the same language on both sides of a connection, simply deriving encode/decode functions for a type is very interesting.

Encode Decode

These, I think, are just commonly useful functionality for a type provider; any utility functions could be derived depending on the use case. For example, string interpolation or SQL queries could be generated.

e.g.

import gleam/interpolate<"Hello $1">.{serialize}

fn run() {
  let x = serialize("World!")
}

import gleam/sql<"SELECT id, name FROM users WHERE active = $1">.{query}

fn run() {
  let tuple(id, name) = sql.run(query, True)
}

Existing tools

There are some completely standalone tools for codegen; I'm not sure if contributing to these is a worthwhile shortcut or a distraction from throwing together a prototype.

Example API

Here's a pretty large example

import gleam/openapi<"myapp/openapi.json">.{ListUsers, CreateAUser, GetAUser}

fn route(request) {
  case try openapi.decode(request) {
    ListUsers(response) -> {
      let users = get_users_somehow()
      response(users)
    }
    GetAUser(user_id, response) -> {
      let user = get_user_by_id(user_id)
      response(user)
    }
  }
}

pub fn handle(request) {
  case route(request) {
    Ok(response) -> response
    Error(error) -> error_to_response(error)
  }
}

paths:
  /users:
    get:
      summary: List users
      responses: 
        '200':
          content: 
            application/json:
              schema:
                $ref: '#/components/schemas/ArrayOfUsers'
    post: 
      summary: Create a user
  /users/{id}:
    get:
      summary: Get a user
      parameters:
       - name: id
         in: path
         required: true
         schema:
           type: integer
           format: int64
components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: integer
          format: int64
        name:
          type: string
      required:
        - id
        - name

At this point I must say I already find OpenAPI pretty verbose. For that reason I think this project is interesting: https://github.com/rawrmaan/restyped. However, it is MUCH less mainstream.

lpil commented 3 years ago

OpenAPI has been pretty much the standard for the last half decade so I think it would be the way to go for HTTP JSON APIs. I like that it can be used to generate servers and clients, and there is a lot of existing tooling for many languages. We could make a Gleam addition to one of these tools (or make our own one) and get clients for our Gleam servers and Gleam clients for servers in many languages more or less for free.

That other library seems to be exclusively for TypeScript and uses the TypeScript type checker extensively, so I think we'd struggle to use it. Having to run npm install for a Gleam API project is something I'd like to avoid too.

RE laziness I think that would not be needed. The Gleam compiler is very fast and the time taken to process a module of type definitions is linear with the size of the module. If anything the laziness might make it slower, and it would make the compiler much more complex.

I really like the SQL type checking, this is something I intend to build an external tool for in future.

CrowdHailer commented 3 years ago

> That other library seems to be exclusively for TypeScript

Which one, restyped or quicktype? quicktype works perfectly happily with json schema starting point.

lpil commented 3 years ago

Oh I missed quicktype! Yes this is exactly the sort of thing I was thinking would be good for Gleam decoders

CrowdHailer commented 3 years ago

Note to self: this probably also works as a way to get any compile-time values into the program, such as when it was compiled, or what the git hash is. This might be like using a hammer to crack a nut, but there is precedent in this F# literal provider: https://github.com/Tarmil/FSharp.Data.LiteralProviders#builddate


This issue is related to https://github.com/gleam-lang/gleam/issues/703. Essentially, reading a file at compile time is a no-op type provider.

CrowdHailer commented 3 years ago

Using a module inspired syntax allows two nice things

import gleam/env<"spec.file">.{Config as MyConfig, read as do_the_lookup}

I think if you constrained the item inside the angle brackets to always be a filename, you could do two nice things.

  1. Inside a gleam/raw.gleam file, have a function that takes the file contents and returns Gleam code.

pub fn provide(contents: String) -> String {
  string.concat(["pub const value = ", contents])
}

This could return an AST but I think we've mentioned that's not part of the public API

  2. The compiler could use the filename as the key in a cache so it's not compiled twice.

Limiting the type providers to file contents, rather than having each provider fetch its own data, has a few implications.

  1. The spec for the type provider would need to be in the project. i.e. it could not be fetched from an API endpoint. I think this is a good thing so the compiler will not fail due to network errors etc. particularly an issue if running on CI boxes etc etc. Opinions may differ.
  2. The compiler can have shared logic for reading the file, nice error messages, etc.

CrowdHailer commented 3 years ago

Further reasons for having a type provider be called only once: importing the same "provided" module in two places in the code should end up with the same types.

e.g.

// A module
import gleam/csv<"spec">.{Row}
import my_app/submodule

pub fn main() {
  let rows = csv.read(filename)
  list.map(rows, submodule.manipulate_row)
}

// A sub module
import gleam/csv<"spec">.{Row}

pub fn manipulate_row(row: Row) {
  // the Row type here needs to be the same as the one that is read in the main module before being passed down here
}

lpil commented 3 years ago

This is great, thank you Peter. I like the way this is going.

Evaluation

One immediate problem is that we don't yet have a good way to run Gleam code during compilation. We may be able to boot an Erlang VM to do this, though there's a question as to whether the target used for the type provider needs to match that of the program. I could imagine Erlang-based type providers being useful for JavaScript, rather than having to rewrite parts of them.

Using returned values at compile time

I found this example from that F# project interesting because it shows values returned from one type provider being used by other type providers. Building usable constants is powerful and useful.

type Sql = SqlProvider<Common.DatabaseProviderTypes.MSSQLSERVER,
                       const(Env<"CONNECTION_STRING",
                                 "Server=localhost;Integrated Security=true">.Value)>

Passing non-string/number values to providers

Anything other than the simplest values means that we need to perform type checking on the arguments, especially if the arguments are references to functions or constants in the module. Is this desirable? Does F# permit this?

Syntaxes

More syntax ideas:

import gleam/env.provide("spec.file").{Config as MyConfig, read as do_the_lookup}
import my_config_module.{Config as MyConfig, read as do_the_lookup}
  via gleam/env.provide("spec.file")

These syntaxes are closer to what we have already, but they may imply that values from the modules can be used as arguments.

Using information about types

It would be useful to make functions that do things based on types defined in the caller module.

For these the type provider needs to know the definition of the type. Maybe something like this?

import gleam/variants.provide(type Directions).{list as all_directions}

Again, that means we would need to have this code run during or after type checking. Chicken and egg.

Side effects and causes

> The spec for the type provider would need to be in the project. i.e. it could not be fetched from an API endpoint. I think this is a good thing so the compiler will not fail due to network errors etc. particularly an issue if running on CI boxes etc etc. Opinions may differ.

It would be nice to be able to generate Gleam from database queries and a database connection.

lpil commented 3 years ago

I think I need to go write a bunch of F#.

CrowdHailer commented 3 years ago

Naming.

Elixir and Lisps have codegen macros. Rust and Go have build hooks. F# has type providers. OCaml has PPX.

I wonder if they could be called generated/derived modules.

Performance costs

Concern raised by Greg: "you either have to generate another language, or fire up the beam. the first means maintaining another target, the second means significant compile time slowdown"

https://discord.com/channels/768594524158427167/768594524158427170/799660151710810172

Is it true that we have to make that decision? If there were a way to tell the compiler to make an OS call, the provider could be an escript and slow, or it could be a Rust binary and faster. Not something that has to be decided for all providers, but a decision for each library provider.

CrowdHailer commented 3 years ago

Further work on Type providers as an external step https://github.com/midas-framework/gleam_providers

CrowdHailer commented 3 years ago

Further comments

"you either have to generate another language, or fire up the beam. the first means maintaining another target, the second means significant compile time slowdown"

We now have a second target. :-)

Also, using the <compile-time-info> syntax is working well in my experiments. My approach is to consider a provider a function with the following signature: fn(string, type_) -> ast. The string is the value inside the <>. The type of the provider is unbound, and the checker runs over the entire program tree. After the first pass, the provider functions are invoked with their config string and the type signature that they need to fill. The type of the returned AST is inferred and then checked against all the existing type constraints the checker has. Finally, the returned AST is substituted for the provider in the AST and the whole thing is generated.

The nice thing is it is not limited to imports.

i.e.

import csv_provider<file.csv>.{foo, bar}

pub fn main(a, b) {
  let message = format<Hello $1 $2>(a, b)
  io.print(message)
}

csv_provider knows it needs to return a module with fields foo and bar; format knows it needs to produce a function that takes two arguments and returns a string.
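That two-pass checking loop can be modelled in miniature. Everything here is invented for illustration: the tuple type representation, the provider name, and the way arity is read off the placeholders:

```python
# Toy model of the scheme described above: the first checker pass records
# the type each provider call must satisfy; the provider is then invoked
# with (config_string, needed_type), the type of its output is inferred,
# and it is checked against the recorded constraint before substitution.

def format_provider(config, needed_type):
    """Provider for format<...>: arity comes from the $1..$n placeholders."""
    n = len([w for w in config.split() if w.startswith("$")])
    params = ", ".join(f"a{i}" for i in range(1, n + 1))
    generated = f"fn({params}) {{ ... }}"       # stand-in for the returned AST
    inferred = ("fn", n, "String")              # inferred type of that AST
    if inferred != needed_type:
        raise TypeError(f"provider output {inferred} != required {needed_type}")
    return generated

# The first pass left this constraint behind: format<Hello $1 $2> must be
# a two-argument function returning String.
code = format_provider("Hello $1 $2", ("fn", 2, "String"))
print(code)
```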

Note: I have prototyped the described process here: https://github.com/midas-framework/project_wisdom/pull/5/files#diff-7f373c09bfaffb2bca7457bcba1fbadc4b4e6902067aced4688909e04d0decda It is not in Gleam; I needed a smaller AST and a better understanding of the compiler, so I built a language in Gleam. If the approach is sensible I'm sure it could be translated to Gleam.