Convert some association lists to homogeneous maps?

Gabriella439 commented 6 years ago

The context for this is: https://github.com/dhall-lang/dhall-lang/issues/136

Many JSON schemas expect homogeneous maps (i.e. the JSON equivalent of Haskell's Data.Map.Map), such as in the following contrived JSON configuration:

{
  "users": {
    "john": {
      "admin": false
    },
    "mary": {
      "admin": true
    },
    "alice": {
      "admin": false
    }
  }
}

... where you would not necessarily know the set of users statically in advance.

The idiomatic Dhall type corresponding to the above JSON configuration would be something like:

{ users : List { key : Text, value : { admin : Bool } } }

The question here is whether or not dhall-to-json and dhall-to-yaml should support automatically converting association lists with a reserved field name to homogeneous maps.

The easiest way to explain this is to show the following example encoding of the above JSON configuration:

{ users =
    [ { key = "john", value = { admin = False } }
    , { key = "mary", value = { admin = True } }
    , { key = "alice", value = { admin = False } }
    ]
}

Some variations on this proposal:

Make this behavior opt-in (i.e. off by default) and guard it behind a command line flag
Don't implement this in dhall-to-json/dhall-to-yaml and instead implement this in more specific integrations (like dhall-to-terraform)
Don't reserve the field named value; any field name is fine as long as the other field is named key
Reserve different field names
Allow the user to specify which field name(s) to reserve for this transformation on the command line
Don't do this at all :)

blast-hardcheese commented 6 years ago

The most competing for me is

Allow the user to specify which field name(s) to reserve for this transformation on the command line

I believe this is useful for dhall-to-json and dhall-to-yaml, not just for dhall-to-terraform, due to the expectation that these structures exist already and are expected in the json-consuming world.

I was evaluating extending the algebra to add new language extensions in use-case-specific compilers, but it seems as though the parser and algebra are not designed for this. I expect this was somewhat on purpose to prevent fragmentation; if this was the case, maybe it makes sense to add a function that performs this transformation, but is only enabled in certain contexts, similar to how only flat structures can be rendered to JSON.

Thoughts?

Gabriella439 commented 6 years ago

Yeah, I agree that this should be part of dhall-to-json (and dhall-to-yaml) since it's a common idiom in the JSON world.

My preference is to turn it on by default for reserved key/value field names.

The reason why I want to standardize on the field names is to ensure that people can reuse or share utilities for programmatically generating homogeneous maps (which requires consensus on what the key name is).

The reason why I think it should be on by default is to ensure that users don't need to transmit information out-of-band about what command line flags they used when sharing code with each other. I usually reserve command-line flags for things that do not affect semantics (i.e. error messages or formatting). Ideally a Dhall expression is self-contained and doesn't require additional information or context for a user to correctly consume.

acdimalev commented 6 years ago

Having a unique type to represent this data structure seems like the correct approach to the problem. I would personally hope for a data type that does not have any overlap with an otherwise valid type, even if coersion between types exists.

e.g. hmap [ { key = "john", value = { admin = False } } ]

Shy of that, I would have to recommend against choosing reserved key/value names that are "too" obvious. Dhall is far from the first application of JSON to encounter this specific sort of limitation.

http://opentsdb.net/docs/build/html/api_http/search/lookup.html#example-request

Perhaps dhallk and dhallv?

blast-hardcheese commented 6 years ago

Having a particular function that enables the conversion would be sufficient for my needs, though it really begs Record.type to even be able to express the type of something that's been dynamically generated in this way. Mainly to express statically deriving a type that's part of a larger statically typed structure:

let rec = hmap [ ... ] in
let T = Record.type rec in
let wrap = \(t: Type) -> {key: Text, value: t} in
\(x: wrap(T)) -> ...

just as a quick example. I needed this when representing some structures in terraform that were almost entirely will typed other than the user generated content.

Gabriella439 commented 6 years ago

Yeah, there isn't too much of an issue using a somewhat long or obscure key/value field name because if people have an issue with it they can always define/share a helper function to convert from convenient non-reserved names to less convenient reserved names, i.e.:

convert
  : ∀(a : Type) → List { key : Text, value : a } → List { mapKey : Text, mapValue : a }

Regarding hmap, the reason why Dhall does not provide built-in support for sets with unique elements or homogeneous maps with unique keys is that it would entail support for checking the equality of values (particularly Text), which is one thing I try to avoid in a configuration language. See: https://github.com/dhall-lang/dhall-lang/issues/88

I think it's appropriate for dhall-json to assert key uniqueness as part of the conversion to JSON, but not within the Dhall language itself.

My inclination is to go with something like mapKey/mapValueas the reserved names to decrease the likelihood of collision with existing JSON schemas and to increase the likelihood that people unfamiliar with this feature can guess that there is some magic going on for code using those field names. Also, my initial target audience is ops users who will recognize the term "map" from the Go programming language (and as a bonus it also matches the Haskell term for this data structure). The main downside is that "map" also tends to be a heavily overloaded term in mathematics.

Another close contender was dictKey/dictValue or dictionaryKey/dictionaryValue. Python programmers (also common in ops) will recognize this name and it's semantically clear (plus "dictionary" is a term that even a lay person will understand).

Some other names I considered:

The main reason I avoid "hash" or "hashmap" is that it implies a very specific implementation
- Also, the Haskell implementation does not currently use hashmaps under the hood; it uses a tree-based implementation
- However, it is a popular name in the Perl/Ruby communities (which overlap with the ops community)
"Associative array" is a close contender for being a pedantically accurate name
- I slightly prefer "dictionary" because even a lay person understands what a dictionary is
- Also, "associativeArray" is a very long prefix
The main reason I don't use "dhall" is because it's semantically meaningless and is only acting as a namespace

blast-hardcheese commented 6 years ago

I agree with your position on almost all points; What would you say to a structure that wraps a list, providing this functionality when serializing into any language, but that exposes a similar interface to a list? I believe this offers the best of both worlds:

Explicitness around conversion, instead of behavior controlled by naming convention
Still being able to use the structure, even after explicitly stating that it is an associative array.

I think key/value are fine terms to use, provided that the conversion itself is explicit. Possibly X.wrap and X.unwrap for whatever X is decided on. Not having used anything but the JSON compiler, does this concept map to any other language integrations or future language integrations?

Gabriella439 commented 6 years ago

@blast-hardcheese: If I understand correctly, I think that you are proposing that the user could write code that optionally assumes three inputs like this:

  λ(Record : Type → Type)
→ λ(wrap : ∀(a : Type) → List { key : Text, value : a } → Record a)
→ λ(unwrap : ∀(a : Type) → Record a → List { key : Text, value : a })
→ { users = wrap { admin : Bool }
      [ { key = "john", value = { admin = False } }
      , { key = "mary", value = { admin = True } }
      , { key = "alice", value = { admin = False } }
      ]
  }

... where the user can name the Record/wrap/unwrap arguments to their code whatever they want (they are ordinary bound variables that are not reserved names). Then, if the interpreter sees an association list wrapped in wrap (or whatever the user names that bound variable) then it performs the homogeneous map conversion during the translation to JSON (and vice versa for unwrap if we add support for importing JSON).

I like that idea because it doesn't require any magic at all. All conversions are explicit and it doesn't collide with any existing namespace. If the user doesn't declare those function inputs of those types then dhall-to-json behaves the same as before.

Other languages that this concept would map onto are Python/Ruby/Perl where this sort of idiom is also common (and technically a JavaScript integration, which would be a superset of the current JSON integration).

blast-hardcheese commented 6 years ago

That's exactly what I was thinking, save for having an explicit function provided by the environment that does the wrapping and unwrapping, though I guess this isn't strictly necessary.

How would you propose implementing this?

Gabriella439 commented 6 years ago

The main reason I propose the user's code accepts the "built-ins" as function arguments is so that the code is compatible with other interpreters (i.e. you could reuse the same code with the dhall or dhall-repl executables). If I were to add additional true built-ins to dhall-to-json then the code wouldn't be usable outside of dhall-to-json

I can take care of implementing this (I've done this sort of thing before), but the way this works is:

Parse the abstract syntax tree
Type-check it
If the type is a function type expecting two "built-in"s of the correct type, then:
- Remove the lambdas providing the "built-in"s from the syntax tree
- Replace all occurrences of the bound variable corresponding to wrap with the conversion to a homogeneous map
Otherwise convert to JSON as normal

blast-hardcheese commented 6 years ago

To me, this still seems somewhat magic, but at least it's more explicit magic; you definitely seem to be strongly considering some tradeoffs.

This implementation additionally opens the door for more domain-specific, type-driven functions without polluting the base language, which could also be good.

I guess using this feature when loading the script into Haskell could just be HashMap's toList and fromList?

blast-hardcheese commented 6 years ago

Some more thoughts:

First,
```
λ(Record : Type → Type)
```
is extremely generic, would the implementation accidentally remove some existing functionality?
Second, what happens if the parameters are out of order or disconnected from the previously mentioned function?

Gabriella439 commented 6 years ago

@blast-hardcheese: You can actually already load Dhall's association lists into Haskell Maps and HashMaps without this feature and without any changes to dhall or dhall-to-json. All you would have to do is add an Interpret instance for those Haskell types (or a newtype wrapper around them to avoid an orphan instance). So the Interpret instance for those types would use fromList like you mentioned

However, on more reflection I think we should go back to the original plan of using reserved key/value field names (i.e. mapKey/mapValue). Assuming two "built-in"s via function arguments leads to poor ergonomics when importing expressions from other files that may contain homogeneous maps

blast-hardcheese commented 6 years ago

I'd be fine going back to even key and value so long as the conversion is explicit. The control mechanism needs to be outside the data, either by function or by type ascription or something like that. I don't want to be six months down the line with someone saying they can't represent a file format where the input is a list of objects with the keys key and value because of our decision here.

f-f commented 6 years ago

Chiming in just to say that I'm hitting this same problem - porting Terraform config to Dhall and realizing just now that it's not possible to express this idiom in the language so far - and that I like where this thread is going; my 2c on some points:

good idea to have this idiom only in dhall-json, because integrating it nicely in dhall-lang might mean turning on dependent types (as described here) and that might take a while.
+1 on the "original plan"
I like the idea having this automatic conversion from homogeneous maps on by default, and having defaults on the labels to match, but:
- there should be an option in dhall-json to turn the magic off
- the key/value to match should have defaults (my favorite so far are mapKey/mapValue) but should be configurable as well, to avoid the problem that @blast-hardcheese describes in the previous comment (that is, cannot represent some things because of hardcoded defaults)

Gabriella439 commented 6 years ago

Alright, then I'll set the default behavior so that most code in the Dhall ecosystem is compatible with each other but then allow people to opt out or change the reserved key/value field names.

blast-hardcheese commented 6 years ago

I'm still expecting we'll need to revisit this later, but maybe having some code will help further the discussion. Thanks for your patience through the back and forth here.

f-f commented 6 years ago

Reflecting a bit more on this, I think I'll go with making a small wrapper (we can call it dhall-to-gcloud-terraform) that takes a dhall-json output and makes it nice for terraform. The reason is that terraform uses this idea of non-homogeneous lists and maps all over the place, so even if we fix this a small wrapper is necessary, as expressing the data as terraform wants it would not typecheck.

Examples:

the problem at hand, in which the keys of a map depend on some other values
the list of resources that terraform needs to apply is not homogeneous, as it needs to contain records of different types. I tried to go with unions, but I cannot make it work, so I feel like it's not possible to typecheck this (rightly, as it's not a homogeneous list). If you have any ideas, help would be welcome. I think I'll fix this by having a record with one entry for each resource type, and an associated list of resources of that type (e.g. { google_cloud_sql_instance : List GoogleCloudSqlInstance }), and then have the wrapper transform this datastructure into the thing that terraform wants (a list of all the resources, one after the other).

Gabriella439 commented 6 years ago

@f-f: I'll probably implement this anyway because I think it's generally useful regardless of whether it completely solves the terraform integration

You probably want to do the processing using the Haskell API and then emit JSON from that using Dhall.JSON.dhallToJSON so that you don't have to use two separate executables in your pipeline.

I don't think type-checking is an issue here. The post-processing that I'm proposing is after import-resolution/type-checking/normalization but before emitting JSON. I will have an implementation up soon so that you can see exactly what I have in mind.

Gabriella439 commented 6 years ago

Alright, I have an example implementation showing how this would work:

https://github.com/dhall-lang/dhall-json/tree/gabriel/homogeneous_maps

I still need to refactor to the command-line API first to support the requested ability to opt out or modify the behavior appropriately

Gabriella439 commented 6 years ago

Now there is a pull request with the fully-implemented feature:

https://github.com/dhall-lang/dhall-json/pull/29

@blast-hardcheese: Give it a try and let me know if this works for you

blast-hardcheese commented 6 years ago

Using the gabriel/homogeneous_maps branch I was able to complete the PoC we were trying to do during BayHac in next to no time. Trying again with #29 was also successful.

MVP:

Library: Providers.dhall
Configuration: test.tf.dhall

I'm still shaky on how projects should be organized for modularity and reuse, but this definitely unblocks me for now. Thanks for the quick turnaround!

Gabriella439 commented 6 years ago

@blast-hardcheese: Wow, the latter link actually looks a lot like an HCL file :)

I'll go ahead and merge #29 then

f-f commented 6 years ago

Another success story here, thanks @Gabriel439 :)

However, I won't share my snippet here as the one that @blast-hardcheese posted looks much better 😅 (I took a slightly different approach, mostly data instead of lambdas). I was looking at somehow automatically generating dhall types from terraform providers, but it looks like we would have to parse go to do that (atleast in the case of the Google provider, as the code is basically the only source of truth), so I'm not really sure if it's feasible.

blast-hardcheese commented 6 years ago

@f-f I'm thinking the best we can get would be as described in https://github.com/blast-hardcheese/dhall-terraform/blob/master/CONTRIBUTING.md, published as a small library set published to ipfs or tracked as a git submodule or something.

It'll always be a race between components supported in Terraform and whatever providers and features are tracked in https://github.com/blast-hardcheese/dhall-terraform/.

It would be ideal for each terraform module were to expose JSON-schema or similar when queried, though that would require buy-in from Hashicorp that seems unlikely.

Gabriella439 commented 6 years ago

@blast-hardcheese: The way I see it, if the Dhall to Terraform bindings are the only place to get a schema for Terraform features that will encourage more people to use Dhall 🙂

blast-hardcheese commented 6 years ago

@Gabriel439 Have you found any barrier to adoption by distributing via ipfs, or should I continue down that route?

Gabriella439 commented 6 years ago

@blast-hardcheese: I have run into issues using IPFS. The main problem is that the latest version seems to have a memory leak of some sort (possibly the same as https://github.com/ipfs/go-ipfs/issues/3532), meaning that I have to periodically restart the IPFS server every week or two

I still continue to host the IPFS mirror to avoid disruption to existing documentation, but I wouldn't recommend others use it yet until that issue disappears. I'd recommend a simple static file server for now

blast-hardcheese commented 6 years ago

Hum. Another concern is some enterprise environments aren't big on having critical infrastructure hosted externally.

A caching proxy would be fine, though clunky.

Offline development would also be tricky.

IPFS itself seems to lend itself to multiple resolution sources/caching layers, though; maybe a resolution hierarchy would help here. Is this a dhall-lang discussion or dhall-haskell?

(I should say, whatever solution is discovered here should also apply to all URLs and possibly files, if that's desired/determined to not drastically increase complexity from a usage standpoint) (Additionally, I got a lot of usage out of let foo = (env:libpath).foo, so this is technically already possible in some limited form)

Gabriella439 commented 6 years ago

Yeah, that was the reason I originally liked IPFS (and still buy into the vision despite the issues). It allows anybody to transparently increase the resiliency by just pinning the same expressions, builds integrity checks in the URI, and provides a way to mount any IPFS resource locally using ipfs mount that lazily materializes paths on the fly

If you're willing to deal with the maintenance costs then I would say go for it, but just be willing to over-provision or restart the server if you are affected by the memory leak

dhall-lang / dhall-json

Convert some association lists to homogeneous maps? #27