gleam-lang / stdlib

šŸŽ Gleam's standard library
https://hexdocs.pm/gleam_stdlib/
Apache License 2.0
463 stars 167 forks source link

orddict or proplists module in Gleam? #89

Closed devonestes closed 4 years ago

devonestes commented 4 years ago

Given that maps are a bit restricted, and there is no map literal syntax (or plans for it), it seems like the orddict or proplists module from Erlang would make sense as a really primary data structure for arbitrary key/value pairs of mixed types.

It seems like there's some of this stuff in lists now, but since this is a specialized (and would be a highly used) form of list, I would think that making it its own module with different behavior (similar to Keyword in Elixir) might be helpful.

I'd be happy to implement these if y'all think this is a good idea.

lpil commented 4 years ago

We have functions such as key_find in the gleam/list module for working with 2 element proplists, they may be useful here.

Given that maps are a bit restricted, and there is no map literal syntax (or plans for it), it seems like the orddict or proplists module from Erlang would make sense as a really primary data structure for arbitrary key/value pairs of mixed types.

The limitation of values needing to be the same type is not a result of it being a map, so adding an orddict or proplist type would have the same problem. Gleam is statically typed so we need to know the type of the members of the data structure.

For key-value data structures of mixed types we use records, as is popular in Erlang:

pub type MyRecord {
  MyRecord(
    name: String,
    age: Int,
  )
}

This is similar to a struct in Elixir, a value object class in an OO language, a struct in C/Go/Rust/etc or a record in Haskell/OCaml/etc.

devonestes commented 4 years ago

Yeah, I saw ā€škeyfindā€˜ and the like, but as there is often a need to deal with some arbitrary/unknown number of possible key/value relationships (think of OS or application environment variables, HTTP headers, HTTP query parameters, etc.), it just made me think that this concept of a dynamic key/value data structure which allows mixed types would be a really helpful thing to solidify as a concept for folks to use. Another example would be passing attributes to an Ecto Changeset function (not that one would actually do that, but itā€™s a good example of the kind of dynamic creation of key/value data structures Iā€˜m talking about).

lpil commented 4 years ago

Currently we have the Dynamic type for working with dynamically typed data. It provides a type safe way of decoding dynamically typed data, but the trade off it it is less egonomic than working with statically typed data as you need to manually perform runtime type checks. It is largely used when working with user input or when handling values from calls to Erlang function. Perhaps it could help here? https://hexdocs.pm/gleam_stdlib/gleam/dynamic/

I would be keen on learning on how you see this dynamic key/value structure working, for example I would be interested in learning how this code would work:

pub fn get_name(dict) {
  let value = dyn_dict.get(dict, "name")
  // What type is `value` here?
  value
}
devonestes commented 4 years ago

A great example of the kind of thing I'm talking about is something like Ecto changesets, which could be implemented in Gleam.

def update_user(user, attrs) do
  user
  |> Ecto.Changeset.cast(attrs, [:name, :age, :height_in_cm])
  |> validate_length(:name, 6)
end

This seems like a place where a custom type would be too much, as there's so much potential variation in the format of (for example) a list of errors that an exhaustive custom type would be nearly impossible to write. But it also seems like just giving up and using a dynamic map would lead users to lose out on some helpful benefits. So, having something like a keyword list type offers a helpful middle ground. Without some way to express attrs or errors or something like that, the user needs to go directly from the "unsafe" thing to a custom type (with the possibility for a lot of Option values that would maybe be better avoided?), but that puts their application code at a very low level of abstraction. To have something like a keyword list offers a flexible, but still somewhat typesafe, middle layer of abstraction between the unsafe outer world and the user's application. This then at least allows for setting possible keys in that keyword list without having to mandate that all values are present, or that they're all of the same type:

type KeywordListKeys {
  ConstraintError
  UniquenessError
  ValidationError
}

type ChangesetErrors {
  List(tuple(KeywordListKeys, Dynamic))
}

Like, being able to at least type the keys of the data structure is a big win versus having keys and values be dynamic, and so it seems like this might be a nice middle ground that should be encouraged vs. having everything be dynamic.

lpil commented 4 years ago

I agree that it would be extremely awkward to replicate Ecto's validation API in Gleam (declaring the valiations would be even worse!), but I don't see why we wouldn't instead look to a statically typed language for inspiration for validation library design. They will likely have systems that are a better fit.

Talking more generally, how would this dynamically typed container work? Currently I'm not sure exactly what behaviour is being suggested

Could you give some examples of how values would be added and removed from the container? Along with the types would be + how they are inferred. Thanks

devonestes commented 4 years ago

The behavior being proposed is to replicate either the orddict or proplists modules in Gleam, and to create a type that explicitly gives a name to a list which only contains 2-tuples where all keys are of the same type (probably atom by default) and all values are of differing types. I would say either Orddict(a) or Proplist(a) or KeywordList(a), where ā€šaā€™ is the type of the keys in the list.

An even better example would probably be JSON encoding/decoding. Yes, you could use a dynamic map to store parsed JSON objects (not great), or you could insist that all JSON be cast to a custom type (a ton of work for users), but you could get a lot of value by using an orddict instead since then you have a lot more type information for at least the keys without having to deal with the values. It seems like this kind of data structure would be a really sensible middle ground between those two other options of the dynamic map or the custom type. Thatā€˜s definitely the thing Iā€™ve heard from friends that have done Elm as the most annoying part of using that language.

And thatā€™s just one example. Thereā€˜s also CLI flag parsing, HTTP headers, HTTP query params, DB interactions, etc... Basically I worry that if the only two options are dynamic maps or custom types, that folks will default to dynamic maps.

lpil commented 4 years ago

So far I think you are describing a more restricted version of the Dynamic type. It is used today for the use cases you are describing.

Would you be able to share some examples of the basic usage of this data type, including what the types would be? Once we have that we can figure out how it would fit into Gleam.

devonestes commented 4 years ago

I'm no expert, but the type would be something like

type Orddict(a) { List(tuple(a, Dynamic)) }

And, yeah, it would be a more restricted version of Dynamic, but that's sort of the point. In most key/value structures, knowing the keys is hugely valuable (hence required keys in structs), and this would allow you to have that without having to have all of those keys present in every collection.

lpil commented 4 years ago

What would this offer over the existing Map(a, Dynamic) and List(tuple(a, Dynamic)) types?

devonestes commented 4 years ago

It is indeed just List(tuple(a, Dynamic)), but by giving a name to this data structure, which would probably be used quite often, it makes it easier to reason about and helps users fall into the "pit of success" of using this structure often instead of a dynamic map. This is the same reasoning for naming this data structure in Erlang and Elixir, and so it seems to fit in this case as well.

lpil commented 4 years ago

I don't think we yet have any code that uses that data structure so far, either in official libraries or the ones made by others that I'm aware of.

When working with user input this structure would be more commonly parsed as a map, and when pulling into erlang a custom type would be more precise, safer, and less verbose. In all other situations dynamic would be a poor or type to choose because it effectively erases all the benefits you have from using a language with a type system.

I agree we want to design a pit of success, and because of that I think it is important that we do not have this data structure in the standard library. Attempting to replicate dynamic API in a statically typed language will result in an API that is the worst of both worlds.

lpil commented 4 years ago

I don't think we yet have any code that uses that data structure so far, either in official libraries or the ones made by others that I'm aware of.

When working with user input this structure would be more commonly parsed as a map, and when calling into Erlang a custom type would be more precise, safer, and less verbose. In all other situations Dynamic would be a poor or type to choose because it effectively erases all the benefits of using a language with a type system, while also being more verbose than statically typed code.

I agree we want to design a pit of success, and because of that I think it is important that we do not have this data structure in the standard library. Attempting to replicate a dynamically typed API in a statically typed language will result in an API that is the worst of both worlds.