dashbitco / nimble_options

A tiny library for validating and documenting high-level options. 💽
Apache License 2.0
507 stars 38 forks source link

Add support for maps #90

Closed camilleryr closed 2 years ago

camilleryr commented 2 years ago

Hello! This pr is not in any mergable state - but wanted to start a conversation about adding support for maps as both a schema and option definition. This "implementation" (if you can call it that) seems to add that support for the cases I have tried (needs to be extended to cover the docs functionality I believe, but we can get to that!).

Before moving forward with cleaning up / adding additional test I wanted to get a few set of eyes on it to see if this was something you were interested in / your general thoughts.

Thanks!

whatyouhide commented 2 years ago

@camilleryr thanks for the proposal! What's the use case or advantage for using maps as a schema definition?

camilleryr commented 2 years ago

@camilleryr thanks for the proposal! What's the use case or advantage for using maps as a schema definition?

At work we have adopted NimbleOptions as our specification language for for parts of our system.

We have a generic data structure that is key to our application, and lots of different things we can do to that data structure, and the contract for all of those functions ends up as @callback process(data_structure(), keyowrd()) :: {:ok, any()} | {:error, any()}

Using NimbleOptions for specifying these options is already great - we get documentation, validation, and with an implementation of the Phoenix.HTML.FormData we get an ability to auto generate forms based on our schemas

The reason we wish to extend NimbleOption is that other parts of our system are starting to have the same basic needs of data specification, validation and form generation. More often than not we would reach for a map as the shape of this data rather then a keyword list, but currently cannot use any of the functionalities we have built out using NimbleOptions without conversions back and forth between maps and keyword lists.

As fas as the advantages of using maps strictly as the schema definition (rather than just being able to use keyword lists to specify maps) - thats not something I have a particularly strong feeling about, but the choice made some sense based on the current implementation.

Currently NimbleOptions enforces that data to be validated against a shcema is a keyword list. Its easy enough to allow maps to be validated against a schema, and to specify :map as an additional built in type, but there is not currently a concept for is specifying the shape of the outer layer of the data which means that would end up being able to validate either a keyword list or a map that met the specification.

      schema = [required: [type: :boolean]]]

      list_opts = [required: true]
      assert NimbleOptions.validate(list_opts, schema) == {:ok, list_opts}

      map_opts = %{required: true}
      assert NimbleOptions.validate(map_opts, schema) == {:ok, map_opts}

My approach (which I didn't put that much thought into, and would be happy to change!) was that without changing the semantics of the specification, we could use maps to specify maps and lists to specify lists.

      schema = %{required: [type: :boolean]}

      map_opts = %{required: true}
      assert NimbleOptions.validate(map_opts, schema) == {:ok, map_opts}

      list_opts = [required: true]
      assert_raise(FunctionClauseError, fn -> NimbleOptions.validate(list_opts, schema) == {:ok, list_opts} end)

This extension came up as part of a review call with dashbit, so @josevalim and @wojtekmach might have additional thoughts on our use of NimbleOptions and the potential to extend it to specify maps

josevalim commented 2 years ago

I think there are two discussions here:

  1. Support maps at the root
  2. Support schemas as maps

I think 1 makes sense, perhaps via an option in the schema that specifies the root. But I don't see a reason for 2 and there are benefits for keyword lists in DSLs/specifications (preserving order, allowing duplicates, etc).

josevalim commented 2 years ago

And hi @camilleryr! 👋 Thanks for the PR!

whatyouhide commented 2 years ago

I see 1. as being okay too, we're just adding a supported data type. As for 2., I agree with José. We'll end up polluting nimble_options beyond recognition, I’m afraid 😄 As the name of the library says, it's really concerned only with options. What you are looking for is more complex and I think other tools would serve you better. One of them could be Ecto maybe.

josevalim commented 2 years ago

And 2 should be a matter of calling Map.to_list/1 if you indeed have a map, right?

camilleryr commented 2 years ago

I think using the keyword list to write the schemas makes sense, and adding support for maps as a type and some kind of option to specify the root type would cover all of the things we need!

I have updated my branch to reflect that - the tests are still in need of work - but wanted to get your thoughts on this approach! Right now I have added a __root_type__ (I wanted something that would have little risk of collision - but Im open to any name for this! I also though of using $ since that is often used as the root in json path languages) option that can only be applied at the outermost level of a schema to specify if a map or keyword list is expected. To keep from introducing any breaking changes, if no value is provided, we will infer that the __root_type__ is a keyword_list

josevalim commented 2 years ago

Oh, I forgot that we can't specify options at the root. I understand why we need __root_type__ but it may not be ideal.

Maybe we could just allow validate to receive a map too? Or maybe we introduce another function explicitly for maps?

camilleryr commented 2 years ago

Oh, I forgot that we can't specify options at the root. I understand why we need __root_type__ but it may not be ideal.

Maybe we could just allow validate to receive a map too? Or maybe we introduce another function explicitly for maps?

I agree that the __root_type__ option was a bit awkward for a number of reasons. I think the only real benefit is that allowing us to specify the outer structure allows us to be very explicit in the specification, so the call sites don't need to be aware.

Allowing validate to take a map as well as a list is a various is a fine approach, but without the ability to validate the outer shape we could introduce some breaking changes for existing uses.

Then we could have validate_list and validate_map functions, or a validate/3 function that allows for some kind of optional root_type configuration - which would allow us to do all the things we need, but removes that as part of the specification and puts the responsibility on the caller to know what shape its supposed to be.

josevalim commented 2 years ago

We have two options it seems:

  1. Make validate/2 work for both, add validate_kw and validate_map for strict

  2. Add validate(kw_or_map, schema, root_type: :map | :keyword), but I am not the biggest fan of options having an effect in the function signature

So my vote is for 1 and we do have guards around validate/2, so it should not be a breaking change per se.

Thoughts @whatyouhide?

whatyouhide commented 2 years ago

I was thinking of the schema being a map, not the data being validated, so I had got this all wrong. Sorry about that. I’m still leaning on the opinion that validating maps through NimbleOptions is not what NimbleOptions is for. For example, Ecto schemas provide generally more flexibility (and a much wider ecosystem of tools) for doing something like that. In a pinch, you can Map.to_list(map) and validate that I guess? 🤔

camilleryr commented 2 years ago

While Ecto does offer a lot of great things - we decided to pursue NimbleOptions as it more closely aligned with what our needs are, and with the addition of map as a built in type, we are much closer to having all of our needs met!

While we could very easily make sure that we are calling Map.to_list on a map we are trying to validate, and calling Map.new() on the validated result - to me it seems this seems like an extension to NimbleOptions that does not change its purpose, but allows it fulfill that purpose for more use cases (while the convention for options to be a keyword list exists, its not a requirement, and often maps are more ergonomic depending on the particular use case!)

If you are sure you don't think this functionality belongs in NimbleOptions I will happily revert all of those changes and just PR the inclusion of map as a type, but I really do think that being able to natively specify and validate maps is a valuable addition!

And most importantly - thanks for all of the work you all do!

josevalim commented 2 years ago

While we could very easily make sure that we are calling Map.to_list on a map we are trying to validate, and calling Map.new() on the validated result

It may be worth benchmarking because I am thinking converting the map to list and list to a map may be the most efficient implementation (and also the simplest) and I personally wouldn't mind supporting such in NimbleOptions.

camilleryr commented 2 years ago

While we could very easily make sure that we are calling Map.to_list on a map we are trying to validate, and calling Map.new() on the validated result

It may be worth benchmarking because I am thinking converting the map to list and list to a map may be the most efficient implementation (and also the simplest) and I personally wouldn't mind supporting such in NimbleOptions.

I have rolled back all of the root_type work and instead have added a new function clause to validate/2 that will convert a top level map into a list, validate that agains a schema and then then covert the returned keyword list back into a map.

camilleryr commented 2 years ago

@josevalim - at your suggestion I ran some benchmarks against the "delegating" code (runtime delegation of functions that work on keyword_lists and maps) vs the "conversion" code (explicitly turning all maps into lists, validating them, and then converting them back) and you were correct! Conversion is faster for the scenarios I checked! I have updated this PR again to stick with this approach

Name                 ips        average  deviation         median         99th %
conversion       48.61 K       20.57 μs    ±39.22%          18 μs          55 μs
delegated        42.33 K       23.62 μs    ±39.15%          21 μs          61 μs

Comparison:
conversion       48.61 K
delegated        42.33 K - 1.15x slower +3.05 μs
josevalim commented 2 years ago

If conversion is faster, then perhaps there is nothing to change here? You could wrap nimble options and do the conversion within your application and additionally ensure anything else you may need to? :)

camilleryr commented 2 years ago

If conversion is faster, then perhaps there is nothing to change here? You could wrap nimble options and do the conversion within your application and additionally ensure anything else you may need to? :)

We can and do have an internal wrapper that we can use to enforce the "outer layer" conversion of map to lists and back, but it does require changes inside NimbleOptions to support map as a new data type. I have removed the additional guard around validate/2 so now everything should work the same way but we should now be able to support map as internal data structures.

camilleryr commented 2 years ago

I have extended map support into :or and :list types and added tests to cover all the existing nested schema tests for the map type as well. I think this covers everything - let me know if you know of anywhere else we would need to exted to cover map and if you have any more thoughts!

Thank you @josevalim and @whatyouhide for your time helping me get this sorted out!

josevalim commented 2 years ago

:green_heart: :blue_heart: :purple_heart: :yellow_heart: :heart: