erlang / eep

Erlang Enhancement Proposals
http://www.erlang.org/erlang-enhancement-proposals/
264 stars 67 forks source link

EEP XXX: Module for lists of maps #29

Closed zuiderkwast closed 2 years ago

zuiderkwast commented 3 years ago

An EEP as requested in erlang/otp#4831.

Please EEP admins, allocate EEP numbers for the EEPs in this and the other open PRs, so they can be finished.

Maria-12648430 commented 3 years ago

I think this EEP is fine and uncontroversial. Nothing I would change.


An observation that is outside of the EEPs' scope but related, concerning keytake.

lists:keytake is an outlier in the lists:key* family of functions in that it is the only one that returns {value, Tuple, TupleList2}, all others omit the value part.

Yes, there is keysearch doing it, but it has been superseded by keyfind and is kept only for backwards compatibility, for the reason that tagging the result is not necessary, as outlined here. The EEP proposes to leave keysearch out of the maplists module, and I wholeheartedly support that.

Why don't we have a function likewise superseding lists:keytake, as the same reasons apply there? At a blind guess, it is simply because nobody ever found a nice synonym for "take" 😅

If maplists:keytake is to be fully analogous to lists:keytake, it will have to retain the same peculiarity of returning {value, Map, MapList2}. I guess there is no way around it. However, if a replacement for lists:keytake ever does appear (keygrab? 😁), keytake will have to be kept for backwards compatibility not only in the lists module, but also in the new maplists module.

I guess what I'm trying to say is, if you have an idea for a function to supersede lists:keytake, it would be good to have it before this EEP comes to fruitition, and then keep keytake out of the maplists module.

lhoguin commented 3 years ago

I think maplists:keytake should return a 2-tuple, not a value tuple. There's no point in carrying around old mistakes in new modules. See also maps:take returning a 2-tuple.

I don't know how lists:keytake can be made to return a 2-tuple eventually. It would be good if it could. Perhaps the addition of maplists can lead to the addition of tuplelists with a better interface.

Maria-12648430 commented 3 years ago

Perhaps the addition of maplists can lead to the addition of tuplelists with a better interface.

You know, that's a pretty good idea ^___^ The key* functions always felt a bit out of place in the lists module, IMO. lists should concern itself only with lists (of anything) in general, not have special functions for items of a special type (that's my $0.02, anyway).

But I think tuplelists should be added before maplists, or together. Or, at least they should appear either together in the same OTP release, or tuplelists before maplists when in different releases. There is no real compelling reason to do it in this order. The need for maplists is probably greater than the need for tuplelists, as we have the lists:key* functions to deal with lists of tuples, but nothing to deal likewise with lists of maps. I guess it just feels better to first set what we already have right and then model new things on it than the other way round 🤔

zuiderkwast commented 3 years ago

Thanks for looking at the details.

There's one more function in lists which returns a value tuple: search(Pred, List) -> {value, Value} | false. Note the similarities with keysearch/3.

I don't think adding a separate tuplelists module is a good idea. People don't like unnecessary changes. For the same reason, I'd also prefer making maplists:keytake fully analogous to lists:keytake and not change the return tuple. The principle of least surprise.

Regarding the unnecessary value atom in the return tuple of keytake, {value, Tuple, TupleList}, what harm does it cause apart from being slightly larger in memory? {value, X, Y} is only one word larger than {X, Y} (keytake vs "keygrab") and there's no extra allocation. The difference in size between {value, X} vs X (keysearch vs keyfind) is larger. It's 3 words (https://erlang.org/doc/efficiency_guide/advanced.html#memory) and one extra allocation. So, is it really worth it replacing lists:keytake with keygrab (keysnitch? keystroke?) for this one word, considering deprecation and scheduling for removal, etc. the old one?

lhoguin commented 3 years ago

The value tuple is surprising. If we could eliminate it everywhere, then we wouldn't have to look up the documentation every time. And new modules like maps have done away with these old artifacts, so I don't think we should introduce more functions with them. Let's see what OTP team thinks.

Maria-12648430 commented 3 years ago

There's one more function in lists which returns a value tuple: search(Pred, List) -> {value, Value} | false. Note the similarities with keysearch/3.

keysearch was probably modeled on search.

In search, tagging the return value is necessary, otherwise you couldn't search for false in a list, as search would return false when false is contained as well as when it is not contained.

In keysearch however, tagging is not necessary. false will only be returned when the requested key is not found, otherwise the return value is a tuple.

This was probably the main reason for superseding keysearch with keyfind, not the size in memory. Also, search and keysearch have a discrepancy in their arguments, one takes a function and the other a key, which breaks their similarity.

I don't think adding a separate tuplelists module is a good idea. People don't like unnecessary changes. For the same reason, I'd also prefer making maplists:keytake fully analogous to lists:keytake and not change the return tuple. The principle of least surprise.

Look at it this way: with tuplelists, we can create a nice, fully consistent API for dealing with lists of specially tuples (or tuple elements in lists), and likewise with maplists for dealing with lists of specially maps (or map elements in lists), and the lists module can deal with lists of anything, whose functions make no assumption about the type and format of the elements.

Also, we can get rid of some peculiarities of the key* functions (comparing equal (vs matching) just doesn't sound right when it comes to keys).

Last but not least, in the new modules we can use a different naming convention that agrees more with (most of) the other modules. The lists module is an odd child here, in most other (younger) modules underscores are used to separate words in function names (eg maps:update_with), but lists concatenates them together (eg lists:zipwith). It looks like naming consistency across modules was not on the top of the list (pun not intended... ok, maybe 😁) of priorities back in the old days ;)

If, instead, maplists would be modeled on the current key* family of functions, it would repeat all their inconsistencies and peculiarities, more so if you try to make it 100% analogous like with the matter of value in the return value of keytake.

Instead of an unecessary change, I think of it as a replacement with something better. The old functions will stay, at least for a while (I know, competing standards, don't mention it ;)), but their documentation should contain notes like "better use tuplelists:... instead"). My expectation is that people will gradually flock to the new implementation simply because it is nicer.

We still have dict (and orddict and gb_trees) around 6 or 7 years after the advent of maps. You can still build anything you want today with dict. It's just so much nicer to use maps ;)

So, is it really worth it replacing lists:keytake with keygrab (keysnitch? keystroke?) for this one word

Not for this reason, no 😅

considering deprecation and scheduling for removal, etc. the old one?

Who said anything about removal, I didn't 😉

Seriously, I don't think they will be removed for a while. The key* functions are so old, probably used in a gadzillion places (grep -r 'lists:u\?key' * | wc -l in lib of OTP alone yields 3709).

Instead, I would put notes in the docs of these functions pointing to tuplelists for a while, and at some point remove them from the documentation. I wouldn't remove the functions themselves, though, not even deprecate them.

Let's see what OTP team thinks.

Yes, let's 😄

zuiderkwast commented 3 years ago

@Maria-12648430

In search, tagging the return value is necessary, otherwise you couldn't search for false in a list, as search would return false when false is contained as well as when it is not contained.

Of course I understand this. If we change this, we should probably also use nomatch (like binary:match/2,3 and re:run/2,3) instead of false, snake_case names, etc. It's about the balance between fixing things and keeping things that work.

Using match instead of equal is a good idea though, I think, for all these new functions. It can simply be documented in the top of the page, like it is for ordsets vs sets.

It looks like naming consistency across modules was not on the top of the list (pun not intended... ok, maybe :grin:) of priorities back in the old days ;)

I totally agree with this. I've heard mentions of creating OTP 2, where all these inconsistencies are fixed. I've also heard that many Erlang users don't want to go through what Python did in the transition from Python 2 to Python 3. (After 10 years, some libs and frameworks still only work with Python 2 even though support for this language version has been dropped long ago.) My perception is that Erlang users are more conservative than Python users, not to mention Ruby, Elixir, Rust. (Some of these conservative users are not very interested in EEPs at all because they have large old systems that just work.)

This EEP is not about OTP 2 though. It's only a small step towards adding Lenses, which is supposed to be the real solution and would allow things like pattern matching on map keys in list elements in some nested fancy way (if I understand correctly). But if the chosen way is to deprecate the lists:key* functions and replace them with a separate module with a different API, I'm all for it.

garazdawi commented 3 years ago

(My opinion follows, not the OTP team's)

The lists module is the most used module we have*. So I always argue that no matter how odd the API choices in that module may be, we should strive to mimic them as that creates consistency among APIs.

* perhaps the erlang module could give it a run for its money if we include auto-imported bifs

I've heard mentions of creating OTP 2, where all these inconsistencies are fixed.

I wonder if we could adapt XKCD-927 to talk about API inconsistencies instead :) No matter what we do, 10 years from now we'll sit there wondering why the original authors did not do a better job in designing the API... That of course deas not mean that we should not strive to make it as good as we can today!

garazdawi commented 3 years ago

(My opinion follows, not the OTP team's)

I'm missing a discussion about alternatives and why we do not want them in this EEP. Such as:

Documenting why we did not choose a solution is just as important as documenting why we did.

juhlig commented 3 years ago

I'm missing a discussion about alternatives and why we do not want them in this EEP

Ok, my 2 cents, coming right up ;)

* Allowing `lists:keyfind` to work on maps.

Wouldn't that add ambiguity to inconsistency?

(Remember that lists given to keyfind don't have to be lists of only tuples, other than what the documentation suggests. They can be lists of anything, keyfind will ignore items that are either too short tuples or no tuples at all.)

If the K argument to a maps-enabled keyfind(V, K, List) is not an integer it will only work on maps (with V being the value with the key K), but it will work on tuples (with V in the position K) and maps (with V being the value with the key K) if K is an integer.

* Adding `lists:mapfind`.

See https://github.com/erlang/otp/pull/4831#issuecomment-849615672

* What about adding it to the maps module?

I think the maps module should only be concerned with maps (of anything), as the lists module should only be concerned with lists (of anything; @Maria-12648430 said something similar before). The key* functions in lists already break that by working on items of a special type and format (tuples), they could break it again for maps, though I wouldn't like that. But functions that don't work on maps but lists of maps feel definitely out of place there.

* Why isn't `lists:search(fun(Map) -> maps:get(Key,Map) =:= Value end, ListOfMaps)` good enough?

For the same reason why lists:search(fun(Tuple) -> element(N, Tuple) =:= Key end, ListOfTuples) isn't (wasn't) good enough I guess ^^;

zuiderkwast commented 3 years ago

@garazdawi I can elaborate on the alternatives and why we do not want them. I think I know what to write. (Thx @juhlig for drafting some of it for me!)

juhlig commented 3 years ago

@zuiderkwast be my guest ;)))

Maria-12648430 commented 3 years ago
  • Allowing lists:keyfind to work on maps.

Wouldn't that add ambiguity to inconsistency?

Where is there an inconsistency in lists:keyfind though?

juhlig commented 3 years ago
  • Allowing lists:keyfind to work on maps.

Wouldn't that add ambiguity to inconsistency?

Where is there an inconsistency in lists:keyfind though?

"Lists of anything" vs "lists of tuples", you know ;)

Maria-12648430 commented 3 years ago
  • Allowing lists:keyfind to work on maps.

Wouldn't that add ambiguity to inconsistency?

Where is there an inconsistency in lists:keyfind though?

"Lists of anything" vs "lists of tuples", you know ;)

Ah, yes 👍

zuiderkwast commented 3 years ago

@garazdawi

I'm missing a discussion about alternatives and why we do not want them in this EEP.

Added.

Maria-12648430 commented 3 years ago

In search, tagging the return value is necessary, otherwise you couldn't search for false in a list, as search would return false when false is contained as well as when it is not contained.

Of course I understand this. If we change this, we should probably also use nomatch (like binary:match/2,3 and re:run/2,3) instead of false, snake_case names, etc. It's about the balance between fixing things and keeping things that work.

nomatch doesn't quite... chime, shall we say =^^= nomatch (and match for that matter) seems to be only used in modules searching in "strings", namely re and binary. In other modules I have looked, the usual return values indicating "not found" are error (or error tuples) or false.