apple / swift-collections

Commonly used data structures for Swift
Apache License 2.0
3.57k stars 271 forks source link

Dictionary protocol and conversions between dictionary types #293

Open JaapWijnen opened 11 months ago

JaapWijnen commented 11 months ago

Hey everyone! Two ideas for Dictionary types:

Let me know your thoughts! Would this be useful to you? Potential problems you envision or reasons to not do this in general? Would love to hear it, thanks!

These might also be interesting uses for the different types of Set that exist

lorentey commented 11 months ago

Yes! Now that we have Dictionary, OrderedDictionary, TreeDictionary (and soon SortedDictionary, which isn't quite ready for use yet), I think we have enough examples of dictionary types to start forming the protocol.

The protocol should ultimately live in the Standard Library, but we could prototype it within this package. (Transitioning it into the stdlib later is going to be painful, though.)

The tests include a couple of protocols to test that core Dictionary API is implemented consistently (DictionaryAPIChecker and DictionaryAPIExtras), but those are just for ensuring API parity, and they aren't suitable for use as DictionaryProtocol.

Design decisions:

Random notes:

JaapWijnen commented 11 months ago

Hey! Thanks for the thorough response.

The protocol should ultimately live in the Standard Library, but we could prototype it within this package. (Transitioning it into the stdlib later is going to be painful, though.)

What direct difficulties do you expect? I do see the general challenge of adding a protocol to such a fundamental type in the standard library. And perhaps there not being direct other types that would profit from this protocol in the stdlib.

Would DictionaryProtocol refine Sequence, or would it leave that up to each use case? (FWIW, DictionaryProtocol cannot refine Collection because OrderedDictionary isn't one.)

I would definitely say DictionaryProtocol would refine Sequence. Since we're not really interested in the way the elements are stored at the time of using a protocol, one of the big uses would be to be able to loop over the elements of the Dictionary I would think. Are there any drawbacks to refining Sequence? Just to double check by refining you mean the following?

protocol DictionaryProtocol: Sequence {}

Do we want to model the Keys and Values views as protocol requirements?

I would think so. I'm not too familiar with the other newer implementations but this does seem an interesting addition to the protocol. If we had a SetProtocol we could restrict our keys associated type to that too perhaps?

Should the protocol model a mutable Dictionary, or would it be better to allow read-only types to conform to it? (Do we want to spin off the mutations into a separate MutableDictionaryProtocol?) I think most API that would take a DictionaryProtocol would only need quick lookups, and it would be desirable to allow passing read-only dictionaries to them. (Such as a constant dictionary that implements perfect hashing, constructed at build time.)

This is interesting, it might be worthwhile keeping them seperate for a first pass and find out if there's clear benefits? They're not directly obvious to me. But in a way we would be mirroring Collection and MutableCollection which does make sense to me in terms of structuring. I'd say we there should be a clear reason to split them otherwise we're just adding complexity perhaps.

If mutations will be part of the protocol, do we want it to include any semantic guarantees on index invalidation? (TreeDictionary invalidates all indices on every mutation, including assignments via the Values view. If we don't change this, then that sort of settles the issue -- the protocol will need to follow suit.)

No direct ideas/opinions on this, but that is a good point to keep in mind.

For conversions between dictionaries, I think it would be reasonable for (the mutable variant of) the protocol to include a requirement for init(_ other: some DictionaryProtocol<Key, Value>).

Yeah especially if we would have something like OrderedDictionary's elements view in the protocol that should be easy to have a default implementation for perhaps?

mapValues/compactMapValues are interesting. They ought to be protocol requirements, because customizing their implementation enables significant improvements (by avoiding re-hashing or re-sorting keys, for example), but their shape isn't easily expressible in a protocol. Perhaps we could shoehorn them in as extension methods forwarding to a new init(mappingValuesOf dict: some DictionaryProtocol<Key, T>, by transform: (T) -> Value throws) rethrows requirement, which individual types can manually specialize for typeof(dict) == Self.self.)

Hm not sure about this I'd have to take another look!

What are your thoughts for something similar for Set? What are the current plans for this package? I see a lot of changes since the last tag! Any plans for a next version or is there a clear roadmap somewhere? How would you like to proceed? I wouldn't mind to get a first (probably basic) version up in an MR and we can discuss further?

JaapWijnen commented 10 months ago

Sorry taking a little longer than expected since I'm a little swamped with work :) I have a basic thing that might make sense to share soon so we have something tangible to discuss and improve on! I'm running into one inconsistency unfortunately which is that sometimes stdlib's dictionary uses a labeled or unlabelled tuple for its element in api's. Example: The standard library's Dictionary type defines an type alias Element = (key: Key, value: Value) But one initialiser for example has the following signature:

@inlinable
  public init<S: Sequence>(
    uniqueKeysWithValues keysAndValues: __owned S
  ) where S.Element == (Key, Value) {

whereas OrderedDictionary has:

@inlinable
  public init<S: Sequence>(
    uniqueKeysWithValues keysAndValues: S
  ) where S.Element == (key: Key, value: Value) { // or Element since it has a typealias

What are your thoughts on this @lorentey? I'd argue that the OrderedCollections implementation is more correct. But I don't see that std lib api changing. I don't imagine this would be accepted as a breaking change for Swift 6 for example.

Edit: Only later noticed that OrderedDictionary also contains a similar initialiser that takes an unlabelled tuple as Element. Maybe there's a difference between these initialisers that I don't fully comprehend?

JaapWijnen commented 9 months ago

Hey @lorentey ! Just wanted to check if you have any thoughts on the current PR I submitted. Would love to move this forward :)

JaapWijnen commented 9 months ago

Put some more thought into this and took a look at SortedDictionary. Unfortunately none of the initialisers there match with the Dictionary and OrderedDictionary ones. Making it hard to get an initialiser defined in the protocol itself. That in turn would make incorporating a mapping between dictionary types in the protocol hard to do too. Have not looked at the TreeDictionary type yet since unifying a protocol over SortedDictionary, OrderedDictionary and plain Dictionary is already proving quite hard.

SortedDictionary has:

OrderedDictionary has:

Which so we can't have a general initialiser in the protocol. Any thoughts on this?
One thought I had was that SortedDictionary in a way could also adopt the.init(uniqueKeysWithValues:) initialiser since technically it's keys are all unique they're just not required to have a unique hash like Dictionary or OrderedDictionary If you agree that works then .init(_ keysAndValues: uniquingKeysWith:) can also be included where the decision of wether to overwrite the existing value (as is done in the SortedDictionary.init(keysWithValues:)) or not would instead be handled by the uniquingKeysWith: (Value, Value) throws -> Value closure. This could be an interesting initialiser anyway when feeding an unordered sequence into SortedDictionary.init(keysWithValues:) since there's no guarantee for which key value will end up in the SortedDictionary for duplicate keys in the sequence.

Would love to hear your thoughts so I can move further on the implementation of this!