andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
278 stars 28 forks source link

Add `similar_type` #54

Open andyferris opened 3 years ago

andyferris commented 3 years ago

This adds a new similar_type type-level function for giving the return type of similar. It is modelled similarly to empty_type. There are fixups for both empty, similar and the constructors.

There is an expectation that insertable containers have a zero-argument constructor (for constructing an empty container) and that settable dictionaries have a 2-argument constructor of the form DictType(indices, undef). These are assumed if you implement empty_type or similar_type for your custom container.

In future I wish to add a third "factory" function like empty and similar for situations where both the keys and values are known. This is most useful for immutable dictionaries but the result does not have to be immutable - it's just a generic pattern for this case, that allow generic algorithms to be built upon. Such dictionaries would have two-argument constructors of the form DictType(indices, values) and there would be a matching type-domain function. (Note: for indices there would be a single argument constructor of the form IndType(indices)).

CC @c42f - I'm thinking this "new" unimplemented functionality is probably the "correct" way to do the StaticArrays pattern of similar_type and constructor requirements. I'm lost what to call it, though - placeholders for argument's sake would be new(old_container, new_indices, new_values) and new_type(OldType, NewKeyType, NewElType) but I'm not sure we can think of a better word than new? (Plus new wouldn't be callable from inner constructors, which would be a weird corner case to have).

(This is intended to close #51)

codecov[bot] commented 3 years ago

Codecov Report

Merging #54 (9c2e641) into master (26d1be0) will decrease coverage by 0.24%. The diff coverage is 37.50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
- Coverage   74.00%   73.76%   -0.25%     
==========================================
  Files          19       19              
  Lines        1712     1738      +26     
==========================================
+ Hits         1267     1282      +15     
- Misses        445      456      +11     
Impacted Files Coverage Δ
src/AbstractIndices.jl 78.26% <ø> (-0.27%) :arrow_down:
src/Dictionary.jl 67.85% <ø> (+3.50%) :arrow_up:
src/FillDictionary.jl 58.33% <0.00%> (-19.45%) :arrow_down:
src/broadcast.jl 42.66% <0.00%> (-8.90%) :arrow_down:
src/map.jl 42.52% <0.00%> (-1.01%) :arrow_down:
src/pairs.jl 53.33% <0.00%> (-6.67%) :arrow_down:
src/reverse.jl 50.00% <0.00%> (-1.43%) :arrow_down:
src/filter.jl 63.02% <50.00%> (+0.09%) :arrow_up:
src/ArrayDictionary.jl 75.00% <83.33%> (+5.95%) :arrow_up:
src/AbstractDictionary.jl 84.12% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 26d1be0...9c2e641. Read the comment docs.

c42f commented 2 years ago

I'm thinking this "new" unimplemented functionality is probably the "correct" way to do the StaticArrays pattern of similar_type and constructor requirements. I'm lost what to call it, though - placeholders for argument's sake would be new(old_container, new_indices, new_values) and new_type(OldType, NewKeyType, NewElType) but I'm not sure we can think of a better word than new?

So I think I'm missing some context here to exactly understand what interface you're building here.

Are you saying that this new_type would be like the current thing we call StaticArrays.similar_type? And that StaticArrays.similar_type is kind of misnamed because it's not the type which is used in conjunction with similar(). Due to similar being assumed to return a mutable type, but similar_type not?

So you want another word like similar, with connotations that it might return something which is "even more similar", up to being immutable as well?

Generically, do we have a problem here of a cross product of possible traits which the "similar" output containers may or may not share with the template OldType? In that case it would be nice to have a single generic function similar(OldType, trait1, trait2, ...) where you don't need to dream up new names which are "kind of like similar" whenever you discover a new trait?

(Plus new wouldn't be callable from inner constructors, which would be a weird corner case to have).

Well I guess you could qualify it with the module, but needing to use Dictionaries.new would be a bit weird.

andyferris commented 2 years ago

Yes, exactly.

Except I’m imagining precisely three “factories”:

  1. Keys and values given - new (Or whatever)
  2. Keys given but values undef, to be filled in afterwards - similar
  3. Keys and values both not known and filled in afterwards - empty

There would be no traits involved beyond the fact that 2 is useless without mutability and 3 is useless without insertability, so you wouldn’t define the type-domain functions for those cases.

c42f commented 2 years ago

Yes, exactly

Ok cool :+1:

Except I’m imagining precisely three “factories”:

So these are the cases you think you need for the particular case of indexable containers (well, dictionaries) of which AbstractArray would be a subtype in principle? Hence the connection to StaticArrays?

Btw I don't actually want to involve traits. I only mention that thought because I'm unsure three classes of constructors really cover things (eg, looking at https://github.com/JuliaArrays/ArrayInterface.jl there's a disturbing variety of special purpose traits).

andyferris commented 2 years ago

Yeah. My thoughts are:

When you NEED certain traits then you are going to have to use less generic code. The fact is that there is always more traits to be created that you don't know about. I'm not sure a make_container(big_bag_of_desired_traits, ...) is ever going to be practical?

If you just want to write generic code that builds "a container" which holds arbitrary data I think those 3 options will cover the vast majority of cases. Currently similar and empty (well, Base.empty_mutable) work great but don't support immutable containers, and our generic code (outside StaticArrays) always uses setindex! or push! or whatever afterwards. I think in many of these cases a "generic" constructor/factory that takes a generator or similar would suffice (if such a generic constructor/factory existed) and would cover a lot of the non-linear-algebra code in StaticArrays for example.

I'm unsure three classes of constructors really cover things (eg, looking at https://github.com/JuliaArrays/ArrayInterface.jl there's a disturbing variety of special purpose traits).

I believe all those examples are all outside the scope of "I want generic code that populates containers with arbitrary values". These are cases where you want non-generic code. Or generic code that is more specialized than "I've got a container".

andyferris commented 2 years ago

The operation is a bit like collect(T, values). Except the T is a container you want to be similar to, not the element type, and it's a value. Perhaps collect_similar(old_container, new_keys, new_values) for dictionaries makes sense (as well as two-argument forms for sets and arrays)?

c42f commented 2 years ago

If you just want to write generic code that builds "a container" which holds arbitrary data I think those 3 options will cover the vast majority of cases.

Could well be true. I guess I don't have a strong opinion one way or another about this. It should be cleaner in StaticArrays to call similar_type less and instead to call new in most situations.

Though typically in StaticArrays, you tend to have values (in a tuple) but not keys; the keys are kind of implicitly deduced from OldType? Is it consistent to allow new(OldType, values)? Or maybe if StaticArrays was more serious about supporting nontrivial axes this would no longer be true?

collect_similar seems ok. construct would be good if it wasn't so overloaded already. There's also assemble which is kind of nice as it assembles the keys together with the values? Or possibly gather, but that's got other strong-and-unrelated connotations in distributed computing.

andyferris commented 2 years ago

Arrays are interesting. The keys are sometimes unnecessary, right? Though you might want something for offset arrays. And higher dimensional arrays. And static arrays (the sizes are specified in similar_type in StaticArrays, which would be collect_similar_type here, and be passed to from collect_similar...). In the end, it almost feels like Vector, where the indices are implied, is the odd one out.

Indices (and UnitRange and CartesianIndices and so-on) could use two-argument forms of collect_similar safely.

andyferris commented 2 years ago

Could well be true. I guess I don't have a strong opinion one way or another about this.

Yeah, me neither. It just seemed like immutable container support was always missing, and unnecessarily so... I'm thinking of this more as an experiment at this point.

c42f commented 2 years ago

In the end, it almost feels like Vector, where the indices are implied, is the odd one out.

StaticArray subtypes and Vector seem similar in that the type implies the indices? But yes it seems there's more cases where you need a value for the indices and can't infer something sensible sensible from the container type plus an iterator of values.

To add to the list of function names which are sort of like what you want here... there's also Base.copy() which doesn't guarantee returning the same type. Eg the behavior for copy(A::Transpose).

c42f commented 2 years ago

Keno has just updated the ImmutableArray PR in Base, and has mentioned generalizing array APIs once again. So this PR seems quite timely in that respect. Ref:

https://github.com/JuliaLang/julia/pull/41777