andyferris / Dictionaries.jl

An alternative interface for dictionaries in Julia, for improved productivity and performance
Other
278 stars 28 forks source link

Fix `deepcopy`/`serialize` when used after `delete!` #112

Closed theogf closed 1 year ago

theogf commented 1 year ago

Hey @andyferris !

It looks my deepcopy and serialize implementation have a bug. I wrongly assumed that indices.values == collect(indices). It is visibly not true and create undef references when using non isbits structure.

I used collect(indices) instead and took care of updating the tests accordingly.

(version patch is bumped as well)

theogf commented 1 year ago

Oh no there are actually more issues with the values of the Dictionary itself

andyferris commented 1 year ago

I see the problem.

What we can do is take a shortcut if holes == 0. Otherwise we need to use the kind of logic you see in rehash!. I suggest one of the two following options:

  1. Just call rehash! if holes != 0 and then continue the current logic.
  2. Iterate the tokens and collect the results if holes != 0

The disadvantage of the first one is it won't be thread safe when you have multiple readers (I'm not even sure it is safe under single-threaded concurrency).

Probably should go with the second with code like:

serialize_type(s, T, false)
if indices.holes == 0
    return serialize(s, ind.values)
else
    is = Vector{I}(undef, length(dict))
    @inbounds for t in tokens(dict)
        is[i] = ind.values[t]
    end
    return serialize(s, is)
end

It might be even better if we could stream out the indices in the second branch rather than collecting them in memory, but at least this is safe.

Will need equivalent code for dictionaries, and for deepcopy.

theogf commented 1 year ago

So I was thinking of an even more straightforward solution with dispatching deepcopy_internal on the Dictionary and collecting the values themself.

Something like

deepcopy(d::Dictionary{I,T}) where{I,T} = Dictionary{I,T}(deepcopy(collect(keys(d))), deepcopy(collect(d))) 

This way we just build a whole new Dictionary and avoid shenanigans ensuring that the holes and slots are consistent?

theogf commented 1 year ago

So I just implemented my proposal

codecov[bot] commented 1 year ago

Codecov Report

Base: 78.86% // Head: 79.33% // Increases project coverage by +0.47% :tada:

Coverage data is based on head (ce7b08b) compared to base (7d596e8). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #112 +/- ## ========================================== + Coverage 78.86% 79.33% +0.47% ========================================== Files 20 20 Lines 2342 2347 +5 ========================================== + Hits 1847 1862 +15 + Misses 495 485 -10 ``` | [Impacted Files](https://codecov.io/gh/andyferris/Dictionaries.jl/pull/112?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Andy+Ferris) | Coverage Δ | | |---|---|---| | [src/Dictionary.jl](https://codecov.io/gh/andyferris/Dictionaries.jl/pull/112/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Andy+Ferris#diff-c3JjL0RpY3Rpb25hcnkuamw=) | `80.52% <100.00%> (+0.52%)` | :arrow_up: | | [src/Indices.jl](https://codecov.io/gh/andyferris/Dictionaries.jl/pull/112/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Andy+Ferris#diff-c3JjL0luZGljZXMuamw=) | `91.04% <100.00%> (+2.89%)` | :arrow_up: | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Andy+Ferris). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Andy+Ferris)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

theogf commented 1 year ago

@andyferris Would you rather have me use your solution instead?

0x0f0f0f commented 1 year ago

@theogf news here?

theogf commented 1 year ago

Hi @andyferris, any update on this :) ?

andyferris commented 1 year ago

Sorry I have been smashed.

Correct is more important the fast, so lets merge this. Thanks @theogf

andyferris commented 1 year ago

https://github.com/JuliaRegistries/General/pull/69787