fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
924 stars 196 forks source link

Fix concurrency errors with ConcurrentDictionary #394

Closed zyzhu closed 5 years ago

zyzhu commented 5 years ago

Attempt to fix issue 340 in pull 341. I simply replaced dictionary with concurrent dictionary. Ping @tpetricek for review. https://github.com/fslaborg/Deedle/pull/341

kblohm commented 5 years ago

Hi, you might want to have a look at ConcurrentDictionary.GetOrAdd(key, Func).

zyzhu commented 5 years ago

@kblohm Thanks for the tip. Code is simplified a lot. BTW, @dsyme and I initiated some discussions about general data science in f# in the following thread. Any inputs would be more than welcome. https://github.com/fslaborg/FsLab/issues/137

kblohm commented 5 years ago

@zyzhu Since you are using the ConcurrentDictionary.GetOrAdd(TKey, TValue) overload and not ConcurrentDictionary.GetOrAdd(TKey, Func<TKey,TValue>), you are compiling your factory-lambda every single time you call the function, so your caching is not really doing anything now. This should probably be something along the lines of:

let createTypedVector : _ -> seq<OptionalValue<obj>> -> _ =
    let cache = ConcurrentDictionary<_, _ -> _>()
    fun typ ->
        let valueFactory t =   
            let par = Expression.Parameter(typeof<seq<OptionalValue<obj>>>)
            let body = Expression.Call(createTypedVectorMi.MakeGenericMethod([| t |]), par)
            Expression.Lambda<Func<seq<OptionalValue<obj>>, IVector>>(body, par).Compile().Invoke      
        cache.GetOrAdd(typ, Func<_,_>(valueFactory))

Also, while having a quick look at the module, there is a getCachedCompileProjection aswell that is using a Dictionary to cache.

zyzhu commented 5 years ago

@kblohm Thanks for pointing it out. I added back trygetvalue so that cache shall work. I also changed getCachedCompileProjection

kblohm commented 5 years ago

@zyzhu now you are not really solving the concurrency problem. Right now checking for the value and adding the value is not atomic, that is what GetOrAdd is for.

zyzhu commented 5 years ago

@kblohm Clearly I'm a newbie on concurrent dictionary. I learned about valuefactory today and fixed it along your lines. A dumb question. Why the cache stays inside createTypedVector function? Isn't the cache re-initiated everytime a thread calls this function?

zyzhu commented 5 years ago

@dsyme Thanks for review. I've made above changes.