fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
929 stars 196 forks source link

Concurrency errors when expanding columns on a frame #340

Closed chrisharding closed 6 years ago

chrisharding commented 8 years ago

Our application has multiple concurrent threads each with a Deedle frame. Each frame contains objects of the same type with properties that are not primitive types. When two threads attempt to expand columns on their frame at the same time, we're seeing an "An item with the same key has already been added" exception. Running each thread in isolation works without error.

Debugging the Deedle code, I can see that the error is coming from getCachedCompileProjection in the Reflection module (FrameUtils.fs):

  /// Compile all projections from the type, so that we can run them fast
  /// and cache the results with Type as the key, so that we don't have to recompile 
  let getCachedCompileProjection = 
    let cache = Dictionary<_, _>()
    (fun typ ->
        match cache.TryGetValue(typ) with
        | true, res -> res
        | _ ->
            let res = [| for name, fldTy, proj in getMemberProjections typ -> 
                           name, fldTy, proj.Compile() |]
            cache.Add(typ, res)
            res )

It looks to me as though cache.TryGetValue(typ) is returning false for both threads (as the type hasn't been encountered before) and so both threads attempt to generate the member projections for the type and add them to the cache. The first thread succeeds and the second one fails with the expected exception that the type is already in the cache.

If I add a simple lock around the access to the cache, then this problem disappears:

  let getCachedCompileProjection = 
    let cache = Dictionary<_, _>()
    let monitor = Object()
    (fun typ ->
      lock monitor (fun () ->
          match cache.TryGetValue(typ) with
          | true, res -> res
          | _ ->
              let res = [| for name, fldTy, proj in getMemberProjections typ -> 
                             name, fldTy, proj.Compile() |]
              cache.Add(typ, res)
              res ))

Once past this issue, the same problem then happens in createTypedVector:

  /// Helper function used when building frames from data tables
  let createTypedVector : _ -> seq<OptionalValue<obj>> -> _ =
    let cache = Dictionary<_, _>()
    let monitor = Object()
    (fun typ -> 
      lock monitor (fun () ->
          match cache.TryGetValue(typ) with
          | true, res -> res 
          | false, _ -> 
              let par = Expression.Parameter(typeof<seq<OptionalValue<obj>>>)
              let body = Expression.Call(createTypedVectorMi.MakeGenericMethod([| typ |]), par)
              let f = Expression.Lambda<Func<seq<OptionalValue<obj>>, IVector>>(body, par).Compile()
              cache.Add(typ, f.Invoke)
              f.Invoke ))

Applying the same fix resolves this and our application then works fine.

I'm not at all familiar with the Deedle code design, so this simple approach may not be the best solution when considering concurrency and performance for Deedle in general, but something needs to be done to address this issue.

isobel-cullen commented 8 years ago

I think I hit this issue as well, is there any idea when the fix will be looked at / merged in?

zyzhu commented 6 years ago

Fixed in https://github.com/fslaborg/Deedle/pull/394