agnicore / nfx

.NET Standard Unistack Framework
http://nfxlib.com

Pile.Cache generating an avalanche of "Object reference not set to an instance of an object." #36

Open agnibos opened 6 years ago

agnibos commented 6 years ago

This happens in table.sweep and in table.fetchExistingEntry. Once the memory is corrupted, the error keeps recurring on subsequent sweeps:

Sweep exception:

@20180120-022702|8216d599-fcbc-41ca-a4ff-45c4a68590fa|......zw02||
Critical|Data.Cache|LocalCache('MDBDataStore::GraphSystem').threadSpin().foreach.Sweep|0
Leaked exception while sweeping table 'GraphSystemService.Node': [System.NullReferenceException] Object reference not set to an instance of an object.'

  +-Exception 
  | Type      System.NullReferenceException
  | Source    NFX
  | Target    sweep
  | Message   Object reference not set to an instance of an object.
  | Stack     
     at NFX.ApplicationModel.Pile.LocalCacheTable`1._bucket.sweep()
     at NFX.ApplicationModel.Pile.LocalCacheTable`1.Sweep(Stopwatch timer, Int32 maxTimeMs)
     at NFX.ApplicationModel.Pile.LocalCache.threadSpin()

Access/Get Exception:

@20180120-022703|9602f37a-26cc-462b-95cd-d9fe04d05431|.......zw02||
Error|AppMgmt|Agni.Social.Graph.Server.GraphSystemService.GetNode|0
[System.NullReferenceException] Object reference not set to an instance of an object.

  +-Exception 
  | Type      System.NullReferenceException
  | Source    NFX
  | Target    fetchExistingEntry
  | Message   Object reference not set to an instance of an object.
  | Stack     
     at NFX.ApplicationModel.Pile.LocalCacheTable`1.fetchExistingEntry(_bucket bucket, TKey key, Int32 hashCode)
     at NFX.ApplicationModel.Pile.LocalCacheTable`1.Get(TKey key, Int32 ageSec)
     at NFX.ApplicationModel.Pile.CacheExtensions.FetchThrough[TKey,TResult](ICache cache, TKey key, String tblCache, ICacheParams caching, Func`2 fFetch, Func`3 fFilter)
     at Agni.Social.Graph.Server.GraphSystemService.DoGetNode(GDID gNode, ICacheParams cacheParams)
     at Agni.Social.Graph.Server.GraphSystemService.GetNode(GDID gNode)

itadapter commented 6 years ago

Yes, this happened on 12/19 and then again today. It happens very infrequently, à la a heisenbug, which makes me think that this is a multi-threading related issue (improper locking/barrier/sequencing/race).

agnibos commented 6 years ago

See LocalCacheTable.cs#L161. The problem is that you cannot use -1 as a flag: the Age gets updated by a thread all the time, so non-chain entries get interpreted as "chain".

Why does this happen? Simple: clock drift. It yields a negative time delta (a timestamp in the future), which effectively sets Age to <0. That triggers IsChain==true, but the typecast is not checked later, hence the null reference.

We cannot use this "<1" flag. Maybe add another field?
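
A minimal sketch of the failure mode described above, not NFX's actual code. The `Entry` struct, its `Data` and `AgeSec` fields, and `ComputeAge` are hypothetical stand-ins; the assumption is that a negative age doubles as the "chain" marker and that the payload is later cast with `as` without a null check.

```csharp
using System;

// Hypothetical stand-in for a cache entry; NOT NFX's actual _entry struct.
struct Entry
{
    public object Data;  // a payload, or an object[] chain when the slot holds collisions
    public int AgeSec;   // age in seconds; a negative value doubles as the "chain" flag

    public bool IsChain => AgeSec < 0;
}

static class Demo
{
    // Recomputes the age from wall-clock time, as a background sweeper might (assumption).
    public static int ComputeAge(DateTime now, DateTime created)
        => (int)(now - created).TotalSeconds;

    static void Main()
    {
        var created = new DateTime(2018, 1, 20, 2, 27, 5, DateTimeKind.Utc);
        var now = created.AddSeconds(-3); // the clock stepped back by 3 seconds

        var e = new Entry { Data = "node payload", AgeSec = 0 };
        e.AgeSec = ComputeAge(now, created); // negative delta overwrites the age

        Console.WriteLine(e.IsChain); // True, although Data is not a chain

        var chain = e.Data as object[]; // unchecked cast yields null
        try
        {
            Console.WriteLine(chain.Length);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException");
        }
    }
}
```

Once the age is negative, every later sweep or lookup sees the same mislabeled entry, which would match the repeated exceptions in the logs.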

itadapter commented 6 years ago

The _entry gobbles up RAM like crazy. We have to be mindful about adding fields, as each one makes these _entry[] arrays bigger and bigger.

But at least we have figured it out!
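
Given the memory concern above, one option that adds no field is to clamp the computed age so that clock drift can never produce a negative value. This is only a sketch under that assumption, with hypothetical names (`AgeUpdate`, `SafeAgeSec`); it is not necessarily what the eventual fix did.

```csharp
using System;

static class AgeUpdate
{
    // Clamps the age to [0, int.MaxValue] so a backwards clock step
    // cannot collide with a negative "chain" sentinel (hypothetical design).
    public static int SafeAgeSec(DateTime now, DateTime created)
    {
        double sec = (now - created).TotalSeconds;
        if (sec <= 0) return 0;
        if (sec >= int.MaxValue) return int.MaxValue;
        return (int)sec;
    }

    static void Main()
    {
        var created = new DateTime(2018, 1, 20, 2, 27, 5, DateTimeKind.Utc);
        Console.WriteLine(SafeAgeSec(created.AddSeconds(-3), created)); // 0
        Console.WriteLine(SafeAgeSec(created.AddSeconds(42), created)); // 42
    }
}
```

The trade-off is that an entry created "in the future" is treated as brand new until the clock catches up, which seems preferable to misreading it as a chain.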

itadapter commented 6 years ago

Was fixed by 8c45548dc34b06293e7932141dd98118ee960b61, but needs more extensive testing. Keep the issue open for now.

agnibos commented 6 years ago

Guys, any news on this? Have not heard anything bad, close?

itadapter commented 6 years ago

The issue was resolved. Let's keep it open for another month, just in case.