aerospike / aerospike-client-csharp

Aerospike C# Client Library

Mono Crash #8

Closed kelindar closed 8 years ago

kelindar commented 9 years ago

I'm getting this error on Mono 4.0.1 running on Debian. I haven't traced exactly where it comes from, but it could come from the index create. This is quite bad, since it crashes the whole runtime with it. Ideas?

Stacktrace:

  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) object.__icall_wrapper_mono_object_isinst (object,intptr) <0xffffffff>
  at (wrapper stelemref) object.virt_stelemref_class (intptr,object) <0xffffffff>
  at Aerospike.Client.PartitionParser.DecodeBitmap (Aerospike.Client.Node,Aerospike.Client.Node[],int) <0x000e7>
  at Aerospike.Client.PartitionParser.ParseReplicasMaster (Aerospike.Client.Node) <0x001d7>
  at Aerospike.Client.PartitionParser..ctor (Aerospike.Client.Connection,Aerospike.Client.Node,System.Collections.Generic.Dictionary`2<string, Aerospike.Client.Node[][]>,int,bool) <0x00173>
  at Aerospike.Client.Cluster.UpdatePartitions (Aerospike.Client.Connection,Aerospike.Client.Node) <0x00067>
  at Aerospike.Client.Node.UpdatePartitions (Aerospike.Client.Connection,System.Collections.Generic.Dictionary`2<string, string>) <0x00167>
  at Aerospike.Client.Node.Refresh (System.Collections.Generic.List`1<Aerospike.Client.Host>) <0x000ef>
  at Aerospike.Client.Cluster.Tend (bool) <0x0018f>
  at Aerospike.Client.Cluster.Run () <0x0002b>
  at System.Threading.Thread.StartInternal () <0x000bf>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

    mono() [0x4c097c]
    mono() [0x52652e]
    mono() [0x43627d]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0) [0x7fd315feb0a0]
    mono(mono_class_is_assignable_from+0x22) [0x534702]
    mono(mono_object_isinst+0x3d) [0x5c892d]
    [0x414ca1b8]

BrianNichols commented 9 years ago

The error occurs in the cluster tend background thread. This thread periodically polls servers for changes to data partition maps. The code looks correct. I'm not even sure what this means:

  at (wrapper managed-to-native) object.__icall_wrapper_mono_object_isinst (object,intptr) <0xffffffff>
  at (wrapper stelemref) object.virt_stelemref_class (intptr,object) <0xffffffff>

kelindar commented 9 years ago

I think this happened because one of the 3 nodes in the cluster had no visibility (its visibility was "red" in AMC). Once I rebooted that node, Mono stopped crashing as well.

BrianNichols commented 9 years ago

We can't really do much here until we understand what Mono is complaining about and whether it's a bug in Mono itself.

gregoryyoung commented 9 years ago

@BrianNichols I have seen something similar before. It appears to be a rare problem that sometimes comes up in our stress testing (in unrelated code as well). It manifests itself in many ways (looks like heap/stack corruption). Related: https://bugzilla.xamarin.com/show_bug.cgi?id=18151

@Kelindar can you try setting MONO_GC_DEBUG=clear-at-gc? I'm willing to guess the error goes away.
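
For anyone trying this, a minimal sketch of setting the variable for a single process follows. The helper name `run_with_clear_at_gc` and the program name `YourApp.exe` are hypothetical placeholders, not part of Mono or the Aerospike client. The variable has to be in the environment when the Mono runtime starts, so it is set on the launch command and not from inside the application.

```shell
# Hypothetical helper: run any command with MONO_GC_DEBUG=clear-at-gc set
# for that one process only, leaving the calling shell's environment untouched.
run_with_clear_at_gc() {
    env MONO_GC_DEBUG=clear-at-gc "$@"
}

# Example usage (YourApp.exe is a placeholder for the affected program):
# run_with_clear_at_gc mono YourApp.exe
```

Scoping the variable to one process keeps the debug setting from leaking into other Mono programs started from the same shell.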