Odd threading behavior - Thread.CurrentThread.ManagedThreadId != Environment.ManagedThreadId

christianscheuer commented 6 years ago

I'm getting a "The write lock is being released without being held" exception when exiting write mode on a ReaderWriterLockSlim on macOS.

Debugging this issue has led me to conclude that my call to ExitWriteLock apparently happens on a different thread than the one where I started, albeit in a weird way.

When I debug it, Thread.CurrentThread.ManagedThreadId indicates I'm still on thread 1, where I acquired the lock, but Environment.CurrentManagedThreadId, which is what ReaderWriterLockSlim uses, indicates I'm on thread 6. This does not seem to happen on CoreCLR.
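To illustrate why a changing ID is enough to trigger the exception, here is a minimal C++ model. It is not the real ReaderWriterLockSlim. It assumes only what is described above: the lock records an ID on entry and compares it with the ID it sees on exit. The class name OwnerCheckedLock and the injectable ID source are hypothetical, and the model does no actual locking.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical model of an owner-checked write lock (no mutual exclusion here).
class OwnerCheckedLock {
public:
    // currentId stands in for whatever the lock uses to identify the caller.
    explicit OwnerCheckedLock(std::function<int()> currentId)
        : currentId_(std::move(currentId)) {}

    void EnterWrite() {
        assert(owner_ == 0 && "model does not support recursion");
        owner_ = currentId_();          // remember who took the lock
    }

    void ExitWrite() {
        if (owner_ != currentId_()) {   // ID source disagrees with recorded owner
            throw std::logic_error("write lock released without being held");
        }
        owner_ = 0;
    }

private:
    std::function<int()> currentId_;
    int owner_ = 0;                     // 0 means "not held"
};
```

If the ID source returns 1 at EnterWrite and 6 at ExitWrite, the exit throws even though the same native thread made both calls.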

The code paths involved do not include any calls to Thread.Start or asynchronous Task code. It really all should be happening on the same thread (which Thread.CurrentThread.ManagedThreadId also seems to indicate is true). The rest of the app does have asynchronous Task code though.

Unfortunately, I haven't yet been able to make a repro case. It does seem to make a difference, though, if GC is invoked between the calls to EnterWriteLock and ExitWriteLock. Any ideas on how to better debug this?

This is basically what I'm doing, but I don't think this helps much.

using System;
using System.Collections.Generic;
using System.Threading;

namespace Test
{

    public struct WriteLock : IDisposable
    {
        private readonly LockManager lockManager;

        internal WriteLock(LockManager lockManager)
        {
            this.lockManager = lockManager;
        }

        public void Dispose()
        {
            this.lockManager.ReleaseWriteLock();
        }
    }

    public class LockManager : IDisposable
    {
        private ReaderWriterLockSlim readerWriterLock;

        public LockManager()
        {
            this.readerWriterLock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
        }

        public IDisposable AcquireWriteLock()
        {
            // Enter the lock
            if (!this.readerWriterLock.TryEnterWriteLock(-1))
            {
                throw new Exception("Could not enter write lock");
            }
            Console.WriteLine("Entered write lock: " + Thread.CurrentThread.ManagedThreadId);

            // Return a new instance of a write lock
            return new WriteLock(this);
        }

        public void Dispose()
        {
            this.Dispose(true);
            GC.SuppressFinalize(this);
        }

        internal void ReleaseReadLock()
        {
            this.readerWriterLock.ExitReadLock();
        }

        internal void ReleaseWriteLock()
        {
            this.readerWriterLock.ExitWriteLock();
        }

        protected virtual void Dispose(bool disposing)
        {
            Console.WriteLine("Disposing lock manager... disposing: " + disposing);
            if (disposing)
            {
                if (this.readerWriterLock != null)
                {
                    this.readerWriterLock.Dispose();
                    this.readerWriterLock = null;
                }
            }
        }   

        public static void Main(string[] args)
        {
            using (var lockMan = new LockManager())
            using (var locky = lockMan.AcquireWriteLock())
            {
                //Lots of stuff happening in here. No Task awaits though...
            }
        }
    }

}

christianscheuer commented 6 years ago

I added manual tracing code to my code equal to the following:

Console.WriteLine("Step 1: " + (Thread.CurrentThread.ManagedThreadId == Environment.CurrentManagedThreadId));

and was able to verify that this changes from True to False. This happens consistently in my code on every run, but at different places. In one instance it changes after a new Dictionary<...>, which leads me to suspect GC is involved. In that repro, no other threading or task related code is run.

It consistently works on CoreCLR (stays True).

christianscheuer commented 6 years ago

I added a while(true) in my Main so that I could sample the process in Activity Monitor. This is how it looks after the Environment.CurrentManagedThreadId change. It looks like my Main method is still running on the main thread after returning from the code that, because of GC or something else, makes Environment.CurrentManagedThreadId change. That indicates Thread.CurrentThread.ManagedThreadId is right in saying it's still 1, whereas Environment.CurrentManagedThreadId is wrong (it now says 2, not 6 as mentioned in the OP, because I simplified the project to make a cleaner repro).

Call graph:
    2465 Thread_1605348   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2465 start  (in libdyld.dylib) + 1  [0x7fffcd7c0235]
    +   2465 main  (in sfbackend) + 169  [0x100071c69]  main.cpp:376
    +     2465 __managed__Main  (in sfbackend) + 268  [0x1009faaa0]
    +       2465 sfbackend_SoundFlow_Shortcuts_Program__Main  (in sfbackend) + 404  [0x10060b8f8]
    2465 Thread_1605351
    + 2465 start_wqthread  (in libsystem_pthread.dylib) + 13  [0x7fffcd9d907d]
    +   2465 _pthread_wqthread  (in libsystem_pthread.dylib) + 1023  [0x7fffcd9d948e]
    +     2465 __workq_kernreturn  (in libsystem_kernel.dylib) + 10  [0x7fffcd8ef44e]
    2465 Thread_1605355
      2465 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffcd9d908d]
        2465 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffcd9d9887]
          2465 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffcd9d993b]
            2465 FinalizerStart(void*)  (in sfbackend) + 69  [0x100073885]  FinalizerHelpers.cpp:65
              2465 ProcessFinalizers  (in sfbackend) + 42  [0x100739f7e]
                2465 RhpWaitForFinalizerRequest  (in sfbackend) + 55  [0x1000739c7]  PalRedhawkFunctions.h:172
                  2465 UnixEvent::Wait(unsigned int)  (in sfbackend) + 331  [0x1000bd74b]  PalRedhawkUnix.cpp:329
                    2465 _pthread_cond_wait  (in libsystem_pthread.dylib) + 712  [0x7fffcd9da7fa]
                      2465 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fffcd8eebf2]

jkotas commented 6 years ago

Environment.CurrentManagedThreadId is a simple method - it just calls ManagedThreadId.GetCurrentThreadId(), which just reads a value from a static. It compiles into a series of pointer dereferences. You can try single stepping through it in disassembly when it is returning a good value, and then single stepping through it again when it is returning a wrong value. Comparing the two traces may give you clues about what's wrong.

It is likely a problem with GC reporting again. Same as https://github.com/dotnet/corert/issues/5922#issuecomment-396008232 . It may even be the same underlying problem.

christianscheuer commented 6 years ago

Thanks, @jkotas .

Stepping through the good and the bad case, it seems to take the same code paths. (I'm not hitting the Environment.CurrentManagedThreadId the first time it's changed, so not sure how much it's gonna help). But the following thread static call in ManagedThreadId simply returns 2 in the bad case, 1 in the good case (after dereferencing eax+0x10, i.e. getting access to the ): __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId

I'm thinking that maybe the finalizer is called on ManagedThreadId so the Id gets recycled.. How can I add a breakpoint to RecycleId or the finalizer? If I do breakpoint set -r ManagedThreadId I don't seem to catch the finalizer nor the RecycleId (I'm guessing it's inlined). This is with false in the csproj.

sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId__get_Current:
    0x1001698d8 <+0>:  pushq  %rbp
    0x1001698d9 <+1>:  subq   $0x10, %rsp
    0x1001698dd <+5>:  leaq   0x10(%rsp), %rbp
    0x1001698e2 <+10>: xorl   %eax, %eax
    0x1001698e4 <+12>: movl   %eax, -0x4(%rbp)
    0x1001698e7 <+15>: callq  0x10000a3e9               ; __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId
    0x1001698ec <+20>: movl   0x10(%rax), %eax
->  0x1001698ef <+23>: movl   %eax, -0x4(%rbp)
    0x1001698f2 <+26>: cmpl   $0x0, -0x4(%rbp)
    0x1001698f6 <+30>: jne    0x100169904               ; <+44>
    0x1001698f8 <+32>: callq  0x100169930               ; S_P_CoreLib_System_Threading_ManagedThreadId__MakeForCurrentThread
    0x1001698fd <+37>: nop    
    0x1001698fe <+38>: leaq   (%rbp), %rsp
    0x100169902 <+42>: popq   %rbp
    0x100169903 <+43>: retq   
    0x100169904 <+44>: movl   -0x4(%rbp), %eax
    0x100169907 <+47>: leaq   (%rbp), %rsp
    0x10016990b <+51>: popq   %rbp
    0x10016990c <+52>: retq   
    0x10016990d <+53>: nop    
    0x10016990e <+54>: nop    
    0x10016990f <+55>: nop

christianscheuer commented 6 years ago

I was able to set a breakpoint on the ManagedThreadId constructor (not sure why many of these SPCorelib function names are not found in lldb, had to use external disassembler).

So we can see MakeForCurrentThread gets called again, which leads to the wrong id assignment. Debugging it, I found out that the call to: __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId

returns a new address for the Thread Static Base when the error is occurring. The thread static base + 0x10 (I'm guessing this is t_currentManagedThreadId) is 0 in the new thread static base, which then leads to the call to MakeForCurrentThread.
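The mechanism described here can be modeled in a few lines of C++. This is only a sketch of the pattern (a lazily assigned per-thread ID stored in a per-thread block), not CoreRT's implementation. The names ThreadStatics, CurrentManagedThreadId and SimulateLostThreadStatics are hypothetical, and the layout does not reflect the real offset 0x10.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical stand-in for one thread's static storage block.
struct ThreadStatics {
    int currentManagedThreadId = 0;     // 0 means "not assigned yet"
};

static std::atomic<int> g_nextId{1};    // process-wide ID dispenser

// Each thread owns one block; the getter below reads through this pointer.
thread_local std::unique_ptr<ThreadStatics> t_statics =
    std::make_unique<ThreadStatics>();

// Lazily assigns an ID the first time the calling thread asks for one.
int CurrentManagedThreadId() {
    assert(t_statics != nullptr);
    ThreadStatics& s = *t_statics;
    if (s.currentManagedThreadId == 0) {
        s.currentManagedThreadId = g_nextId.fetch_add(1);
    }
    return s.currentManagedThreadId;
}

// Simulates the fault: the thread suddenly sees a fresh, zeroed block.
void SimulateLostThreadStatics() {
    t_statics = std::make_unique<ThreadStatics>();
}
```

In this model, once the block is replaced the getter finds 0, takes the "make a new ID" path again, and the same native thread reports a different ID than before.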

Not sure how internals of tls are implemented in CoreRT. Please let me know if I can help further.

This is my backtrace when the ManagedThreadId gets recreated:

thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 5.1
  * frame #0: 0x00000001001699ac sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId___ctor
    frame #1: 0x0000000100169952 sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId__MakeForCurrentThread + 34
    frame #2: 0x00000001001698fd sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId__get_Current + 37
    frame #3: 0x00000001001f6d2d sfbackend`S_P_CoreLib_Internal_Runtime_Augments_EnvironmentAugments__get_CurrentManagedThreadId + 9
    frame #4: 0x000000010008ee0d sfbackend`System_Runtime_Extensions_System_Environment__get_CurrentManagedThreadId + 9

christianscheuer commented 6 years ago

I tested with a debug build of CoreRT (same issue as in the release build) and then tried calling RhpEnableConservativeStackReporting() where CPPCODEGEN would've done it. That unfortunately does not fix the issue; it gives the exact same behavior. So I don't know if that rules out GC as a root cause, or if even conservative stack reporting can have an influence on the thread static base being wrong.

christianscheuer commented 6 years ago

In the GetThreadStaticBaseForType helper, I found out that the code always takes the path shown below (i.e. the early return storage[typeTlsIndex]), so the issue does not come from the TSS being reallocated in this method.

The RhPrintInt is just a helper method I put in to be able to log stuff to the console.

            int moduleIndex = pModuleData->ModuleIndex;
            object[] storage = (object[])RuntimeImports.RhGetThreadStaticStorageForModule(moduleIndex);

            // Check whether thread static storage has already been allocated for this module and type.
            if ((storage != null) && (typeTlsIndex < storage.Length) && (storage[typeTlsIndex] != null))
            {
                if (typeTlsIndex == 7) //managedthreadid
                {
                    object st = storage[typeTlsIndex];
                    RuntimeImports.RhPrintInt(typeTlsIndex);
                }

                return storage[typeTlsIndex];
            }

christianscheuer commented 6 years ago

I was able to set a watchpoint on the address in memory where the original value of t_currentManagedThreadId was stored. Since that is stored in what looks like a managed object (the object[]), I think it's likely that this object gets moved during GC but something goes wrong in that process, since subsequent reads of the t_currentManagedThreadId member in ManagedThreadId's ThreadStaticBase return 0. Really out of my comfort zone here though. Here's the stacktrace from the watchpoint write.

thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 1
  * frame #0: 0x000000010003668c sfbackend`WKS::memcopy(dmem="\x04", smem="\x04", size=96) at gc.cpp:1846
    frame #1: 0x00000001000c81fb sfbackend`WKS::gc_heap::compact_plug(unsigned char*, unsigned long, unsigned int, WKS::gc_heap::compact_args*) [inlined] WKS::gc_heap::gcmemcopy(dest="\x10\x94\x87", src="\x10\x94\x87", len=128, copy_cards_p=YES) at gc.cpp:24464
    frame #2: 0x00000001000c8056 sfbackend`WKS::gc_heap::compact_plug(plug="\x10\x94\x87", size=128, check_last_object_p=NO, args=0x00007fff5fbfbf50) at gc.cpp:24561
    frame #3: 0x00000001000c9b67 sfbackend`WKS::gc_heap::compact_in_brick(tree="\x90Ԁ", args=0x00007fff5fbfbf50) at gc.cpp:24691
    frame #4: 0x00000001000c981c sfbackend`WKS::gc_heap::compact_in_brick(tree="@/\x80", args=0x00007fff5fbfbf50) at gc.cpp:24662
    frame #5: 0x00000001000c9e27 sfbackend`WKS::gc_heap::compact_in_brick(tree="\x10\x94\x87", args=0x00007fff5fbfbf50) at gc.cpp:24706
    frame #6: 0x00000001000c9e27 sfbackend`WKS::gc_heap::compact_in_brick(tree="?\x9c\x87", args=0x00007fff5fbfbf50) at gc.cpp:24706
    frame #7: 0x00000001000c981c sfbackend`WKS::gc_heap::compact_in_brick(tree="\x10\x94\x87", args=0x00007fff5fbfbf50) at gc.cpp:24662
    frame #8: 0x00000001000b9131 sfbackend`WKS::gc_heap::compact_phase(condemned_gen_number=1, first_condemned_address="\x90b\x8a", clear_cards=NO) at gc.cpp:24848
    frame #9: 0x000000010008174f sfbackend`WKS::gc_heap::plan_phase(condemned_gen_number=1) at gc.cpp:22631
    frame #10: 0x000000010006fbe5 sfbackend`WKS::gc_heap::gc1() at gc.cpp:15273
    frame #11: 0x000000010008e3bf sfbackend`WKS::gc_heap::garbage_collect(n=0) at gc.cpp:16861
    frame #12: 0x000000010005ff40 sfbackend`WKS::GCHeap::GarbageCollectGeneration(this=0x0000000101704440, gen=0, reason=reason_alloc_soh) at gc.cpp:35196
    frame #13: 0x0000000100065380 sfbackend`WKS::gc_heap::try_allocate_more_space(acontext=0x0000000101707150, size=24, gen_number=0) at gc.cpp:13074
    frame #14: 0x0000000100065543 sfbackend`WKS::gc_heap::allocate_more_space(acontext=0x0000000101707150, size=24, alloc_generation_number=0) at gc.cpp:13370
    frame #15: 0x00000001000e8434 sfbackend`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:13401
    frame #16: 0x00000001000e8300 sfbackend`WKS::GCHeap::Alloc(this=0x0000000101704440, context=0x0000000101707150, size=24, flags=0) at gc.cpp:34542
    frame #17: 0x0000000100016be3 sfbackend`::RhpGcAlloc(pEEType=0x0000000100807f30, uFlags=0, cbSize=24, pTransitionFrame=0x00007fff5fbfe180) at gcrhenv.cpp:273
    frame #18: 0x00000001001134dd sfbackend`RhpNewObject at AllocFast.S:85
    frame #19: 0x00000001003ed68f sfbackend`S_P_CoreLib_System_Span_1<Char>___ctor + 91

This is with a call to RhpEnableConservativeStackReporting(); in static int InitializeRuntime in main.cpp.

jkotas commented 6 years ago

I think you are getting closer to finding the root cause of the problem!

The storage for thread statics is created by Thread::SetThreadStaticStorageForModule. This method allocates the GC handle using RhpHandleAlloc, which is supposed to keep the Object[] alive and is supposed to be updated by the GC when the Object[] moves.

It looks like there is something wrong with this handle. Could you please check whether it got updated by the GC when the Object[] moved?

One possible explanation of the problem is that the handle is freed prematurely somehow. You can try adding tracing to Thread::SetThreadStaticStorageForModule and to RhHandleFree to check for this possibility.
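What "updated by the GC when the Object[] moves" means can be shown with a toy model in C++. This is a sketch under simplifying assumptions, not the runtime's handle table: ToyHandleTable, Obj and Relocate are hypothetical names, and "moving" an object is just a copy followed by zeroing the old location.

```cpp
#include <cassert>
#include <deque>

// Hypothetical heap object with a single field.
struct Obj {
    int value;
};

// A handle is the address of a stable slot; the slot holds the object's
// current address.
using Handle = Obj**;

class ToyHandleTable {
public:
    Handle Alloc(Obj* target) {
        slots_.push_back(target);       // deque keeps element addresses stable
        return &slots_.back();
    }

    // Moves the object and patches every slot that pointed at the old address.
    void Relocate(Obj* from, Obj* to) {
        assert(from != to);
        *to = *from;
        from->value = 0;                // old location is now garbage
        for (Obj*& slot : slots_) {
            if (slot == from) slot = to;
        }
    }

private:
    std::deque<Obj*> slots_;
};
```

A reader that dereferences the handle after the move sees the object at its new address. A reader that kept the raw pointer from before the move sees the zeroed old location, which resembles the "returns 0" symptom seen at the watchpoint.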

christianscheuer commented 6 years ago

@jkotas thanks, great that I'm not completely off path at least :) I got to the same conclusion. It definitely seems like the RhpHandleAlloc call has no effect. I tried commenting out the RhHandleFree calls, and tried changing the handle type being allocated to a pinned reference. I even tried pinning it via a second call in the ThreadStatics.cs. I also tried calling RhHandleGet in favor of the Object** dereference, but all turned out the same way. Any further ideas? I'll definitely try to add some tracing as well in the C++ code instead of having to do it all in lldb, might reveal more info quicker that way.

christianscheuer commented 6 years ago

For the handle being updated, I am pretty sure it gets updated, since one of my initial findings was that the address had changed. But maybe the handle is updated to point to a wrong place. Should I add tracing around the place in gc/objecthandle where the handle is supposed to be updated to point to the moved object? Also I was wondering, is it possible that it's the objects being pointed to by the object[] that get wrongly moved / not updated?

jkotas commented 6 years ago

It is possible.

All object reference updates when the GC moves things around go through GCHeap::Relocate, so you can try adding logging there.

It would be best to have the stresslog for this that I have mentioned on the other issue. All interesting places in GC where the objects are scanned and updated are instrumented using stresslog. GCHeap::Relocate is instrumented using it as well. If you prefer debugging using printf, you may be able to replace the STRESS_LOG macros with printfs in your local build.
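A rough sketch of what redirecting a logging macro to printf could look like in a local build. The macro name TRACE_LOG and the helper TracePrintf are hypothetical. The sketch assumes the macro being replaced takes a facility, a level, a printf-style format string and its arguments, so check the actual macro definitions in your tree before adapting it.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Prints one trace message to stdout and flushes right away, so the text is
// not held back in the stdio buffer. Returns the number of characters written.
inline int TracePrintf(const char* fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);
    int written = std::vprintf(fmt, args);
    va_end(args);
    std::fflush(stdout);
    return written;
}

// Hypothetical replacement macro: facility and level are accepted so existing
// call sites keep compiling, but they are ignored here.
#define TRACE_LOG(facility, level, ...) TracePrintf(__VA_ARGS__)
```

A call site such as TRACE_LOG(0, 0, "slot %p moved\n", ptr) then prints directly to the console.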

christianscheuer commented 6 years ago

Wonderful, thanks @jkotas. I wasn't against using the stress log, just never got around to wrapping my head around it. The instructions for the stress log mention sos on Windows. How would I go about dumping it from lldb on macOS? Or would you say it would be easier to simply alter the macros to printf?

christianscheuer commented 6 years ago

Never mind, printf was easy! I'll attach a log as soon as possible.

jkotas commented 6 years ago

dumping it from lldb on macOS

lldb that comes with XCode cannot load the sos plugin. It is necessary to build lldb from sources: https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md#debugging-coreclr-on-os-x

Never mind, printf was easy! I'll

👍

christianscheuer commented 6 years ago

I switched out some STRESSLOG macros with printf, added some instrumentation here and there, switched on conservative stack reporting, and added a few pinning handles that never get freed as part of my tests. See the complete delta here: https://github.com/christianscheuer/corert/tree/stresslog

The repro code itself is a recursive text analyzing algorithm. It is instrumented with checks such as Console.WriteLine("PEM.AddNodeItemEntry 1: " + (Thread.CurrentThread.ManagedThreadId == Environment.CurrentManagedThreadId)); The first, static part is just an identifier that shows which method the trace comes from. So the log contains a lot of "...: True" until it suddenly becomes "...: False". That is where it gets interesting.

Furthermore I have added some tracing to S/GetThreadStaticStorageForModule so we can track the handle. The last good state is right before a generation 2 GC at line 834681 in the log.

GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0

the 421363 is the native thread id.

Under the GcScanHandles (Promotion Phase = 1) phase the following is printed:

    GCHeap::Promote: Promote GC Root *0x102c32848 = 0x10f01e1f0 MT = 0x100876f00T

so seems like the handle does get promoted (it also seems to be present in earlier GC scans).

But then a little later in line 835765 it seems like my code is still running while the GC is running (root issue maybe), and that it transitions from True to False within that (ie. maybe the thread is not correctly hijacked/paused). Not sure if there's some contention about getting access to the console out and if that plays in.

 GCHeap::Promote: Promote GC Root *0x102c328f0 = 0x10efc8908 MT = 0x10089b720T
    GCHeap::Promote: Promote GC Root *0x102c328f8 = 0x10efc88e8 MT = 0x10089b720T
    GCHeap::Promote: Promote GC Root *0x102c32900 = 0x10eIndexItemCharacter 1: True
IndexItemCharacter 2: False
fc88d0 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32908 = 0x10efc88b8 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32910 = 0x10efc88a0 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32918 = 0x10efc8888 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32920 = 0x10efc8870 MT = 0x10089b698T

The end of this GC reads:

GC Heap 0x0
    Generation 2 [0x10efc3000, 0x0] cur = 0x0
        Segment mem 0x100720008 alloc = 0x100763f78 used 0x100763f78 committed 0x100763f78
        Segment mem 0x10efc3000 alloc = 0x10f177380 used 0x10fd3d5b0 committed 0x10fd44000
    Generation 1 [0x10f149108, 0x0] cur = 0x0
        Segment mem 0x10efc3000 alloc = 0x10f177380 used 0x10fd3d5b0 committed 0x10fd44000
    Generation 0 [0x10f177368, 0x0] cur = 0x0
        Segment mem 0x10efc3000 alloc = 0x10f177380 used 0x10fd3d5b0 committed 0x10fd44000
========== ENDGC 102 (gen = 2, collect_classes = 0) ===========}
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHAddNodeItem start: False
AddNodeItem 1: False
AddNodeItem 2: False
OnItemWordIndexed start: False
OnItemWordIndexed 2: False

What we can also see here is that the handle's address is not updated (!)

That handle was originally alloc'ed in line 4027, and no calls to RhHandleFree or DestroyHandle for that handle occur in the log. Initial alloc line 4026:

CreateHandle: 0x102c32848, type=2
SetThreadStaticStorageForModule 421363: calls RhpHandleAlloc. Handle = 0x102c32848, pStorage = 0x10f01e1f0
SetThreadStaticStorageForModule 421363: calls RhHandleFree on 0x102c32850

Here's a link to the complete, raw log https://drive.google.com/open?id=1rjjA0JHWWSs3J1fn4LQTnEWeveb0XtPe

Please let me know if I should make any adjustments to this in order to reveal more information.

jkotas commented 6 years ago

it seems like my code is still running while the GC is running

I think it is just a problem with output buffering. You can try to add fflush(stdout) at the end of the GC and see whether it goes away.
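One way such apparent reordering can arise is sketched below in C++ for a POSIX system. The assumption (not confirmed in this thread) is that one writer goes through a buffered C stdio stream while the other writes to the same file descriptor directly. InterleaveDemo is a hypothetical helper that uses a temporary file instead of the console so the result can be inspected.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unistd.h>

// Writes one line through buffered stdio and one line straight to the file
// descriptor, then returns what ended up in the file.
std::string InterleaveDemo(bool flushAfterStdio) {
    FILE* f = std::tmpfile();           // regular file, so stdio buffers fully
    assert(f != nullptr);
    int fd = fileno(f);

    std::fputs("gc trace\n", f);        // stays in the stdio buffer for now
    if (flushAfterStdio) std::fflush(f);

    const char direct[] = "managed trace\n";
    ssize_t n = write(fd, direct, sizeof(direct) - 1);  // bypasses the buffer
    assert(n == static_cast<ssize_t>(sizeof(direct) - 1));
    (void)n;

    std::fflush(f);                     // anything still buffered lands now

    char buf[64] = {};
    lseek(fd, 0, SEEK_SET);
    ssize_t got = read(fd, buf, sizeof(buf) - 1);
    std::fclose(f);
    return std::string(buf, got > 0 ? static_cast<size_t>(got) : 0);
}
```

Without the flush, the directly written line appears first in the file even though it was issued second. With the flush, the lines appear in program order.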

So far, the simplest explanation for the symptoms is that some part of the thread static object graph is not getting marked by the GC and the GC considers it garbage. Here are some ideas for what to try:

Turn on heap verification by export RH_HeapVerify=1. This will turn on GC heap integrity checks before and after the GC. I expect that this integrity check should catch this problem even faster than your tracing. It would be useful to confirm it.
Add extra tracing to GC to verify that the whole graph from the GCHandle is getting marked correctly. Add a line like printf ("*%p*, pMT=%p, newly marked: %d\n", (size_t)o, *(void**)o, marked); to BOOL gc_heap::gc_mark1 (uint8_t* o) method in gc.cpp. When I have done this in a simple program that just prints Environment.CurrentManagedThreadId, I see that the thread static object graph is getting marked correctly:

*00000218D2746B00*, pMT=00007FF62C0E6491, newly marked: 1 // pMT is __Array<Object>::`vftable'
*00000218D2746B60*, pMT=00007FF62C0EEEC9, newly marked: 1 // pMT is _GCStaticEEType_01::`vftable'
*00000218D2746BA8*, pMT=00007FF62C09D051, newly marked: 1 // pMT is S_P_CoreLib_System_Threading_ManagedThreadId::`vftable':
GCHeap::Promote: Promote GC Root *00000218EABC2218 = 00000218D2746B00 MT = 00007FF62C0E6490T <- this is the threadStaticsStorageHandle GCHandle

Could you please find out which link of the chain from the GCHandle to the ManagedThreadId object is broken after the GC? The chain should be: GCHandle -> object[] -> GCStaticEEType -> System.Threading.ManagedThreadId. You can do this by setting a breakpoint at RhGetThreadStaticStorageForModule and stepping through the code. When the GCHandle is fetched, you should get an object with the object[] EEType. When the element of this array is fetched, you should get an object with the GCStaticEEType. When the field of this object is fetched, you should get a ManagedThreadId object.
Revert the change that adds RhpEnableConservativeStackReporting. We know that it is not helping, and it just adds a lot of clutter to the stack scanning.
It is interesting that the offending GC is the first GC with "requested generation = 2". You may be able to use this condition to turn on verbose tracing to reduce the size of the log if it becomes too hard to manage.
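The chain check suggested above (GCHandle -> object[] -> GCStaticEEType -> ManagedThreadId) can be expressed as a small walk that stops at the first link whose type is not the expected one. The C++ below is a toy model only: Kind, Node and FirstBrokenLink are hypothetical, and real objects are identified by their EEType pointer rather than an enum tag.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type tags standing in for the EEType of each object.
enum class Kind { ObjectArray, GCStaticBlock, ManagedThreadId, Garbage };

// Hypothetical object: a type tag plus the one reference we follow.
struct Node {
    Kind kind;
    Node* next;
};

// Walks handle -> array -> statics block -> id object.
// Returns the index of the first link with an unexpected type
// (0 = array, 1 = statics block, 2 = id object), or -1 if all are intact.
int FirstBrokenLink(Node* const* handle) {
    static const Kind expected[] = {
        Kind::ObjectArray, Kind::GCStaticBlock, Kind::ManagedThreadId};
    assert(handle != nullptr);
    Node* node = *handle;
    for (int i = 0; i < 3; ++i) {
        if (node == nullptr || node->kind != expected[i]) return i;
        node = node->next;
    }
    return -1;
}
```

In a debugger session the same idea applies manually: at each step, compare the EEType of the fetched object with the one expected for that position in the chain.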

Thanks for helping us to trace this problem down!

christianscheuer commented 6 years ago

Thanks for keeping with me, @jkotas ! :)

problem with output buffering. You can try to add fflush(stdout)

You're right, it was just the output buffering. fflush helped confirm that. I introduced flushing after most of the logging statements now, makes many things clearer now.

Revert the change that adds RhpEnableConservativeStackReporting

Done

It is interesting that the offending GC is the first GC with "requested generation = 2".

I thought this too, but with the changes and new runs, it now happens first time in a gen = 1 GC.

export RH_HeapVerify=1

I'm not sure I'm seeing the effects of this. Tried both running this in Terminal before building ILC and before invoking dotnet publish -r osx-x64, doesn't seem to make much of a difference in the output I'm getting. I haven't gotten around to stepping through and adding the more low level tracing yet. But did another log with these new parameters. Log file here: https://drive.google.com/file/d/1ukeZfhQYdWRt4SkVTpbTebudziC_ZCDE/view?usp=sharing

What strikes me now is this:

First we get a handle at 0x101732850 pointing to 0x1114fffe0

CreateHandle: 0x101732850, type=2
SetThreadStaticStorageForModule 100384: calls RhpHandleAlloc. Handle = 0x101732850, pStorage = 0x1114fffe0

Then it gets destroyed and a new one is allocated 0x101732848->0x111534d70:

SetThreadStaticStorageForModule 100384: calls RhpHandleAlloc. Handle = 0x101732848, pStorage = 0x111534d70
SetThreadStaticStorageForModule 100384: calls RhHandleFree on 0x101732850
DestroyHandle: *0x101732850->0x1114fffe0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111534d70

Walking through the new log I now see this handle gets promoted in BEGINGC 2, line 16916: GCHeap::Promote: Promote GC Root *0x101732848 = 0x111534d70 MT = 0x100875cd0T It is also relocated: GC Root 0x101732848 RELOCATED 0x111534d70 -> 0x111509720 MT = 0x100875cd0T

And the handle's target address after GC 2 is correctly updated (line 17281):

========== ENDGC 2 (gen = 0, collect_classes = 0) ===========}
IndexItemCharacter 4: : True
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetOrCreateChildNode 1: : True
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720

In the BEGINGC 3 block (line 182811): GCHeap::Promote: Promote GC Root *0x101732848 = 0x111509720 MT = 0x100875cd0T and relocated (line 184051): GC Root 0x101732848 RELOCATED 0x111509720 -> 0x1114fa570 MT = 0x100875cd0T

One thing that I'm noticing is that by the end of the 3rd GC, even now with a fflush(stdout) at the end of GC, our tracing of True/False seems to report before the GetThreadStaticStorageForModule calls.

========== ENDGC 3 (gen = 1, collect_classes = 0) ===========}
DestroyHandle: *0x101732b68->0x0
DestroyHandle: *0x101732b78->0x0
PEM.InsertEntry 1: : True
DestroyHandle: *0x101732b80->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
DestroyHandle: *0x101732b88->0x0
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732b90->0x0
DestroyHandle: *0x101732ba0->0x0
DestroyHandle: *0x101732ba8->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
DestroyHandle: *0x101732bb0->0x0
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732bb8->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732bc0->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
PEM.InsertEntry 7: : False
DestroyHandle: *0x101732bf8->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
DestroyHandle: *0x101732b70->0x0
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732db8->0x0

Could the presence of our console.writeline call AFTER the GC indicate that a 'buffered' (and by this time, old) value of the thread static storage is present in the stack somehow, and that it's this value that doesn't get updated, leading to the error?

The PEM.AddNodeItemEntry 4: : True calls are just stuff like

Console.WriteLine("PEM.AddNodeItemEntry 4: : " + (Thread.CurrentThread.ManagedThreadId == Environment.CurrentManagedThreadId));
Console.Out.Flush();

christianscheuer commented 6 years ago

My last idea there can be safely disregarded. I added a breakpoint for this build at WKS::GCHeap::GarbageCollectGeneration and the 3rd GC's stacktrace shows it's simply triggering GC during the flush:

* frame #0: 0x000000010005eb80 sfbackend`WKS::GCHeap::GarbageCollectGeneration(this=0x0000000000000000, gen=0, reason=reason_alloc_soh) at gc.cpp:35071
    frame #1: 0x00000001000647d0 sfbackend`WKS::gc_heap::try_allocate_more_space(acontext=0x0000000102001050, size=152, gen_number=0) at gc.cpp:13074
    frame #2: 0x0000000100064973 sfbackend`WKS::gc_heap::allocate_more_space(acontext=0x0000000102001050, size=152, alloc_generation_number=0) at gc.cpp:13370
    frame #3: 0x00000001000e6ef4 sfbackend`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:13401
    frame #4: 0x00000001000e6dc0 sfbackend`WKS::GCHeap::Alloc(this=0x0000000102000140, context=0x0000000102001050, size=152, flags=0) at gc.cpp:34542
    frame #5: 0x00000001000167b3 sfbackend`::RhpGcAlloc(pEEType=0x00000001007fd968, uFlags=0, cbSize=152, pTransitionFrame=0x00007fff5fbfdd20) at gcrhenv.cpp:273
    frame #6: 0x0000000100111f83 sfbackend`RhpNewArrayRare at AllocFast.S:308
    frame #7: 0x00000001001ad77d sfbackend`String__FastAllocateString + 69
    frame #8: 0x00000001001b7c88 sfbackend`String__Concat_5 + 232
    frame #9: 0x00000001003205f1 sfbackend`S_P_CoreLib_System_Text_UTF8Encoding__GetBytes_3 + 5065
    frame #10: 0x0000000100307578 sfbackend`S_P_CoreLib_System_Text_EncoderNLS__GetBytes_0 + 404
    frame #11: 0x00000001003073c3 sfbackend`S_P_CoreLib_System_Text_EncoderNLS__GetBytes + 863
    frame #12: 0x00000001002afe44 sfbackend`S_P_CoreLib_System_IO_StreamWriter__Flush_0 + 468
    frame #13: 0x00000001002afc68 sfbackend`S_P_CoreLib_System_IO_StreamWriter__Flush + 44
    frame #14: 0x000000010017eac8 sfbackend`System_Console_System_IO_SyncTextWriter__Flush + 88

christianscheuer commented 6 years ago

@jkotas Ugh I'm sorry for not being more helpful.

*00000218D2746B00*, pMT=00007FF62C0E6491, newly marked: 1 // pMT is __Array<Object>::'vftable'

Printing the pMT works fine with your code - but in my lldb session, how would I resolve the pMT to __Array

dotnet / corert

Odd threading behavior - Thread.CurrentThread.ManagedThreadId != Environment.ManagedThreadId #6041