dotnet / corert

This repo contains CoreRT, an experimental .NET Core runtime optimized for AOT (ahead of time compilation) scenarios, with the accompanying compiler toolchain.
http://dot.net
MIT License
2.91k stars 508 forks source link

Odd threading behavior - Thread.CurrentThread.ManagedThreadId != Environment.ManagedThreadId #6041

Closed christianscheuer closed 6 years ago

christianscheuer commented 6 years ago

I'm getting a The write lock is being released without being held exception when exiting write mode on a ReaderWriterLockSlim on macOS.

Debugging this issue has led me to conclude, that apparently my call toExitWriteLock happens on a different thread than where I started, albeit in a weird way.

When I debug it, Thread.CurrentThread.ManagedThreadId will indicate I'm still on thread 1 where I acquired the lock, but Environment.ManagedThreadId which is what ReaderWriterLockSlim uses, indicates I'm on thread 6. This does not seem to happen on CoreCLR.

The code paths involved do not include any calls to Thread.Start or asynchronous Task code. It really all should be happening on the same thread (which Thread.CurrentThread.ManagedThreadId also seems to indicate is true). The rest of the app does have asynchronous Task code though.

Unfortunately, I haven't yet been able to make a repro case. It does seem to make a difference though if GC is invoked between the calls to EnterWriteLock and ExitWriteLock. Any ideas as how to better debug this?

This is basically what I'm doing, but I don't think this helps much.

using System;
using System.Collections.Generic;
using System.Threading;

namespace Test
{

    public struct WriteLock : IDisposable
    {
        private readonly LockManager lockManager;

        internal WriteLock(LockManager lockManager)
        {
            this.lockManager = lockManager;
        }

        public void Dispose()
        {
            this.lockManager.ReleaseWriteLock();
        }
    }

    public class LockManager : IDisposable
    {
        private ReaderWriterLockSlim readerWriterLock;

        public LockManager()
        {
            this.readerWriterLock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
        }

        public IDisposable AcquireWriteLock()
        {
            // Enter the lock
            if (!this.readerWriterLock.TryEnterWriteLock(-1))
            {
                throw new Exception("Could not enter write lock");
            }
            Console.WriteLine("Entered write lock: " + Thread.CurrentThread.ManagedThreadId);

            // Return a new instance of a write lock
            return new WriteLock(this);
        }

        public void Dispose()
        {
            this.Dispose(true);
            GC.SuppressFinalize(this);
        }

        internal void ReleaseReadLock()
        {
            this.readerWriterLock.ExitReadLock();
        }

        internal void ReleaseWriteLock()
        {
            this.readerWriterLock.ExitWriteLock();
        }

        protected virtual void Dispose(bool disposing)
        {
            Console.WriteLine("Disposing lock manager... disposing: " + disposing);
            if (disposing)
            {
                if (this.readerWriterLock != null)
                {
                    this.readerWriterLock.Dispose();
                    this.readerWriterLock = null;
                }
            }
        }   

        public static void Main(string[] args)
        {
            using (var lockMan = new LockManager())
            using (var locky = lockMan.AcquireWriteLock())
            {
                //Lots of stuff happening in here. No Task awaits though...
            }
        }
    }

}
christianscheuer commented 6 years ago

I added manual tracing code to my code equal to the following:

Console.WriteLine("Step 1: " + (Thread.CurrentThread.ManagedThreadId == Environment.CurrentManagedThreadId));

and was able to verify that this changes from True to False. This happens consistently in my code on every run, but at different places. In one instance it changes after a new Dictionary<...>, which leads me to suspect GC is involved. In that repro, no other threading or task related code is run.

It consistently works on CoreCLR (stays True).

christianscheuer commented 6 years ago

I added a while(true) in my Main so that I could sample the process in Activity Monitor. This is how it looks after the Environment.CurrentManagedThreadId change. It looks like my Main method (after returning from the code that because of GC or something else makes the Environment.CurrentManagedThreadId change) is still actually running on the main thread, indicating that Thread.CurrentThread.ManagedThreadId is right by saying it's still 1, whereas Environment.CurrentManagedThreadId is wrong (now saying 2 not 6 as mentioned in the OP because I simplified the project to make a cleaner repro).

Call graph:
    2465 Thread_1605348   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2465 start  (in libdyld.dylib) + 1  [0x7fffcd7c0235]
    +   2465 main  (in sfbackend) + 169  [0x100071c69]  main.cpp:376
    +     2465 __managed__Main  (in sfbackend) + 268  [0x1009faaa0]
    +       2465 sfbackend_SoundFlow_Shortcuts_Program__Main  (in sfbackend) + 404  [0x10060b8f8]
    2465 Thread_1605351
    + 2465 start_wqthread  (in libsystem_pthread.dylib) + 13  [0x7fffcd9d907d]
    +   2465 _pthread_wqthread  (in libsystem_pthread.dylib) + 1023  [0x7fffcd9d948e]
    +     2465 __workq_kernreturn  (in libsystem_kernel.dylib) + 10  [0x7fffcd8ef44e]
    2465 Thread_1605355
      2465 thread_start  (in libsystem_pthread.dylib) + 13  [0x7fffcd9d908d]
        2465 _pthread_start  (in libsystem_pthread.dylib) + 286  [0x7fffcd9d9887]
          2465 _pthread_body  (in libsystem_pthread.dylib) + 180  [0x7fffcd9d993b]
            2465 FinalizerStart(void*)  (in sfbackend) + 69  [0x100073885]  FinalizerHelpers.cpp:65
              2465 ProcessFinalizers  (in sfbackend) + 42  [0x100739f7e]
                2465 RhpWaitForFinalizerRequest  (in sfbackend) + 55  [0x1000739c7]  PalRedhawkFunctions.h:172
                  2465 UnixEvent::Wait(unsigned int)  (in sfbackend) + 331  [0x1000bd74b]  PalRedhawkUnix.cpp:329
                    2465 _pthread_cond_wait  (in libsystem_pthread.dylib) + 712  [0x7fffcd9da7fa]
                      2465 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fffcd8eebf2]
jkotas commented 6 years ago

Environment.CurrentManagedThreadId is simple method - it just calls ManagedThreadId.GetCurrentThreadId() that just reads value from static. It compiles into a series of pointer dereferences. You can try single stepping through it in disassembly when it is returning good value, and then single stepping through again when it is returning wrong value. Comparing the two traces may give you clues what's wrong.

It is likely a problem with GC reporting again. Same as https://github.com/dotnet/corert/issues/5922#issuecomment-396008232 . It may be even the same underlying problem.

christianscheuer commented 6 years ago

Thanks, @jkotas .

Stepping through the good and the bad case, it seems to take the same code paths. (I'm not hitting the Environment.CurrentManagedThreadId the first time it's changed, so not sure how much it's gonna help). But the following thread static call in ManagedThreadId simply returns 2 in the bad case , 1 in the good case (after derefencing eax+0x10, ie. getting access to the ): __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId

I'm thinking that maybe the finalizer is called on ManagedThreadId so the Id gets recycled.. How can I add a breakpoint to RecycleId or the finalizer? If I do breakpoint set -r ManagedThreadId I don't seem to catch the finalizer nor the RecycleId (I'm guessing it's inlined). This is with false in the csproj.

sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId__get_Current:
    0x1001698d8 <+0>:  pushq  %rbp
    0x1001698d9 <+1>:  subq   $0x10, %rsp
    0x1001698dd <+5>:  leaq   0x10(%rsp), %rbp
    0x1001698e2 <+10>: xorl   %eax, %eax
    0x1001698e4 <+12>: movl   %eax, -0x4(%rbp)
    0x1001698e7 <+15>: callq  0x10000a3e9               ; __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId
    0x1001698ec <+20>: movl   0x10(%rax), %eax
->  0x1001698ef <+23>: movl   %eax, -0x4(%rbp)
    0x1001698f2 <+26>: cmpl   $0x0, -0x4(%rbp)
    0x1001698f6 <+30>: jne    0x100169904               ; <+44>
    0x1001698f8 <+32>: callq  0x100169930               ; S_P_CoreLib_System_Threading_ManagedThreadId__MakeForCurrentThread
    0x1001698fd <+37>: nop    
    0x1001698fe <+38>: leaq   (%rbp), %rsp
    0x100169902 <+42>: popq   %rbp
    0x100169903 <+43>: retq   
    0x100169904 <+44>: movl   -0x4(%rbp), %eax
    0x100169907 <+47>: leaq   (%rbp), %rsp
    0x10016990b <+51>: popq   %rbp
    0x10016990c <+52>: retq   
    0x10016990d <+53>: nop    
    0x10016990e <+54>: nop    
    0x10016990f <+55>: nop 
christianscheuer commented 6 years ago

I was able to set a breakpoint on the ManagedThreadId constructor (not sure why many of these SPCorelib function names are not found in lldb, had to use external disassembler).

So we can see MakeForCurrentThread gets called again, which leads to the wrong id assignment. Debugging it, I found out that the call to: __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId

returns a new address for the Thread Static Base when the error is occuring. The thread static base + 0x10 (I'm guessing this is t_currentManagedThreadId) is 0 in the new thread static base, which then leeds to the call to MakeForCurrentThread.

Not sure how internals of tls are implemented in CoreRT. Please let me know if I can help further.

This is my backtrace when the ManagedThreadId gets recreated:

thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 5.1
  * frame #0: 0x00000001001699ac sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId___ctor
    frame #1: 0x0000000100169952 sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId__MakeForCurrentThread + 34
    frame #2: 0x00000001001698fd sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId__get_Current + 37
    frame #3: 0x00000001001f6d2d sfbackend`S_P_CoreLib_Internal_Runtime_Augments_EnvironmentAugments__get_CurrentManagedThreadId + 9
    frame #4: 0x000000010008ee0d sfbackend`System_Runtime_Extensions_System_Environment__get_CurrentManagedThreadId + 9
christianscheuer commented 6 years ago

I tested with a debug build of CoreRT (same issue as in release build) and then tried calling RhpEnableConservativeStackReporting() where CPPCODEGEN would've done it. That unfortunately does not fix the issue, gives exact same behavior. So I don't know if that rules out GC as a root cause, or if even conservative stack reporting can have an influence on the thread static base being wrong.

christianscheuer commented 6 years ago

In the GetThreadStaticBaseForType helper, I found out that the code always takes the path as shown (eg. the early return storage[typeTlsIndex]), so the issue does not come from the TSS being re allocated in this method.

The RhPrintInt is just a helper method I put in to be able to log stuff to the console.

            int moduleIndex = pModuleData->ModuleIndex;
            object[] storage = (object[])RuntimeImports.RhGetThreadStaticStorageForModule(moduleIndex);

            // Check whether thread static storage has already been allocated for this module and type.
            if ((storage != null) && (typeTlsIndex < storage.Length) && (storage[typeTlsIndex] != null))
            {
                if (typeTlsIndex == 7) //managedthreadid
                {
                    object st = storage[typeTlsIndex];
                    RuntimeImports.RhPrintInt(typeTlsIndex);
                }

                return storage[typeTlsIndex];
            }
christianscheuer commented 6 years ago

Was able to set a watchpoint on the address in memory where the original value of t_currentManagedThreadId was stored. Since that is stored in what looks like a managed object (the object[] ), I think it's likely that this object gets moved during GC but something goes wrong in that process - since subsequent reads of the t_currentManagedThreadId member in ManagedThreadId's ThreadStaticBase returns 0. Really out of my comfort zone here though. Here's the stacktrace from the watchpoint write.

thread #1, queue = 'com.apple.main-thread', stop reason = watchpoint 1
  * frame #0: 0x000000010003668c sfbackend`WKS::memcopy(dmem="\x04", smem="\x04", size=96) at gc.cpp:1846
    frame #1: 0x00000001000c81fb sfbackend`WKS::gc_heap::compact_plug(unsigned char*, unsigned long, unsigned int, WKS::gc_heap::compact_args*) [inlined] WKS::gc_heap::gcmemcopy(dest="\x10\x94\x87", src="\x10\x94\x87", len=128, copy_cards_p=YES) at gc.cpp:24464
    frame #2: 0x00000001000c8056 sfbackend`WKS::gc_heap::compact_plug(plug="\x10\x94\x87", size=128, check_last_object_p=NO, args=0x00007fff5fbfbf50) at gc.cpp:24561
    frame #3: 0x00000001000c9b67 sfbackend`WKS::gc_heap::compact_in_brick(tree="\x90Ԁ", args=0x00007fff5fbfbf50) at gc.cpp:24691
    frame #4: 0x00000001000c981c sfbackend`WKS::gc_heap::compact_in_brick(tree="@/\x80", args=0x00007fff5fbfbf50) at gc.cpp:24662
    frame #5: 0x00000001000c9e27 sfbackend`WKS::gc_heap::compact_in_brick(tre    frame #5: 0x00000001000c9e27 sfbackend`WKS::gc_heap::compact_in_brick(tre    frame #5: 0x00000001000c9e27 sfbackend`WKS::gc_heap::compact_in_brick(tre    frame #5: 0x00000001000c9e27 sfbackend`WKS::gc_heap::compact_in_brick(tree="\x10\x94\x87", args=0x00007fff5fbfbf50) at gc.cpp:24706
    frame #6: 0x00000001000c9e27 sfbackend`WKS::gc_heap::compact_in_brick(tree="?\x9c\x87", args=0x00007fff5fbfbf50) at gc.cpp:24706
    frame #7: 0x00000001000c981c sfbackend`WKS::gc_heap::compact_in_brick(tree="\x10\x94\x87", args=0x00007fff5fbfbf50) at gc.cpp:24662
    frame #8: 0x00000001000b9131 sfbackend`WKS::gc_heap::compact_phase(condemned_gen_number=1, first_condemned_address="\x90b\x8a", clear_cards=NO) at gc.cpp:24848
    frame #9: 0x000000010008174f sfbackend`WKS::gc_heap::plan_phase(condemned_gen_number=1) at gc.cpp:22631
    frame #10: 0x000000010006fbe5 sfbackend`WKS::gc_heap::gc1() at gc.cpp:15273
    frame #10: 0x000000010006fbe5 sfbackend`WKS::gc_heap::gc1() at gc.cpp:15273
    frame #11: 0x000000010008e3bf sfbackend`WKS::gc_heap::garbage_collect(n=0) at gc.cpp:16861
    frame #12: 0x000000010005ff40 sfbackend`WKS::GCHeap::GarbageCollectGeneration(this=0x0000000101704440, gen=0, reason=reason_alloc_soh) at gc.cpp:35196
    frame #13: 0x0000000100065380 sfbackend`WKS::gc_heap::try_allocate_more_space(acontext=0x0000000101707150, size=24, gen_number=0) at gc.cpp:13074
    frame #14: 0x0000000100065543 sfbackend`WKS::gc_heap::allocate_more_space(acontext=0x0000000101707150, size=24, alloc_generation_number=0) at gc.cpp:13370
    frame #15: 0x00000001000e8434 sfbackend`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:13401
    frame #16: 0x00000001000e8300 sfbackend`WKS::GCHeap::Alloc(this=0x0000000101704440, context=0x0000000101707150, size=24, flags=0) at gc.cpp:34542
    frame #17: 0x0000000100016be3 sfbackend`::RhpGcAlloc(pEEType=0x0000000100807f30, uFlags=0, cbSize=24, pTransitionFrame=0x00007fff5fbfe180) at gcrhenv.cpp:273
    frame #18: 0x00000001001134dd sfbackend`RhpNewObject at AllocFast.S:85
    frame #19: 0x00000001003ed68f sfbackend`S_P_CoreLib_System_Span_1<Char>___ctor + 91

This is with a call to RhpEnableConservativeStackReporting(); in static int InitializeRuntime in main.cpp.

jkotas commented 6 years ago

I think you are getting closer to find the root cause of the problem!

The storage for thread statics is created by Thread::SetThreadStaticStorageForModule. This method allocates the GC handle using RhpHandleAlloc that is supposed to keep the Object[] alive and that is supposed to be update by the GC when the Object[] moves.

It looks like that there is something wrong with this handle. Could you please check whether it got updated by the GC when the Object[] moved?

One possible explanation of the problem is that the handle is freed prematurely somehow. You can try adding tracing to Thread::SetThreadStaticStorageForModule and to RhHandleFree to check for this possibility.

christianscheuer commented 6 years ago

@jkotas thanks, great that I'm not completely off path at least :) I got to the same conclusion. It definitely seems like the RhpHandleAlloc call has no effect. I tried commenting out the RhHandleFree calls, and tried changing the handle type being allocated to a pinned reference. I even tried pinning it via a second call in the ThreadStatics.cs. I also tried calling RhHandleGet in favor of the Object** dereference, but all turned out the same way. Any further ideas? I'll definitely try to add some tracing as well in the C++ code instead of having to do it all in lldb, might reveal more info quicker that way.

christianscheuer commented 6 years ago

For the handle being updated, I am pretty sure it gets updated, since one of my initial findings was that the address had changed. But maybe the handle is updated to point to a wrong place. Should I add tracing around the place in gc/objecthandle where the handle is supposed to be updated to point to the moved object? Also I was wondering, is it possible that it's the objects being pointed to by the object[] that get wrongly moved / not updated?

jkotas commented 6 years ago

It is possible.

All object reference updates when the GC moves things around go through GCHeap::Relocate so you can try adding a logging there.

It would be best to have the stresslog for this that I have mentioned on the other issue. All interesting places in GC where the objects are scanned and updated are instrumented using stresslog. GCHeap::Relocate is instrumented using it as well. If you prefer debugging using printf, you may be able replace the STRESS_LOG macros with printfs in your local build.

christianscheuer commented 6 years ago

Wonderful, thanks @jkotas. I wasn't against using the stress log, just never got around to wrapping my head around it. The instructions for the stress log mention sos on Windows. How would I go about dumping it from lldb on macOS? Or would you say it would be easier to simply alter the macros to printf?

christianscheuer commented 6 years ago

Never mind, printf was easy! I'll attach a log as soon as possible.

jkotas commented 6 years ago

dumping it from lldb on macOS

lldb that comes with XCode cannot load the sos plugin. It is necessary to build lldb from sources: https://github.com/dotnet/coreclr/blob/master/Documentation/building/debugging-instructions.md#debugging-coreclr-on-os-x

Never mind, printf was easy! I'll

👍

christianscheuer commented 6 years ago

I switched out some STRESSLOG macros with printf, added some instrumentation here and there switched on conservative stack reporting and added a few pinning handles that never get freed as part of my tests. See complete delta here: https://github.com/christianscheuer/corert/tree/stresslog

The repro code itself is a recursive text analyzing algorithm. It is instrumented with calls that check if Console.WriteLine("PEM.AddNodeItemEntry 1: " + (Thread.CurrentThread.ManagedThreadId == Environment.CurrentManagedThreadId)); The first static part is just an identifier that tracks in which method the trace is from. So the log contains a lot of "...: True" until it suddenly becomes "...: False". That is where it's interesting.

Furthermore I have added some tracing to S/GetThreadStaticStorageForModule so we can track the handle. The last good state is right before a generation 2 GC at line 834681 in the log.

GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0

the 421363 is the native thread id.

Under the GcScanHandles (Promotion Phase = 1) phase the following is printed:

    GCHeap::Promote: Promote GC Root *0x102c32848 = 0x10f01e1f0 MT = 0x100876f00T

so seems like the handle does get promoted (it also seems to be present in earlier GC scans).

But then a little later in line 835765 it seems like my code is still running while the GC is running (root issue maybe), and that it transitions from True to False within that (ie. maybe the thread is not correctly hijacked/paused). Not sure if there's some contention about getting access to the console out and if that plays in.

 GCHeap::Promote: Promote GC Root *0x102c328f0 = 0x10efc8908 MT = 0x10089b720T
    GCHeap::Promote: Promote GC Root *0x102c328f8 = 0x10efc88e8 MT = 0x10089b720T
    GCHeap::Promote: Promote GC Root *0x102c32900 = 0x10eIndexItemCharacter 1: True
IndexItemCharacter 2: False
fc88d0 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32908 = 0x10efc88b8 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32910 = 0x10efc88a0 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32918 = 0x10efc8888 MT = 0x10089b698T
    GCHeap::Promote: Promote GC Root *0x102c32920 = 0x10efc8870 MT = 0x10089b698T

The end of this GC reads:

GC Heap 0x0
    Generation 2 [0x10efc3000, 0x0] cur = 0x0
        Segment mem 0x100720008 alloc = 0x100763f78 used 0x100763f78 committed 0x100763f78
        Segment mem 0x10efc3000 alloc = 0x10f177380 used 0x10fd3d5b0 committed 0x10fd44000
    Generation 1 [0x10f149108, 0x0] cur = 0x0
        Segment mem 0x10efc3000 alloc = 0x10f177380 used 0x10fd3d5b0 committed 0x10fd44000
    Generation 0 [0x10f177368, 0x0] cur = 0x0
        Segment mem 0x10efc3000 alloc = 0x10f177380 used 0x10fd3d5b0 committed 0x10fd44000
========== ENDGC 102 (gen = 2, collect_classes = 0) ===========}
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHandle = 0x102c32848
GetThreadStaticStorageForModule 421363 *threadStaticsStorageHandle = 0x10f01e1f0
GetThreadStaticStorageForModule 421363 threadStaticsStorageHAddNodeItem start: False
AddNodeItem 1: False
AddNodeItem 2: False
OnItemWordIndexed start: False
OnItemWordIndexed 2: False

What we can also see here is that the handle's address is not updated (!)

That handle was originally alloc'ed in line 4027, and no calls to RhHandleFree or DestroyHandle for that handle occur in the log. Initial alloc line 4026:

CreateHandle: 0x102c32848, type=2
SetThreadStaticStorageForModule 421363: calls RhpHandleAlloc. Handle = 0x102c32848, pStorage = 0x10f01e1f0
SetThreadStaticStorageForModule 421363: calls RhHandleFree on 0x102c32850

Here's a link to the complete, raw log https://drive.google.com/open?id=1rjjA0JHWWSs3J1fn4LQTnEWeveb0XtPe

Please let me know if I should make any adjustments to this in order to reveal more information.

jkotas commented 6 years ago

it seems like my code is still running while the GC is running

I think it is just a problem with output buffering. You can try to add fflush(stdout) at the end of the GC and see whether it goes away.

So far, the simplest explanation for the symptoms is that some part of the thread static object graph is not getting marked by the GC and the GC considers it garbage. Here are some ideas what to try:

*00000218D2746B00*, pMT=00007FF62C0E6491, newly marked: 1 // pMT is __Array<Object>::`vftable'
*00000218D2746B60*, pMT=00007FF62C0EEEC9, newly marked: 1 // pMT is _GCStaticEEType_01::`vftable'
*00000218D2746BA8*, pMT=00007FF62C09D051, newly marked: 1 // pMT is S_P_CoreLib_System_Threading_ManagedThreadId::`vftable':
GCHeap::Promote: Promote GC Root *00000218EABC2218 = 00000218D2746B00 MT = 00007FF62C0E6490T <- this is the threadStaticsStorageHandle GCHandle 

Thanks for helping us to trace this problem down!

christianscheuer commented 6 years ago

Thanks for keeping with me, @jkotas ! :)

problem with output buffering. You can try to add fflush(stdout)

You're right, it was just the output buffering. fflush helped confirm that. I introduced flushing after most of the logging statements now, makes many things clearer now.

Revert the change that adds RhpEnableConservativeStackReporting

Done

t is interesting that the offending GC is the first GC with "requested generation = 2".

I thought this too, but with the changes and new runs, it now happens first time in a gen = 1 GC.

export RH_HeapVerify=1

I'm not sure I'm seeing the effects of this. Tried both running this in Terminal before building ILC and before invoking dotnet publish -r osx-x64, doesn't seem to make much of a difference in the output I'm getting. I haven't gotten around to stepping through and adding the more low level tracing yet. But did another log with these new parameters. Log file here: https://drive.google.com/file/d/1ukeZfhQYdWRt4SkVTpbTebudziC_ZCDE/view?usp=sharing

What strikes me now is this:

First we get a handle at 0x101732850 pointing to 0x1114fffe0

CreateHandle: 0x101732850, type=2
SetThreadStaticStorageForModule 100384: calls RhpHandleAlloc. Handle = 0x101732850, pStorage = 0x1114fffe0

Then it gets destroyed and a new one is allocated 0x101732848->0x111534d70:

SetThreadStaticStorageForModule 100384: calls RhpHandleAlloc. Handle = 0x101732848, pStorage = 0x111534d70
SetThreadStaticStorageForModule 100384: calls RhHandleFree on 0x101732850
DestroyHandle: *0x101732850->0x1114fffe0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111534d70

Walking through the new log I now see this handle gets promoted in BEGINGC 2, line 16916: GCHeap::Promote: Promote GC Root *0x101732848 = 0x111534d70 MT = 0x100875cd0T It is also relocated: GC Root 0x101732848 RELOCATED 0x111534d70 -> 0x111509720 MT = 0x100875cd0T

And taking the handle address after the GC2 is correctly updated (lines 17281):

========== ENDGC 2 (gen = 0, collect_classes = 0) ===========}
IndexItemCharacter 4: : True
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetOrCreateChildNode 1: : True
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x111509720

In the BEGINGC 3 block (line 182811): GCHeap::Promote: Promote GC Root *0x101732848 = 0x111509720 MT = 0x100875cd0T and relocated (line 184051): GC Root 0x101732848 RELOCATED 0x111509720 -> 0x1114fa570 MT = 0x100875cd0T

One thing that I'm noticing is that by the end of the 3rd GC, even now with a fflush(stdout) at the end of GC, our tracing of True/False seems to report before the GetThreadStaticStorageForModule calls.

========== ENDGC 3 (gen = 1, collect_classes = 0) ===========}
DestroyHandle: *0x101732b68->0x0
DestroyHandle: *0x101732b78->0x0
PEM.InsertEntry 1: : True
DestroyHandle: *0x101732b80->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
DestroyHandle: *0x101732b88->0x0
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732b90->0x0
DestroyHandle: *0x101732ba0->0x0
DestroyHandle: *0x101732ba8->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
DestroyHandle: *0x101732bb0->0x0
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732bb8->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732bc0->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
PEM.InsertEntry 7: : False
DestroyHandle: *0x101732bf8->0x0
GetThreadStaticStorageForModule 100384 threadStaticsStorageHandle = 0x101732848
DestroyHandle: *0x101732b70->0x0
GetThreadStaticStorageForModule 100384 *threadStaticsStorageHandle = 0x1114fa570
DestroyHandle: *0x101732db8->0x0

Could the presence of our console.writeline call AFTER the GC indicate that a 'buffered' (and by this time, old) value of the thread static storage is present in the stack somehow, and that it's this value that doesn't get updated, leading to the error?

The PEM.AddNodeItemEntry 4: : True calls are just stuff like

Console.WriteLine("PEM.AddNodeItemEntry 4: : " + (Thread.CurrentThread.ManagedThreadId == Environment.ManagedThreadId));
Console.Out.Flush();
christianscheuer commented 6 years ago

My last idea there can be safely disregarded. I added a breakpoint for this build at WKS::GCHeap::GarbageCollectGeneration and the 3rd GC's stacktrace shows it's simply triggering GC during the flush:

* frame #0: 0x000000010005eb80 sfbackend`WKS::GCHeap::GarbageCollectGeneration(this=0x0000000000000000, gen=0, reason=reason_alloc_soh) at gc.cpp:35071
    frame #1: 0x00000001000647d0 sfbackend`WKS::gc_heap::try_allocate_more_space(acontext=0x0000000102001050, size=152, gen_number=0) at gc.cpp:13074
    frame #2: 0x0000000100064973 sfbackend`WKS::gc_heap::allocate_more_space(acontext=0x0000000102001050, size=152, alloc_generation_number=0) at gc.cpp:13370
    frame #3: 0x00000001000e6ef4 sfbackend`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:13401
    frame #4: 0x00000001000e6dc0 sfbackend`WKS::GCHeap::Alloc(this=0x0000000102000140, context=0x0000000102001050, size=152, flags=0) at gc.cpp:34542
    frame #5: 0x00000001000167b3 sfbackend`::RhpGcAlloc(pEEType=0x00000001007fd968, uFlags=0, cbSize=152, pTransitionFrame=0x00007fff5fbfdd20) at gcrhenv.cpp:273
    frame #6: 0x0000000100111f83 sfbackend`RhpNewArrayRare at AllocFast.S:308
    frame #7: 0x00000001001ad77d sfbackend`String__FastAllocateString + 69
    frame #8: 0x00000001001b7c88 sfbackend`String__Concat_5 + 232
    frame #9: 0x00000001003205f1 sfbackend`S_P_CoreLib_System_Text_UTF8Encoding__GetBytes_3 + 5065
    frame #10: 0x0000000100307578 sfbackend`S_P_CoreLib_System_Text_EncoderNLS__GetBytes_0 + 404
    frame #11: 0x00000001003073c3 sfbackend`S_P_CoreLib_System_Text_EncoderNLS__GetBytes + 863
    frame #12: 0x00000001002afe44 sfbackend`S_P_CoreLib_System_IO_StreamWriter__Flush_0 + 468
    frame #13: 0x00000001002afc68 sfbackend`S_P_CoreLib_System_IO_StreamWriter__Flush + 44
    frame #14: 0x000000010017eac8 sfbackend`System_Console_System_IO_SyncTextWriter__Flush + 88
christianscheuer commented 6 years ago

@jkotas Ugh I'm sorry for not being more helpful.

*00000218D2746B00*, pMT=00007FF62C0E6491, newly marked: 1 // pMT is __Array<Object>::'vftable'

Printing the pMT works fine with your code - but In my lldb session, how would I resolve the pMT to __Array::'vftable' ?

Maybe I'm missing some symbols or lacking some basic knowledge of lldb debugging.

jkotas commented 6 years ago

Try image lookup --address 0x00007FF62C0E6491

christianscheuer commented 6 years ago

Cool. So here are some results based on that.

1st GC:

*0x10300e000*, pMT=0x100875cd1, newly marked: 1  //pMT is vtable for __Array<Object> + 1
*0x1030206d0*, pMT=0x10089a4d9, newly marked: 1  //pMT is __GCStaticEEType_001 + 1
*0x103020720*, pMT=0x10081ea19, newly marked: 1  //pMT is vtable for S_P_TypeLoader_System_Collections_Generic_LowLevelDictionary_2<S_P_CoreLib_System_Runtime_TypeManagerHandle__S_P_TypeLoader_Internal_NativeFormat_NativeReader> + 1  
*0x103020740*, pMT=0x10087cb49, newly marked: 1  //pMT is vtable for __Array<S_P_TypeLoader_System_Collections_Generic_LowLevelDictionary_2_Entry<S_P_CoreLib_System_Runtime_TypeManagerHandle__S_P_TypeLoader_Internal_NativeFormat_NativeReader>> + 1
*0x103020830*, pMT=0x100833979, newly marked: 1  //pMT is S_P_TypeLoader_System_Collections_Generic_LowLevelDictionary_2_Entry<S_P_CoreLib_System_Runtime_TypeManagerHandle__S_P_TypeLoader_Internal_NativeFormat_NativeReader> + 1
*0x1030207f8*, pMT=0x10080af69, newly marked: 0  //pMT is vtable for S_P_TypeLoader_Internal_NativeFormat_NativeReader + 1
*0x103007f38*, pMT=0x10089a479, newly marked: 1  //pMT is __GCStaticEEType_01 + 1
*0x103007f80*, pMT=0x10080d369, newly marked: 1  //pMT is vtable for S_P_CoreLib_System_Threading_ManagedThreadId + 1
*0x10300fd28*, pMT=0x10089a501, newly marked: 1  //pMT is __GCStaticEEType_011 + 1
*0x10300fd78*, pMT=0x10089a479, newly marked: 1  //pMT is __GCStaticEEType_01 + 1
*0x10300fd90*, pMT=0x100813161, newly marked: 1  //pMT is vtable for S_P_CoreLib_System_Text_StringBuilder + 1
*0x103019d00*, pMT=0x1008772a1, newly marked: 1  //pMT is vtable for __Array<Char> + 1
*0x10300e0c8*, pMT=0x10089a441, newly marked: 1  //pMT is __GCStaticEEType_0 + 1

Between 1st and 2nd GC, the handle gets destroyed and recreated:

SetThreadStaticStorageForModule 142287: calls RhpHandleAlloc. Handle = 0x1015a2850, pStorage = 0x10302a008
SetThreadStaticStorageForModule 142287: calls RhHandleFree on 0x1015a2858
DestroyHandle: *0x1015a2858->0x10300e000
GetThreadStaticStorageForModule 142287 threadStaticsStorageHandle = 0x1015a2850
GetThreadStaticStorageForModule 142287 *threadStaticsStorageHandle = 0x10302a008

And recreated again:

CreateHandle: 0x1015a2848, type=2
SetThreadStaticStorageForModule 142287: calls RhpHandleAlloc. Handle = 0x1015a2848, pStorage = 0x10305ed98
SetThreadStaticStorageForModule 142287: calls RhHandleFree on 0x1015a2850
DestroyHandle: *0x1015a2850->0x10302a008
GetThreadStaticStorageForModule 142287 threadStaticsStorageHandle = 0x1015a2848
GetThreadStaticStorageForModule 142287 *threadStaticsStorageHandle = 0x10305ed98
Growing heap_segment: Ix high address: Ix
New commit: IxGetThreadStaticStorageForModule 142287 threadStaticsStorageHandle = 0x1015a2848
GetThreadStaticStorageForModule 142287 *threadStaticsStorageHandle = 0x10305ed98

2nd GC:

*0x10305ed98*, pMT=0x100875cd1, newly marked: 1  //pMT is vtable for __Array<Object> + 1
*0x103056348*, pMT=0x10089a479, newly marked: 1  //pMT is __GCStaticEEType_01 + 1
*0x103056360*, pMT=0x1007f9501, newly marked: 0  //pMT is vtable for System_Threading_Thread_System_Threading_Thread + 1
*0x1030507a0*, pMT=0x10089a479, newly marked: 1  //pMT is __GCStaticEEType_01 + 1
*0x103053550*, pMT=0x10089a441, newly marked: 1  //pMT is __GCStaticEEType_0 + 1
*0x103044698*, pMT=0x10089a489, newly marked: 1  //pMT is __GCStaticEEType_000 + 1
*0x10302a0d8*, pMT=0x10089a479, newly marked: 1  //pMT is __GCStaticEEType_01 + 1
*0x10302a118*, pMT=0x100819909, newly marked: 0  //pMT is vtable for S_P_CoreLib_Internal_Runtime_Augments_RuntimeThread + 1
*0x1030611a0*, pMT=0x10089a479, newly marked: 1  //pMT is __GCStaticEEType_01 + 1
*0x1030622b8*, pMT=0x1008802c1, newly marked: 1  //pMT is vtable for __Array<__Array<UInt8>> + 1
*0x10312bae8*, pMT=0x100877161, newly marked: 1  //pMT is vtable for __Array<UInt8> + 1
*0x103062088*, pMT=0x100877161, newly marked: 1  //pMT is vtable for __Array<UInt8> + 1
*0x103061338*, pMT=0x100877161, newly marked: 1  //pMT is vtable for __Array<UInt8> + 1
*0x10305ee78*, pMT=0x10089a479, newly marked: 1  //pMT is __GCStaticEEType_01 + 1
*0x103126148*, pMT=0x100880381, newly marked: 1  //pMT is vtable for __Array<__Array<Char>> + 1
*0x10305f010*, pMT=0x1008772a1, newly marked: 1  //pMT is vtable for __Array<Char> + 1
    GCHeap::Promote: Promote GC Root *0x1015a2848 = 0x10305ed98 MT = 0x100875cd0T

So it looks like, even though our problem first comes in 3rd GC (because that's when it gets relocated), already here the ManagedThreadId does not get marked, if I'm reading this correctly.

jkotas commented 6 years ago

already here the ManagedThreadId does not get marked,

I think that is because of the 2nd GC is gen0 GC. ManagedThreadId was promoted from gen0 to gen1 during the 1st GC. The 2nd GC is gen0 GC and so it does not scan it. The 3rd GC is gen1 GC - does it scan it?

christianscheuer commented 6 years ago

Oops. Sent out the previous too soon. Here it is with all 3 GC's... Looks like it sets newly marked = 0 for the 3rd GC for ManagedThreadId.

CreateHandle: 0x1015a2860, type=2
SetThreadStaticStorageForModule 177574: calls RhpHandleAlloc. Handle = 0x1015a2860, pStorage = 0x102007ee0
AllocateThreadStaticStorageForType: typeTlsIndex=0x7MT=0x10089a478
ManagedThreadId.MakeForCurrentThread called
GetThreadStaticStorageForModule 177574 threadStaticsStorageHandle = 0x1015a2860
GetThreadStaticStorageForModule 177574 *threadStaticsStorageHandle = 0x102007ee0

CreateHandle: 0x1015a2858, type=2
SetThreadStaticStorageForModule 177574: calls RhpHandleAlloc. Handle = 0x1015a2858, pStorage = 0x10200e000
SetThreadStaticStorageForModule 177574: calls RhHandleFree on 0x1015a2860
DestroyHandle: *0x1015a2860->0x102007ee0
AllocateThreadStaticStorageForType: typeTlsIndex=0x15MT=0x10089a440
GetThreadStaticStorageForModule 177574 threadStaticsStorageHandle = 0x1015a2858
GetThreadStaticStorageForModule 177574 *threadStaticsStorageHandle = 0x10200e000

==

GC1:

*0x10200e000*, pMT=0x100875cd1, newly marked: 1  //vtable for __Array<Object> + 1
*0x1020206d0*, pMT=0x10089a4d9, newly marked: 1  // __GCStaticEEType_001 + 1
*0x102020720*, pMT=0x10081ea19, newly marked: 1  //vtable for S_P_TypeLoader_System_Collections_Generic_LowLevelDictionary_2<S_P_CoreLib_System_Runtime_TypeManagerHandle__S_P_TypeLoader_Internal_NativeFormat_NativeReader> + 1
*0x102020740*, pMT=0x10087cb49, newly marked: 1
*0x102020830*, pMT=0x100833979, newly marked: 1
*0x1020207f8*, pMT=0x10080af69, newly marked: 0
*0x102007f38*, pMT=0x10089a479, newly marked: 1  //__GCStaticEEType_01 + 1
*0x102007f80*, pMT=0x10080d369, newly marked: 1  //vtable for S_P_CoreLib_System_Threading_ManagedThreadId + 1
*0x10200fd28*, pMT=0x10089a501, newly marked: 1
*0x10200fd78*, pMT=0x10089a479, newly marked: 1
*0x10200fd90*, pMT=0x100813161, newly marked: 1
*0x102019d00*, pMT=0x1008772a1, newly marked: 1
*0x10200e0c8*, pMT=0x10089a441, newly marked: 1
    GCHeap::Promote: Promote GC Root *0x1015a2858 = 0x10200e000 MT = 0x100875cd0T

==

CreateHandle: 0x1015a2850, type=2
SetThreadStaticStorageForModule 177574: calls RhpHandleAlloc. Handle = 0x1015a2850, pStorage = 0x10202a008
SetThreadStaticStorageForModule 177574: calls RhHandleFree on 0x1015a2858
DestroyHandle: *0x1015a2858->0x10200e000
AllocateThreadStaticStorageForType: typeTlsIndex=0x16MT=0x10089a478
GetThreadStaticStorageForModule 177574 threadStaticsStorageHandle = 0x1015a2850
GetThreadStaticStorageForModule 177574 *threadStaticsStorageHandle = 0x10202a008

CreateHandle: 0x1015a2848, type=2
SetThreadStaticStorageForModule 177574: calls RhpHandleAlloc. Handle = 0x1015a2848, pStorage = 0x10205ed98
SetThreadStaticStorageForModule 177574: calls RhHandleFree on 0x1015a2850
DestroyHandle: *0x1015a2850->0x10202a008
AllocateThreadStaticStorageForType: typeTlsIndex=0x18MT=0x10089a478
GetThreadStaticStorageForModule 177574 threadStaticsStorageHandle = 0x1015a2848
GetThreadStaticStorageForModule 177574 *threadStaticsStorageHandle = 0x10205ed98

==

GC2:

GcScanHandles (Promotion Phase = 1)
*0x10205ed98*, pMT=0x100875cd1, newly marked: 1  // vtable for __Array<Object> + 1
*0x102056348*, pMT=0x10089a479, newly marked: 1  // __GCStaticEEType_01 + 1
*0x102056360*, pMT=0x1007f94c1, newly marked: 0  // vtable for System_Threading_Thread_System_Threading_Thread + 1
*0x1020507a0*, pMT=0x10089a479, newly marked: 1  //__GCStaticEEType_01 + 1
*0x102053550*, pMT=0x10089a441, newly marked: 1  //__GCStaticEEType_0 + 1
*0x102044698*, pMT=0x10089a489, newly marked: 1  //__GCStaticEEType_000 + 1
*0x10202a0d8*, pMT=0x10089a479, newly marked: 1  //__GCStaticEEType_01 + 1
*0x10202a118*, pMT=0x100819909, newly marked: 0  //vtable for S_P_CoreLib_Internal_Runtime_Augments_RuntimeThread + 1
*0x1020611a0*, pMT=0x10089a479, newly marked: 1  //__GCStaticEEType_01 + 1
*0x1020622b8*, pMT=0x1008802c1, newly marked: 1  //vtable for __Array<__Array<UInt8>> + 1
*0x10212bae8*, pMT=0x100877201, newly marked: 1  //vtable for __Array<UInt8> + 1
*0x102062088*, pMT=0x100877201, newly marked: 1  //vtable for __Array<UInt8> + 1
*0x102061338*, pMT=0x100877201, newly marked: 1  //vtable for __Array<UInt8> + 1
*0x10205ee78*, pMT=0x10089a479, newly marked: 1  //__GCStaticEEType_01 + 1
*0x102126148*, pMT=0x100880381, newly marked: 1  //vtable for __Array<__Array<Char>> + 1
*0x10205f010*, pMT=0x1008772a1, newly marked: 1  //vtable for __Array<Char> + 1
    GCHeap::Promote: Promote GC Root *0x1015a2848 = 0x10205ed98 MT = 0x100875cd0T

===

GC3:

*0x1020336b0*, pMT=0x100875cd1, newly marked: 1  //vtable for __Array<Object> + 1
*0x102032198*, pMT=0x10089a479, newly marked: 1  //__GCStaticEEType_01 + 1
*0x1020321b0*, pMT=0x1007f94c1, newly marked: 0  //vtable for System_Threading_Thread_System_Threading_Thread + 1
*0x1020206d0*, pMT=0x10089a4d9, newly marked: 1  //__GCStaticEEType_001 + 1
*0x102020720*, pMT=0x10081ea19, newly marked: 1
*0x102020740*, pMT=0x10087cb49, newly marked: 1
*0x102020830*, pMT=0x100833979, newly marked: 1
*0x1020207f8*, pMT=0x10080af69, newly marked: 1
*0x102007f38*, pMT=0x10089a479, newly marked: 1  //__GCStaticEEType_01 + 1
*0x102007f80*, pMT=0x10080d369, newly marked: 0  //vtable for S_P_CoreLib_System_Threading_ManagedThreadId + 1
*0x1020302a8*, pMT=0x10089a479, newly marked: 1
*0x10200fd28*, pMT=0x10089a501, newly marked: 1
*0x102031e88*, pMT=0x10089a441, newly marked: 1
*0x10200fd78*, pMT=0x10089a479, newly marked: 1
*0x10200fd90*, pMT=0x100813161, newly marked: 1
*0x1020bf2e0*, pMT=0x1008772a1, newly marked: 1
*0x10202efc8*, pMT=0x10089a489, newly marked: 1
*0x10200e0c8*, pMT=0x10089a441, newly marked: 1
*0x102008d10*, pMT=0x10089a479, newly marked: 1
*0x102008ff0*, pMT=0x100819909, newly marked: 0
*0x102035a58*, pMT=0x10089a479, newly marked: 1
*0x1020366b0*, pMT=0x1008802c1, newly marked: 1
*0x102033790*, pMT=0x10089a479, newly marked: 1
*0x1020bc6c8*, pMT=0x100880381, newly marked: 1
    GCHeap::Promote: Promote GC Root *0x1015a2848 = 0x1020336b0 MT = 0x100875cd0T
jkotas commented 6 years ago

the handle gets destroyed and recreated:

I have fixed this inefficiency in #6085. (I do not expect to fix it this issue though.)

Looks like it sets newly marked = 0 for the 3rd GC for ManagedThreadId.

Is there something in log for the 3rd GC that has newly marked = 1 for this ManagedThreadId?

christianscheuer commented 6 years ago

I rebased my branch off of newest tip including #6085 It doesn't fix the issue, but nice that it's not needed to re alloc. https://github.com/dotnet/corert/compare/master...christianscheuer:stresslog

You're right, there is a newly marked: 1 for the ManagedThreadId once in the 3rd GC :

It's during scanning of a simple method in the algorithm at 0x1003caed2, that just calls some other virtual/nonvirtual instance methods on the same class, but then has been instrumented with code like:

Log.WriteLine("PM.SavePage 1: ", Thread.CurrentThread.ManagedThreadId == Environment.CurrentManagedThreadId);

This is the output from the 3rd GC scan of that method. This precedes the newly marked: 0 that occurs for the thread statics handle.

Scanning method 0x1003caed2K
*0x1030321b0*, pMT=0x1007f9551, newly marked: 1 // vtable for System_Threading_Thread_System_Threading_Thread + 1
*0x103008ff0*, pMT=0x100819959, newly marked: 1 // vtable for S_P_CoreLib_Internal_Runtime_Augments_RuntimeThread + 1
*0x103007f80*, pMT=0x10080d3b9, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_ManagedThreadId + 1
*0x103009098*, pMT=0x10080d471, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_Lock + 1
*0x103009060*, pMT=0x1008750d1, newly marked: 1 // vtable for __Array<S_P_CoreLib_Microsoft_Win32_SafeHandles_SafeWaitHandle> + 1
*0x103009348*, pMT=0x10080ed89, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_ManualResetEvent + 1
*0x1030090c0*, pMT=0x10081a5e9, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo + 1
*0x103008ff0*, pMT=0x100819959, newly marked: 0 // vtable for S_P_CoreLib_Internal_Runtime_Augments_RuntimeThread + 1
*0x103009110*, pMT=0x10080d871, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_LowLevelMonitor + 1
*0x103009130*, pMT=0x100879d89, newly marked: 1 // vtable for __Array<S_P_CoreLib_System_Threading_WaitSubsystem_WaitableObject> + 1
*0x103009168*, pMT=0x10087a419, newly marked: 1 // vtable for __Array<S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo_WaitedListNode> + 1
*0x1030091a0*, pMT=0x10081ca69, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo_WaitedListNode + 1
*0x1030091d0*, pMT=0x10081ca69, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo_WaitedListNode + 1
*0x103009200*, pMT=0x10081ca69, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo_WaitedListNode + 1
*0x103009230*, pMT=0x10081ca69, newly marked: 1 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo_WaitedListNode + 1
*0x1030090c0*, pMT=0x10081a5e9, newly marked: 0 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo + 1
*0x1030090c0*, pMT=0x10081a5e9, newly marked: 0 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo + 1
*0x1030090c0*, pMT=0x10081a5e9, newly marked: 0 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo + 1
*0x1030090c0*, pMT=0x10081a5e9, newly marked: 0 // vtable for S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo + 1
*0x103009398*, pMT=0x1007f9ad9, newly marked: 1 // vtable for S_P_CoreLib_Microsoft_Win32_SafeHandles_SafeWaitHandle + 1

Disassembly for that method: https://gist.github.com/christianscheuer/91e6a664d6cd93b3e66a6c0f6c503b29

Complete GC3 log: https://gist.github.com/christianscheuer/b88a1f74eddbe7a9f771259889c6c1fe

jkotas commented 6 years ago

Everything seems to be marked correctly. I do not see any problem in the log.

export RH_HeapVerify=1 I'm not sure I'm seeing the effects of this.

This env variable is checked at runtime. It turns on extra validation in the runtime/GC (look for GetHeapVerifyLevel to see details).

Could you please try to run the compiled program with this env variable set under debugger, and set a breakpoint at WKS::gc_heap::verify_heap to validate that the heap verification is kicking in? It would be useful to know whether the heap verification is identifying any issues.

Also, if you are not doing it already - it is better to use Debug flavor of libRuntime for this. The Debug flavor of libRuntime has extra asserts that may help. (The other parts can be Release.)

christianscheuer commented 6 years ago

Thanks, @jkotas, I'll try to verify that.

Am I using debug flavor of libRuntime if I build CoreRT with ./build.sh debug skiptests and then point my IlcPath to the resulting ...corert/bin/OSX.x64.Debug/?

christianscheuer commented 6 years ago

Interestingly, can't seem to set a breakpoint at WKS::gc_heap::verify_heap without externally looking upnm and getting the address. Could indicate I don't have all debug symbols. However, I can confirm that my breakpoint at __ZN3WKS7gc_heap11verify_heapEj does not execute.

Maybe you can see in my steps what I'm doing wrong:

  • Build corert with ./build.sh clean debug skiptests
  • Run export IlcPath=corertdir/bin/OSX.x64.Debug/
  • Run dotnet publish -r osx-x64 in my project dir
  • Navigate to the publish folder
  • Run export RH_HeapVerify=1
  • Run lldb myexecutable
  • Set breakpoint (via nm)
  • Run in lldb, it doesn't hit.
jkotas commented 6 years ago

These steps look fine to me. Not sure why it does not work.

christianscheuer commented 6 years ago

I managed to get heap verification working by overriding int EEConfig::GetHeapVerifyLevel() to always return 1 and manually redefining the dprintf macros like this in gc.cpp:

#ifdef dprintf
#undef dprintf
#endif
#define dprintf(loglevel, printargs) printf printargs ;  printf("\n");

Also did a quick find/replace to replace all %I (capital i) with %l (lowercase L) since the %I was not recognized on macOS.

This produces a host of new data that I have no idea how to parse. Where do I look for heap verification errors?

This is the complete log: https://drive.google.com/open?id=1zCIUfqZpMLQEwStpIndemGaOLp1gQgAH

christianscheuer commented 6 years ago

FWIW, reading the memory at $rax right after the return of __GetThreadStaticBase...ManagedThreadId gives these two different outputs, the first one is read before any GC, the last one is read right after the 3rd GC.

Before GC: mem read $rax
0x103807f38: c8 64 89 00 01 00 00 00 80 7f 80 03 01 00 00 00  ?d..............
0x103807f48: 01 00 00 00 00 00 00 00 b8 8a 81 00 01 00 00 00  ........?.......

After 3rd GC: mem read $rax
0x1038078e8: c8 64 89 00 01 00 00 00 00 79 80 03 01 00 00 00  ?d.......y......
0x1038078f8: 00 00 00 00 00 00 00 00 b8 93 80 00 01 00 00 00  ........?.......

Point in disassembly:

sfbackend`S_P_CoreLib_System_Threading_ManagedThreadId__get_Current:
    0x1002858d8 <+0>:  pushq  %rbp
    0x1002858d9 <+1>:  subq   $0x10, %rsp
    0x1002858dd <+5>:  leaq   0x10(%rsp), %rbp
    0x1002858e2 <+10>: leaq   -0x10(%rbp), %rdi
    0x1002858e6 <+14>: movl   $0x4, %ecx
    0x1002858eb <+19>: xorl   %eax, %eax
    0x1002858ed <+21>: rep    
    0x1002858ee <+22>: stosl  %eax, %es:(%rdi)
    0x1002858ef <+23>: nop    
    0x1002858f0 <+24>: callq  0x10000a578               ; __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId
->  0x1002858f5 <+29>: movl   0x10(%rax), %eax
    0x1002858f8 <+32>: movl   %eax, -0x4(%rbp)
    0x1002858fb <+35>: cmpl   $0x0, -0x4(%rbp)
    0x1002858ff <+39>: sete   %al
    0x100285902 <+42>: movzbl %al, %eax
    0x100285905 <+45>: movl   %eax, -0x8(%rbp)
    0x100285908 <+48>: cmpl   $0x0, -0x8(%rbp)
    0x10028590c <+52>: je     0x10028591f               ; <+71>
    0x10028590e <+54>: callq  0x1002859c4               ; S_P_CoreLib_System_Threading_ManagedThreadId__MakeForCurrentThread
    0x100285913 <+59>: movl   %eax, -0x10(%rbp)
    0x100285916 <+62>: movl   -0x10(%rbp), %eax
    0x100285919 <+65>: movl   %eax, -0xc(%rbp)
    0x10028591c <+68>: nop    
    0x10028591d <+69>: jmp    0x100285928               ; <+80>
    0x10028591f <+71>: movl   -0x4(%rbp), %eax
    0x100285922 <+74>: movl   %eax, -0xc(%rbp)
    0x100285925 <+77>: nop    
    0x100285926 <+78>: jmp    0x100285928               ; <+80>
    0x100285928 <+80>: movl   -0xc(%rbp), %eax
    0x10028592b <+83>: leaq   (%rbp), %rsp
    0x10028592f <+87>: popq   %rbp
    0x100285930 <+88>: retq   
    0x100285931 <+89>: nop    
    0x100285932 <+90>: nop    
    0x100285933 <+91>: nop    
christianscheuer commented 6 years ago

I'm starting to think it's not GC relocations, but the hijacking/restoring of stack and registers that's the problem.

I started to look back at the log. After we made sure everything's flushed, I noticed there's a single True evaluation after each GC, and only the first one after that returns False. see here: https://gist.github.com/christianscheuer/a8e8123d454dccb1c24766054b44f13a

I also noticed some weird debugging behavior that seems important. When after the 3rd GC in the first call to S_P_CoreLib_System_Threading_ManagedThreadId__get_Current stepping into the call at +24 and keep going instruction by instruction, all of a sudden I find myself returning at +54, which is when it then calls MakeForCurrentThread. If, however, I manually set a breakpoint at +29 and just let it continue instead of stepping into the +24 call, I can sometimes bypass the problem, keeping all values intact, even after the 3rd GC and after several calls. It can also happen that I don't return at all into the function, even after putting a breakpoint at +29 and hitting continue.

S_P_CoreLib_System_Threading_ManagedThreadId__get_Current:
    0x1002858d8 <+0>:  pushq  %rbp
    0x1002858d9 <+1>:  subq   $0x10, %rsp
    0x1002858dd <+5>:  leaq   0x10(%rsp), %rbp
    0x1002858e2 <+10>: leaq   -0x10(%rbp), %rdi
    0x1002858e6 <+14>: movl   $0x4, %ecx
    0x1002858eb <+19>: xorl   %eax, %eax
    0x1002858ed <+21>: rep    
    0x1002858ee <+22>: stosl  %eax, %es:(%rdi)
    0x1002858ef <+23>: nop    
    0x1002858f0 <+24>: callq  0x10000a578               ; __GetThreadStaticBase_S_P_CoreLib_System_Threading_ManagedThreadId
    0x1002858f5 <+29>: movl   0x10(%rax), %eax
    0x1002858f8 <+32>: movl   %eax, -0x4(%rbp)
    0x1002858fb <+35>: cmpl   $0x0, -0x4(%rbp)
    0x1002858ff <+39>: sete   %al
    0x100285902 <+42>: movzbl %al, %eax
    0x100285905 <+45>: movl   %eax, -0x8(%rbp)
    0x100285908 <+48>: cmpl   $0x0, -0x8(%rbp)
    0x10028590c <+52>: je     0x10028591f               ; <+71>
    0x10028590e <+54>: callq  0x1002859c4               ; S_P_CoreLib_System_Threading_ManagedThreadId__MakeForCurrentThread
    0x100285913 <+59>: movl   %eax, -0x10(%rbp)
    0x100285916 <+62>: movl   -0x10(%rbp), %eax
    0x100285919 <+65>: movl   %eax, -0xc(%rbp)
    0x10028591c <+68>: nop    
    0x10028591d <+69>: jmp    0x100285928               ; <+80>
    0x10028591f <+71>: movl   -0x4(%rbp), %eax
    0x100285922 <+74>: movl   %eax, -0xc(%rbp)
    0x100285925 <+77>: nop    
    0x100285926 <+78>: jmp    0x100285928               ; <+80>
    0x100285928 <+80>: movl   -0xc(%rbp), %eax
    0x10028592b <+83>: leaq   (%rbp), %rsp
    0x10028592f <+87>: popq   %rbp
    0x100285930 <+88>: retq   
    0x100285931 <+89>: nop    
    0x100285932 <+90>: nop    
    0x100285933 <+91>: nop 
christianscheuer commented 6 years ago

Well what I just described I am now unable to reproduce. Must have been a side effect of something else. Close to giving up here :/

christianscheuer commented 6 years ago

Okay, since our handle is now never changed, I was able to debug further.

By assigning the handle to $handle I could add display hooks on the following two expressions:

(void**)(((int64_t*)  (*(void**)$handle)  )[9]+0x10)
*(void**)(((int64_t*)  (*(void**)$handle)  )[9]+0x10)

This gives me the address of and the value of the thread id as stored inside the static base. The [9] is accessing the object storage[7] (it starts with MT and count), 7 is the typeTlsIndex for ManagedThreadId and then I add the 0x10 to get to the managed thread id Int64 value. So at all times when stopping in the code I can now verify if the handle->object[]->int works.

By then setting a watchpoint on the handle, eg. every time somebody updates the handle's value, I can check to see if after updating the handle, the two results will still evaluate correctly.

Using this approach I'm able to completely verify that during a succesful GC, inside of the plan_phase it gets changed (in Relocate), and it works again after. During an unsuccesful GC, after the plan_phase it will point to an object with 0x0 in its value, eg. the expressions will evaluate to (where normally the second would be a 0x1):

- Hook 5 (expr -- (void**)(((int64_t*)  (*(void**)0x11a932860)  )[9]+0x10))
(void **) $111 = 0x00000001028078f8

- Hook 6 (expr -- *(void**)(((int64_t*)  (*(void**)0x11a932860)  )[9]+0x10))
(void *) $112 = 0x0000000000000000
christianscheuer commented 6 years ago

The value at +0x0 instead of +0x10, which normally is __GCStaticEEType_01, is still __GCStaticEEType_01 when it fails. It seems it's just nulled out the actual int value. So could it be that it has an incorrect size for the static EE type and hence doesn't get the int copied with it?

christianscheuer commented 6 years ago

It seems this is actually the answer. By artificially increasing the size of the GCStaticEETypeNode by one pointer, the problem goes completely away.

In https://github.com/dotnet/corert/blob/master/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/GCStaticEETypeNode.cs#L73 changing + 1 to + 2 solves it:

int totalSize = (_gcMap.Size + 2) * _target.PointerSize;

https://github.com/dotnet/corert/compare/master...christianscheuer:issue-6041

I don't know if that's the right solution of course, but it does get rid of the issue.

jkotas commented 6 years ago

Yes, this is the bug! I have submitted proper fix. Thanks a lot for tracing this down! Well done.

christianscheuer commented 6 years ago

Yay, and I can confirm your fix works here! :) Thanks for all the help, was a nice way to get to know the project a little deeper.