Closed karelz closed 4 years ago
From @danmosemsft on August 29, 2017 22:32
@janvorli
Thanks for opening this in the right repo :)
@janvorli should this go to Tizen?
Hmmm... I doubt it... the main objective for this is to have powershell (.net core 2.0) running on raspberry pi 3... but if there's Tizen stuff that can help resolve this... perhaps we can point the Tizen people to this issue...
@danmosemsft I will take a look myself. @SteveL-MSFT could you please provide me with steps or a pointer to steps on how to build powershell targeting ARM Linux and to repro the issue?
@janvorli you can clone https://github.com/stevel-msft/powershell/tree/raspberry-pi onto a Ubuntu16.04 box, install PSCore6, start powershell, run ipmo ./build.psm1, run start-psbootstrap -buildlinuxarm, then start-psbuild -runtime linux-arm
or tomorrow I can give you ssh access to my pi on corpnet (it's a holiday in US today)
@SteveL-MSFT does the PSCore6 exist for 16.04 only? I have a 14.04 box, I was able to install powershell, but apt-get cannot find a package called PSCore6. I have thought that you might have meant powershell by that, but the ipmo command doesn't exist either.
Got @janvorli working
@SteveL-MSFT @janvorli wow! glad to know you got it working... was so looking forward to this... got a snapshot of the repo (is it in another branch?) that i can use? I'd like to run powershell on my pi also.
@whatevergeek just to be clear 'working' means I got him a repro of the crash locally so he can debug, not that we got PowerShell working on arm32 yet
I've debugged the issue and it is a codegen issue. The ResolveWorkerAsmStub expects to get the indirection cell address combined with two flag bits in the register R4, but it gets the address of an argument shuffling thunk instead. The managed frame (frame #5 in the stack trace in the issue description above) is a frame of the following function:
DomainNeutralILStubClass.IL_STUB_SecureDelegate_Invoke(System.__Canon, System.__Canon, System.__Canon, System.__Canon, System.__Canon)
=> 0xa87e9a24: push {r2, r3, r4, lr}
0xa87e9a26: ldr.w lr, [sp, #16]
0xa87e9a2a: str.w lr, [sp]
0xa87e9a2e: ldr.w lr, [sp, #20]
0xa87e9a32: str.w lr, [sp, #4]
0xa87e9a36: ldr r0, [r0, #20]
0xa87e9a38: add.w r4, r0, #16
0xa87e9a3c: ldr r4, [r0, #12]
0xa87e9a3e: ldr r0, [r0, #4]
0xa87e9a40: blx r4
0xa87e9a42: pop {r2, r3, r4, pc}
This function calls an argument shuffling thunk via the blx r4. The thunk's code is below:
=> 0xb5b062b0: push {r4, r5, r6, lr}
0xb5b062b2: ldr.w r12, [r0, #16]
0xb5b062b6: addw r4, sp, #16
0xb5b062ba: addw r5, sp, #16
0xb5b062be: mov r0, r1
0xb5b062c0: mov r1, r2
0xb5b062c2: mov r2, r3
0xb5b062c4: ldr.w r3, [r4], #4
0xb5b062c8: ldr.w r6, [r4], #4
0xb5b062cc: str.w r6, [r5], #4
0xb5b062d0: str.w r12, [sp, #12]
0xb5b062d4: pop {r4, r5, r6, pc}
This thunk replaces the LR pushed by the first push with the value taken from [R0+16], and so the pop at the end jumps to the following piece of code:
=> 0xb59b9f10: ldr.w r12, [pc, #8] ; 0xb59b9f1c
0xb59b9f14: ldr.w pc, [pc] ; 0xb59b9f18
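The return-address replacement done by the thunk can be modeled with a small Python sketch. This is only an illustration of the push / str / pop sequence shown in the listing: the function name `run_thunk` and the entry register values are made up for the example, the Thumb bit of code addresses is ignored, and the argument shuffling in the middle of the thunk is left out because it does not touch the four saved slots.

```python
def run_thunk(regs, target):
    """Toy model of the thunk's prologue and epilogue.

    regs:   r4, r5, r6 and lr at entry (illustrative values)
    target: the word the thunk loads from [r0, #16]
    """
    # push {r4, r5, r6, lr}: the lowest-numbered register is stored at the
    # lowest address, so the slots are [sp]=r4, [sp+4]=r5, [sp+8]=r6, [sp+12]=lr.
    frame = [regs["r4"], regs["r5"], regs["r6"], regs["lr"]]

    r12 = target            # ldr.w r12, [r0, #16]
    frame[12 // 4] = r12    # str.w r12, [sp, #12] overwrites the saved LR

    # pop {r4, r5, r6, pc}: r4-r6 get their entry values back and the
    # patched slot is loaded into pc.
    r4, r5, r6, pc = frame
    return {"r4": r4, "r5": r5, "r6": r6, "pc": pc}


# Entry state as described in this thread: r4 holds the thunk address
# because the caller reached the thunk via blx r4. r5, r6 are arbitrary.
entry = {"r4": 0xB5B062B0, "r5": 0x55, "r6": 0x66, "lr": 0xA87E9A42}
out = run_thunk(entry, 0xB59B9F10)
print(hex(out["pc"]), hex(out["r4"]))
```

In this model the final pop transfers control to the patched address while R4 comes back with whatever the caller left in it, which is the state the next piece of code inherits.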
The values at the pc and pc + 8 are as follows:
(gdb) x/2dx 0xb59b9f18
0xb59b9f18: 0xb66f2ced 0x0000000c
So this piece of code jumps to 0xb66f2ced, which is the ResolveWorkerAsmStub asm helper.
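For anyone checking the addresses in the gdb comments, here is a small Python sketch of the arithmetic. It assumes the code is running in Thumb state (the listings mix 2-byte and 4-byte instructions, which suggests Thumb-2), where a PC-relative literal load uses the instruction address plus 4, aligned down to a multiple of 4, as its base. The helper name `thumb_literal_address` is made up for the example.

```python
def thumb_literal_address(instr_addr, imm):
    # In Thumb state, reading PC yields the instruction address + 4;
    # literal loads align that value down to a 4-byte boundary before
    # adding the immediate offset.
    return ((instr_addr + 4) & ~3) + imm


r12_src = thumb_literal_address(0xB59B9F10, 8)  # ldr.w r12, [pc, #8]
pc_src = thumb_literal_address(0xB59B9F14, 0)   # ldr.w pc, [pc]
print(hex(r12_src), hex(pc_src))  # 0xb59b9f1c 0xb59b9f18

# The word loaded into pc is 0xb66f2ced. Bit 0 of a value loaded into pc
# selects the instruction set; the remaining bits are the code address.
target = 0xB66F2CED
thumb_bit = target & 1
code_addr = target & ~1
print(thumb_bit, hex(code_addr))  # 1 0xb66f2cec
```

Both computed addresses match the `; 0x...` annotations in the disassembly, and the odd target value just means the helper is entered in Thumb state.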
And now we are coming to the culprit. As I've already said, this asm helper expects R4 to contain the indirection cell address. But as you can see, the argument shuffling thunk didn't touch R4 and so we get the R4 that came from the DomainNeutralILStubClass.IL_STUB_SecureDelegate_Invoke. And as you can see, R4 was used to jump to the argument shuffling thunk so it contains its address.
So I believe this is a JIT codegen bug. If you look at the generated code of the DomainNeutralILStubClass.IL_STUB_SecureDelegate_Invoke, you can see that at 0xa87e9a38, the indirection cell address was loaded to R4, but right in the next instruction, it was overwritten by the address that the blx called a bit later.
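The overwrite described above can be traced with a toy register model in Python. This is a sketch, not the runtime's logic: `run_stub` is a made-up name, the object addresses (`OUTER`, `INNER`, `TARGET_OBJ`) are hypothetical, and only the four instructions that touch R0 and R4 before the call are modeled.

```python
def run_stub(mem, r0):
    """Trace R0/R4 through the tail of the IL stub; mem maps address -> word."""
    trace = {}
    r0 = mem[r0 + 20]          # ldr   r0, [r0, #20]
    r4 = r0 + 16               # add.w r4, r0, #16  (indirection cell address)
    trace["cell"] = r4
    r4 = mem[r0 + 12]          # ldr   r4, [r0, #12] (clobbers the cell address)
    r0 = mem[r0 + 4]           # ldr   r0, [r0, #4]
    trace["r4_at_blx"] = r4    # blx   r4
    return trace


OUTER = 0x1000       # hypothetical address of the object passed in R0
INNER = 0x2000       # hypothetical address of the object stored at [OUTER + 20]
TARGET_OBJ = 0x3000  # hypothetical value stored at [INNER + 4]
THUNK = 0xB5B062B0   # shuffle thunk address from the listing (Thumb bit ignored)

mem = {OUTER + 20: INNER, INNER + 12: THUNK, INNER + 4: TARGET_OBJ}
trace = run_stub(mem, OUTER)
print(hex(trace["cell"]), hex(trace["r4_at_blx"]))
```

With these inputs R4 holds the thunk address at the call instead of the value computed by the add, which matches what ResolveWorkerAsmStub ends up seeing.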
cc: @dotnet/jit-contrib
Also, R4 is loaded as EA_PTRSIZE in the line above. Instead, it should be loaded as EA_BYREF.
@janvorli @jkotas great finding. I wonder how close you are to fixing the root cause?
@mi-hol I am just building coreclr with a fix so that I can test it with powershell on my RPI3. So I think I will probably send out PR with the fix later today.
I have confirmed that the fix at the place that @jkotas has suggested fixes the powershell. It has started correctly and I've tried a couple of basic commands and they worked.
Fixed by dotnet/coreclr#13922
From @SteveL-MSFT on August 29, 2017 22:25
After building powershell with runtime linux-arm, it runs until it hits a second ManualResetEvent::WaitOne() call and results in SegFault. Stack trace from gdb:
Copied from original issue: dotnet/corefx#23660