Closed srusecki closed 5 months ago
Setting affinity for the sshd seems to influence this issue. With default affinity set to 0-2 issue reproduces. With affinity set to just single core (0, 1, or 2), issue is not observed. That's how I set affinity
taskset -apc <affinity> <pid>
So far I've seen the following values in the invalid read address. They are all divisible by 4 and < 0x40: 0x04, 0x08, 0x10, 0x14, 0x1c, 0x24, 0x28, 0x2c, 0x30, 0x34, 0x38, 0x3c.
Full stack traces: crashes.txt
Using gdbserver I was able to grab the following information:
The invalid read exception is caused by the instruction
ld.as r17,[r16,-17]
with $r16 = 0x58
, $r16+4*-17 = 0x14
, the address is obviously invalid.
More context:
(gdb) bt
#0 sha256_block_data_order (ctx=0x400e9588, in=<optimized out>, num=<optimized out>)
at crypto/sha/sha256.c:344
#1 0x2022962a in SHA256_Final (
md=0x400e6500 "$\367\220\001D\237\221\321UH\274\326\177\375#\032\371\3321u\210Q\316h\310\035\376\001\315oF\215", c=0x400e9588) at include/crypto/md32_common.h:219
#2 0x2026f85a in sha256_internal_final (ctx=0x400e9588,
out=0x400e6500 "$\367\220\001D\237\221\321UH\274\326\177\375#\032\371\3321u\210Q\316h\310\035\376\001\315oF\215", outl=0x5f8dd5a8, outsz=<optimized out>) at providers/implementations/digests/sha2_prov.c:72
#3 0x201c9c40 in BASIC_CONSTRAINTS_free (a=<optimized out>) at crypto/x509/v3_bcons.c:43
Backtrace stopped: frame did not save the PC
and the assembly:
165292: 2044 2f10 and r16,r16,0x3c
165296: 2415 2451 add2 r17,r20,r17
16529a: 2055 2450 add2 r16,r16,0x11
16529e: 11ef a616 ld.as r22,[r17,-17]
1652a2: 2e43 2491 ror r17,r22,0x12
1652a6: 2e43 21d5 ror r21,r22,0x7
1652aa: 2000 2350 add r16,r16,r13
1652ae: 2e41 20d6 lsr r22,r22,0x3
1652b2: 2507 2455 xor r21,r21,r17
1652b6: 10ef a611 ld.as r17,[r16,-17]
From the above we see that the r16 value is truncated by the "and" operation so the only way for it to become a valid pointer is the addition of r13, however gdb reports that $r13 = 0x0
.
I gathered some debug information which hopefully helps. Linux was stopped in the following place and then the sshd crash was reproduced using the example script:
diff --git a/arch/arc/mm/fault.c b/arch/arc/mm/fault.c
index e16649811870..137ac5f56ce7 100644
--- a/arch/arc/mm/fault.c
+++ b/arch/arc/mm/fault.c
@@ -179,6 +179,7 @@ void do_page_fault(unsigned long address, struct pt_regs *regs)
si_code = BUS_ADRERR;
}
else {
+ __asm__("flag 1");
sig = SIGSEGV;
}
call_stack_core1.txt call_stack_core2.txt call_stack_core3.txt l2tlb_core1.txt l2tlb_core2.txt l2tlb_core3.txt smart_core1.txt smart_core2.txt smart_core3.txt
I took a look at the ftrace log which was stopped inside do_page_fault
function just before the bug occurred. There were calls to do_signal
and sys_rt_sigreturn
. These functions store/restore user registers, which is suspicious as we have r13
corrupted. I then looked at which regs are saved/restored and it's only r0
to r12
. Inside user_regs_struct
r13
is marked as callee register, which is inconsistent with the ARCv3 documentation which says that r13
should be caller-saved.
Can You comment on that?
Hi @srusecki, thanks for your debug info and you are right. Pavel also found this issue with r13 reg save/restore and we are on the way of fixing it. We will prepare a patch soon.
I moved the r13
to appropriate place and saved it correctly. The issue seems to be fixed.
@mroz1k also created a test app that reproduces it without the use of ssh
and it also works now.
Waiting for an official patch.
Here's the diff with changes I've made to move r13
into scratch registers region:
r13_scratch_diff.txt
Unfortunately this fix doesn't help for https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/163
And here is the simple program that reproduces the r13 corruption: r13-corruption-repro.zip, it must be compiled with -fzero-call-used-regs=all
The patch from @srusecki fixes the issue.
Is the patch from the other issue https://github.com/foss-for-synopsys-dwc-arc-processors/linux/files/15313249/regs.patch something official, so we can apply it to finally close the issue? Or you plan to deliver additional changes? We are planning SW release in 2 weeks, so would be good to have it applied sooner than later to have time to test it. Let us know what should we apply to close it.
Please use this regs.patch in your flow right now. This is not official patch since it can impacts other code in other projects and we are discussing on how to prepare official patch, it won't take too much time. I will let you know about official patch.
I checked that this patch works with gdbserver: arcv3_r13.patch
But you also need to change 'callee' to 'scratch' for r13 in case of using the patch and gdbserver. https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2022.09/gdbserver/linux-arc64-low.cc#L188C3-L188C69 and https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2022.09/gdbserver/linux-arc64-low.cc#L245
I checked that this patch works with gdbserver: arcv3_r13.patch
But you also need to change 'callee' to 'scratch' for r13 in case of using the patch and gdbserver. https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2022.09/gdbserver/linux-arc64-low.cc#L188C3-L188C69 and https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2022.09/gdbserver/linux-arc64-low.cc#L245
That is correct. I can tuck those changes in our ARCv3 gdbserver
as soon as our downstream linux
has the patch applied onto its arc64 branch.
Also the user_regs_struct of the ptrace.h
inside the linux-headers
of arc-gnu-toolchain
needs to be adjusted. Those headers are used for building glibc
and providing user space headers.
Hi @shahab-vahedi,
We have merged arcv3_r13.patch patch to arc64 branch. Could you fix also GDB? Thanks.
The changes are populated to
arc64 branch of binutils-gdb arc-2024.06 branch of bintuilts-gdb arc-2024.06-gdb branch of binutils-gdb master branch of arc-gnu-tool-chain
Hello @shahab-vahedi,
I see you applied the patch on 3 branches of binutils-gdb repository.
Previously we were using released binutils-gdb in our solution, tag arc-2023.09-release
.
Which one should we use now? Can you make a new release or we need to use a specific commit for now?
@srusecki we're currently finalizing work on arc-2024.06
release, which we plan to publish in late July. Given that fix will be included in arc-2024.06
I would suggest using exactly that upcoming release. Will it work for you?
Ok, we can wait for the release.
However, can you tell which branch we can use (arc64, arc-2024.06 or arc-2024.06-gdb) to check if the issue is fixed on our side?
@srusecki we'll use contents of arc-2024.06
branches for the release. At the time of release build generation we'll put git tags on corresponding commits so that the state is known and fixed.
Hello @shahab-vahedi,
I see you applied the patch on 3 branches of binutils-gdb repository. Previously we were using released binutils-gdb in our solution, tag
arc-2023.09-release
. Which one should we use now? Can you make a new release or we need to use a specific commit for now?
Hi @srusecki . The changes I've mentioned in binutils-gdb
(and arc-gnu-toolchain
) are about addressing the issue that gdb(server)
reports 0
for the r13
register. See this comment from @mroz1k .
The fix regarding your original issue (SSHD crashes) are in arc64 branch of our linux repository.
If you know how to build the toolchain and a vmlinux image, then feel free to use all the arc64
branches of the tools (binutils
, gcc
, glibc
, etc.).
Sometimes when sending commands to Linux over SSH from Python scripts,
sshd
process on Linux crashes with "Invalid Read" message.Minimal Python example:
After a while of sending
uname
commands in a loop thesshd
crashes. Linux shows the following message:Knowledge so far:
uname
is sent and the reply is simplyLinux
,This issue looks similar to https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/163 but is much easier to trigger.