Closed · abrodkin closed this issue 1 year ago
For the record, the same is easily reproduced for ARCv2 without 64-bit loads/stores (i.e. without `-mll64`; with 64-bit loads/stores, instead of calls to `memcpy()` the data gets moved in place with `ldd`/`std` instructions):
```
vm_area_dup:
 acc:  c0f1                 push_s  %blink
 ace:  c6e1                 push_s  %r14
 ad0:  c5e1                 push_s  %r13
 ad2:  24823504             sub     %sp,%sp,0x114
 ad6:  41c3 00000cc0        mov_s   %r1,0xcc0       ; 0xcc0 = dup_mm.isra.0+0x180
 adc:  4608                 mov_s   %r14,%r0
 ade:  16007000 00000000r   ld      %r0,[0]         ; vm_area_cachep
 ae6:  08020000r            bl      kmem_cache_alloc
 aea:  250a9000             mov.f   %r13,%r0
 aee:  f220                 bz_s    0xb2c           ; vm_area_dup+0x60
 af0:  41c1                 mov_s   %r1,%r14
 af2:  da5c                 mov_s   %r2,92
 af4:  08020020r            bl.d    memcpy
 af8:  c097                 add_s   %r0,%sp,92
 afa:  c197                 add_s   %r1,%sp,92
 afc:  da5c                 mov_s   %r2,92
 afe:  08020020r            bl.d    memcpy
 b02:  4083                 mov_s   %r0,%sp
 b04:  4183                 mov_s   %r1,%sp
 b06:  da5c                 mov_s   %r2,92
 b08:  08020020r            bl.d    memcpy
 b0c:  245635c0             add3    %r0,%sp,23
 b10:  da5c                 mov_s   %r2,92
 b12:  245635c1             add3    %r1,%sp,23
 b16:  08020020r            bl.d    memcpy
 b1a:  40a1                 mov_s   %r0,%r13
 b1c:  25561202             add3    %r2,%r13,8
 b20:  a550                 st_s    %r2,[%r13,64]
 b22:  a551                 st_s    %r2,[%r13,68]
 b24:  1d0c1001             st      0,[%r13,12]
 b28:  1d081001             st      0,[%r13,8]
 b2c:  40a1                 mov_s   %r0,%r13
 b2e:  24803504             add     %sp,%sp,0x114
 b32:  1408301f             ld      %blink,[%sp,8]
 b36:  c5c1                 pop_s   %r13
 b38:  7fe0                 j_s.d   [%blink]
 b3a:  1408340e             ld.ab   %r14,[%sp,8]
 b3e:  78e0                 nop_s
```
Pre-built ARC GNU toolchain 2021.09:

```
$ arc-elf32-gcc --version
arc-elf32-gcc (ARCompact/ARCv2 ISA elf32 toolchain - build 965) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
```
An observation: each of the 3 extra `memcpy()` calls appears to use the same address as both source and destination.
@shahab-vahedi Well, looking at a real execution trace, here's what I can reconstruct:

```
memcpy(dest = 0x81057da4, src = 0x812f82e0, size = 0x5c = 92)
memcpy(dest = 0x81057d48, src = 0x81057da4, size = 0x5c = 92)
memcpy(dest = 0x81057e00, src = 0x81057d48, size = 0x5c = 92)
memcpy(dest = 0x812f8d4c, src = 0x81057e00, size = 0x5c = 92)
```

So it's quite an interesting arrangement: each copy's destination becomes the next copy's source ;)
I do not understand what the issue is here; I do not see anything related to the compiler. `vm_area_dup` duplicates some structures, thus making use of the `memcpy` routine. Why the authors of `vm_area_dup` do so, I wouldn't know; the best option is to ask them. I'll close this down. Please reopen it if you see an issue with the compiler.
Consider the `vm_area_dup()` function in the Linux kernel: https://elixir.bootlin.com/linux/v5.16/source/kernel/fork.c#L354

The real "meat" here is

```c
*new = *orig;
```

which basically duplicates the contents of one `vm_area` structure into another with the help of `memcpy()`. But that's how it works only if the `data_race()` macro (see its implementation here: https://elixir.bootlin.com/linux/v5.16/source/include/linux/compiler.h#L214) is removed. In its presence, for some reason, 3 extra `memcpy()` invocations appear; what's more, they all act on different (though adjacent) buffers of the same size. That's what I see in the disassembly w/o the macro:

And that's with the macro:

Any ideas on what's going on here? That's especially interesting given the comment for the macro:
To reproduce the problem outside the Linux source tree, use the attached fork.i and compile it with

```
arc64-linux-gcc -mcpu=hs5x -c -O3 -o fork.o fork.i
```

then inspect the body of `vm_area_dup()`.

fork.zip