Closed maoif closed 11 months ago
For background on `x86` and `x86_64` support vs. `arm32` and `ppc32`: around version 9 of Chez Scheme (plus version 8 starting from 8.9.5, if memory serves), it was rewritten to use the nanopass framework, and the authors at the time decided that `x86` and `x86_64` support would cover most use cases. With that decision, the first backends using the new compiler are somewhat opinionated toward `x86_64`. The `arm32` backend wasn't written until a few years later around 2014, and even though ARM has its own quirks, the backend retained a number of the `x86_64` opinions where they weren't too troublesome.
I haven't worked with RISC-V, so I don't know best practices for some of your questions, but I know the `arm32` and `ppc32` backends pretty well, so hopefully I can provide some justifications for why they are the way they are.
- `(asm-enter)` is simply `(values)` in `arm32.ss` and `ppc32.ss`, but in `x86.ss` and `x86_64.ss` it adjusts `%sp`. Why the difference?
This appears to be an artefact of `x86`/`x86_64` machine constraints around stack alignment. It was likely simpler to essentially stub the procedure for the RISC platforms than lift the conditional into `cpnanopass.ss`.
- Comparison after locks: I noticed that after `(%inline locked-decr!)` and `(%inline locked-incr!)` there is `info-condition-code`, and `(%inline lock!)` is wrapped in an `if`-expression, so I have to set the condition code in these three primitives' assembler section? But RISC-V does not have condition flags, so currently I'm using one specific register for condition flags, and write the result of all carry/ovfl and comparisons in this register, and conditional branches are all based on this register. Don't know if it's OK?
I think there's more than one question here, so hopefully I can address them. For the `%inline` expressions you're referencing, it seems that those happen in a pass of the compiler prior to instruction selection, so the `if` that you see in reference to those primitives is not the same type as the `if` that the RISC-V backend needs to handle.
For the condition flags, I think you can accomplish creating a RISC-V backend with a reserved register for condition codes, but I think it's probably unnecessary, and likely inadvisable. Again, I'm not experienced with RISC-V, so I'm speaking from the Chez Scheme end. Reserving a register for condition codes makes it unavailable to the register allocator, and will likely lead to less optimal code.
Based on my 5 minutes of Googling RISC-V conditional branches, it looks like you'll need to formulate conditional code in terms of one of the conditional jump instructions. Throughout the backend files, you'll see calls to `make-tmp`--this will essentially reserve a register for the scope of that instruction. We call such reserved registers "unspillable," which refers to how the graph coloring register allocator works, in that the location can't be moved to the stack (or spilled) within the context of that instruction (hence the `u` prefix for the `let` bindings of those variables). I would recommend using `make-tmp` for instructions that use condition codes, which will likely allow the register allocator to make better choices.
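As a rough illustration of what formulating conditional code in terms of the conditional jump instructions could look like, here is a small Python sketch that maps a predicate straight onto one of RISC-V's fused compare-and-branch instructions (`beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu`), so that no flags register is involved. The `BRANCHES` table and the `branch_for` helper are hypothetical names made up for this example; they are not part of the Chez Scheme backend.

```python
# Hypothetical table: predicate -> (branch taken when true,
#                                   branch taken when false,
#                                   swap the two operands first?)
BRANCHES = {
    "eq?": ("beq", "bne", False),
    "<": ("blt", "bge", False),
    "u<": ("bltu", "bgeu", False),
    ">": ("blt", "bge", True),    # x > y   is   y < x
    "<=": ("bge", "blt", True),   # x <= y  is   y >= x
}


def branch_for(pred, x, y, target, invert=False):
    """Return one RISC-V branch that jumps to `target` when `pred` holds
    for registers x and y (or when it does not hold, if invert is True)."""
    when_true, when_false, swap = BRANCHES[pred]
    if swap:
        x, y = y, x
    op = when_false if invert else when_true
    return f"{op} {x}, {y}, {target}"


print(branch_for("<", "a0", "a1", "L1"))
print(branch_for("<", "a0", "a1", "L1", invert=True))
```

The compare and the branch are a single instruction here, so the only registers involved are the two operands; any scratch register needed to compute an operand could be a per-instruction temporary rather than a globally reserved one.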
- What does `asm-kill` do? It's just `(define asm-kill (lambda (code* dest) code*))`
This again has to do with the graph coloring register allocator. I started to put together a fuller explanation, but for now, I'll have to leave the explanation at this: the compiler uses `asm-kill` in places where a register is going to be used in a library call such that it wouldn't be detected in the analysis that determines which registers are available for assignment.
- Do I have to worry about the length of the data to be stored when `src` is `reg` in `asm-move`?
I'm not sure I understand this question, mostly because I'm not sure if what you mean when you say "length" is what I would call "width." If that's the case, then generally speaking, the compiler should take care of fitting things into the width of the platform's registers based on the `<machine-type>.def` file.
- The `pause` primitive uses `pause` instruction on x86, `isync` on ppc, nothing on arm. On RISC-V `fence` and `fence.i` seem to satisfy this?
The `arm32` backend was written targeting primarily ARMv7 platforms (ARMv8 was in draft, IIRC), and at the time, we didn't find a useful instruction to use for `pause`. ARMv8 has a `yield` instruction that seems to fit the bill, but basically anything that's a hint to the processor to say "the current thread needs to wait and likely isn't going to do anything useful" should work. Or else do nothing, a la `arm32`. I spent a minute or so looking at `fence` for RISC-V, and it probably works, but I did see some wording that possibly implied some synchronization that may be unnecessary.
- Since RISC-V does not have so many addressing modes, is it OK to simply use `ur` and some immediate in `define-instruction` and rely on `coerce-opnd` to handle all other types of input?
Like with the case of reserving a register for condition codes, this would probably work, but will result in less optimal code. That is in fact a strategy I've used for bootstrapping on new systems just to get things running, but I add smarter cases later. Chez Scheme has traditionally not used a peephole optimizer, as I understand it in part because of the way `define-instruction` allows for specifying special cases that help generate better code sequences ("better" meaning shorter or faster).
- Regarding addresses, I found that the backends either use a register or immediate field, but not both, don't know if it's always so?
I'm unsure about this one. My memory is that ARMv6 and ARMv7 (possibly ARMv8) support only one or the other, and not both, so that's why the backend for it is written that way. I don't remember if or why that's the case for `ppc32`, but it's possible that nobody removed the restriction from `arm32`, plus it seemed to be that way in the `x86` backends, anyway. I don't know why the `x86` platforms don't use mixed addressing modes, though.
- Regarding ffi, in `x86_64.ss`, `(asm-foreign-call)`, we have `fill-result-here?` as the output of `(result-fits-in-registers?)`, which works on result type. But when it's `#t`, the 1st argument is stored on stack, and from the context I suppose it's a pointer, because in `(add-fill-result)` after the c-call finishes, the return value is stored into memory addressed by the pointer. So now the 2nd argument is put into the 1st argument register. How is the C function going to deal with this? In `(asm-foreign-callable)`, `fill-result-here?` becomes `synthesize-first?`, the behavior is similar. I don't quite understand the logic behind this.
I spent quite a bit of time staring at this, too, but I figured it out. To start, this is explained in the documentation for the `(& ftype-name)` return type for foreign procedures in section 4.2 of the user's guide:
> `(& ftype-name)`: The result is interpreted as a foreign object whose structure is described by the ftype identified by `ftype-name`, where the foreign procedure returns a `ftype-name` result, but the caller must provide an extra `(* ftype-name)` argument before all other arguments to receive the result. An unspecified Scheme object is returned when the foreign procedure is called, since the result is instead written into storage referenced by the extra argument. The `ftype-name` cannot refer to an array type.
In `asm-foreign-call`, and out of the context of an actual foreign call, this looks like the compiler munging the arguments to the C function. In fact, the C function isn't expecting that first argument at all--it's for the Scheme runtime's C code to use for returning an `ftype` object from C back to Scheme. There are some other hints and references to this extra first argument in the definition of `$make-foreign-procedure` in `syntax.ss`.
I hope that helps answer most of your questions, except where my memory has failed or justifications have been lost to time.
Thanks for your answers. I have now succeeded in compiling the compiler, though FFI is not working (since I copied it from `x86_64.ss` and the ABI logic needs some changes), and some errors occur when the cross-compiler tries to compile the files in `examples/`. I list the files with the errors they result in below:
most common:
Exception in car: () is not a pair
fact.ss fatfib.ss fft.ss power.ss
Exception: failed assertion (null? unspillable*) at line 15413, char 32 of cpnanopass.ss
edit.ss unify.ss
Exception in bitwise-arithmetic-shift-left: #f is not an exact integer
freq.ss
Exception in car: riscv64-call is not a pair
queue.ss ez-grammar-test.ss
Exception in compiler-internal: find-home!: spilled unspillable #{ura g57a89gqrhvjqoqm0f1zac3s4-1}
This is quite tricky, I wonder if you guys have encountered these before when porting Chez Scheme?
Update: I inserted a bunch of `(printf)`s in the backend and in `cpnanopass.ss`, and the result is that the error occurs after the `select-instruction!` pass. Still looking for errors in the instruction definitions...
For the `<foo> is not a pair` and `#f is not an exact integer` errors, those can occur when the ABI is incorrect, which you said is likely since it's copied from `x86_64.ss`.
For the failed `(null? unspillable*)` assertion and `Exception in compiler-internal`, that indicates the register allocator is overly constrained. That can occur when there aren't enough registers listed in section 1 of the `<backend>.ss` file, or too many calls to `make-tmp` in section 2. It looks like RISC-V has 31 general purpose registers, which should be more than enough. I believe I've seen this happen before, but unfortunately I don't remember the specific cause or solution.
Thanks, now the instruction selection and register allocation can be done, but another problem occurs in the `c-faslobj` procedure:
Exception in c-faslcode: wrote 232 bytes, expected 216 bytes
At first I thought the error was due to the `'quad` I set in `asm-rp-header`, so I changed it to `'long`, but the problem still exists, though the number of bytes written became a little smaller.
The `asm-size` procedure always outputs 4, except when the input is `riscv64-{abs, jump, call}`, just like in `arm32.ss`, and the `emit-code` procedure always constructs pairs with `'long` in the car field.
What else can give rise to the extra bytes?
Unfortunately, the only advice I can come up with is to try to get a trace of what happens in the `(let prf0 ...)` loop in `compile.ss`, using either the debugger or prints. I don't have specific evidence for it, but I suspect the value for `ptr-bits` (defined in `<machine-type>.def`) might not match your chip's pointer width. If that's true, it could account for the extra bytes. However, I would be a little surprised if you didn't have other errors earlier than this one.
Indeed this has something to do with `ptr-bits`, though the value is right. The problem is in `asm-size` and `asm-rp-header`. In the former, `quad`, `abs` and `code-top-link` were not considered and were made to output 4 when they should be 8; in the latter I used `long` instead of `quad` in the output pair. Therefore in `compile.ss` more bytes are written than expected.
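To make the mismatch concrete, here is a toy Python model of the check: one function adds up the bytes actually emitted, and a size estimator predicts the total in advance. If the estimator reports 4 for an item that is really 8 bytes wide, the two totals disagree in the same direction as the `wrote ... expected ...` error. The item kinds, the `WIDTH` table, and the sample stream are made up for illustration and are not taken from `compile.ss` or the backend.

```python
# Byte width of each kind of item in the emitted stream (illustrative).
WIDTH = {"long": 4, "quad": 8}


def bytes_written(items):
    """Bytes actually emitted for a list of (kind, value) items."""
    return sum(WIDTH[kind] for kind, _value in items)


def size_always_4(item):
    """Size estimate that ignores the item kind (the bug described above)."""
    return 4


def size_by_kind(item):
    """Size estimate that accounts for 8-byte items."""
    kind, _value = item
    return WIDTH[kind]


def expected_bytes(items, size_of):
    """Total size predicted by a given per-item size estimator."""
    return sum(size_of(item) for item in items)


# Made-up stream: three 4-byte instructions followed by two 8-byte words.
stream = [("long", 0x13)] * 3 + [("quad", 0)] * 2

print(bytes_written(stream), expected_bytes(stream, size_always_4))
print(bytes_written(stream), expected_bytes(stream, size_by_kind))
```

With the kind-aware estimator the two totals agree; with the constant estimator the written count exceeds the expected count by 4 bytes per 8-byte item.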
Now it seems the files in `examples/` can all be compiled by running `make boot XM=trv64le`, but `make` still exits with an error:
(time (for-each compile-file ...))
30 collections
0.666741002s elapsed cpu time, including 0.117468131s collecting
0.671828592s elapsed real time, including 0.118290646s collecting
258236432 bytes allocated, including 246627728 bytes reclaimed
>
make[2]: *** [Mf-cross:37: xboot] Error 2
make[1]: *** [Mf-boot:22: trv64le.boot] Error 2
make: *** [Makefile:50: boot] Error 2
I searched for errors earlier in the output and found:
Exception in compile-file: compiler for trv64le is not loaded
make[3]: *** [Mf-base:552: bootall] Error 255
make[3]: *** Waiting for unfinished jobs....
saying that `trv64le` is not loaded. But I have set up both `Mf-trv64le` and `Mf-rv64le`, for the threaded and unthreaded little-endian RISC-V platforms, and files such as `rv64le.def` and `trv64le.def` are also right.
Another question: when running `make boot XM=rv64le` without the "t", the error becomes
Exception in compiler-internal: find-home!: spilled unspillable #{ura4 bzlsondw1agi216ffb3fyayk4-0}
After some debugging, the error is in `asmlibcall`. `%ra` is declared as allocable; below is the code for `asmlibcall`:
(define-instruction value (asmlibcall)
[(op (z ur))
(let ([u (make-precolored-unspillable 'ura4 %ra)])
(if (info-asmlib-save-ra? info)
(seq
`(set! ,(make-live-info) ,u (asm ,null-info ,asm-kill))
`(set! ,(make-live-info) ,z (asm ,info ,(asm-library-call (info-asmlib-libspec info) #t) ,u ,(info-kill*-live*-live* info) ...)))
(seq
`(set! ,(make-live-info) ,u (asm ,null-info ,asm-kill))
`(set! ,(make-live-info) ,z (asm ,info ,(asm-library-call (info-asmlib-libspec info) #f) ,u ,(info-kill*-live*-live* info) ...)))))])
Good, now the unthreaded version can be compiled. Almost all silly errors were due to incorrect uses of `make-tmp` without `asm-kill`.
However, the threaded version cannot be compiled. Every time the make process comes to compile `examples/`, it exits with the error:
Exception in compile-file: compiler for trv64le is not loaded
But the unthreaded version can compile all of them. As far as I know, the only difference in the backend is that `get-tc` and `{activate,deactivate,unactivate}-thread` are used in FFI. Though the FFI code was copied from `x86_64.ss`, I made some changes to make sure the registers used are all the RISC-V versions, and there are no errors in the unthreaded version.
Any ideas?
Are you building `trv64le` in the same workarea as `rv64le`? I haven't built any threaded version in a while, but I believe that they're typically built separately. There may be some values in Makefiles or definition files that are causing a problem. I would suggest creating a new workarea for `trv64le` as a separate machine type and copying over files from `rv64le` piecemeal. The `risc-v.ss` (or whatever you've named the RISC-V backend file) should be the same, but you might need changes in other files.
Well, the error was in the machine description: I forgot to change `machine-type` in `trv64le.def`, which was the same as in `rv64le.def`, so it says compiler for `trv64le` is not loaded.
Now both versions of the compiler can be compiled, and I moved the project into a RISC-V virtual machine running on QEMU. The C runtime can be compiled, but the boot file can't be loaded: it segfaults. I debugged the boot file loading process and found that the error occurs in the following way:
During the first call in `scheme.c: Sbuild_heap()` to `load()`, after `S_G.error_invoke_code_object`, `S_G.invoke_code_object` and `S_G.base_rtd` have obtained their values, there is a `while` loop that reads objects in the boot file.
Now, when it loops the 3rd time (`i=3`), the predicate `Sprocedurep(x)` becomes true, then `boot_call() -> S_call_help() -> S_generic_invoke()`. Instructions in `S_generic_invoke()`:
=> 0x2aaab11640 <S_generic_invoke+24>: ld a5,-32(s0)
0x2aaab11644 <S_generic_invoke+28>: addi a5,a5,65
0x2aaab11648 <S_generic_invoke+32>: ld a0,-24(s0)
0x2aaab1164c <S_generic_invoke+36>: jalr a5
The `jalr` jumps to `S_G.invoke_code_object`, the assembly (RISC-V) of which is
=> 0x3ff7c10160: add s3,zero,a0
0x3ff7c10164: ld a7,56(s3)
0x3ff7c10168: ld a6,48(s3)
0x3ff7c1016c: ld a5,40(s3)
0x3ff7c10170: ld a4,32(s3)
0x3ff7c10174: ld a3,24(s3)
0x3ff7c10178: ld a2,16(s3)
0x3ff7c1017c: ld a1,8(s3)
0x3ff7c10180: ld a0,0(s3)
0x3ff7c10184: ld s2,160(s3)
0x3ff7c10188: ld t0,136(s3)
0x3ff7c1018c: ld tp,176(s3)
0x3ff7c10190: ld t3,152(s3)
0x3ff7c10194: addi t3,t3,8
0x3ff7c10198: li t5,8
0x3ff7c1019c: sub t3,t3,t5
0x3ff7c101a0: auipc t5,0x0
0x3ff7c101a4: addi t5,t5,48
0x3ff7c101a8: sd t5,0(t3)
0x3ff7c101ac: jr 3(s2)
0x3ff7c101b0: 0x8
0x3ff7c101b2: unimp
0x3ff7c101b4: unimp
0x3ff7c101b6: unimp
where the `ld`s are, if I'm right, code for restoring Scheme state. The following `auipc` and `addi` are for `asm-return-address`. Register `s2` contains the pointer to the closure obtained back in `scheme.c: load()` (the `x` in `Sprocedurep(x)` above). Then, after `jr 3(s2)` takes the addr of the code from the closure, it jumps to:
=> 0x3ff7c65190: addi a2,sp,320
0x3ff7c65192: bnez a5,0x3ff7c6511a
which is garbage.
As a comparison, I GDBed the x86_64 boot file. Still in the 3rd round of the `while` loop in `scheme.c: load()`, control passes to `boot_call() -> S_call_help() -> S_generic_invoke()`, and the asm when control transfers to Scheme is:
=> 0x40008150: sub $0x8,%rsp
0x40008154: mov %rdi,%r14
0x40008157: mov 0x10(%r14),%rsi
0x4000815b: mov 0x8(%r14),%rdi
0x4000815f: mov (%r14),%r8
0x40008162: mov 0x40(%r14),%r15
0x40008166: mov 0x28(%r14),%rbp
0x4000816a: mov 0x50(%r14),%r9
0x4000816e: mov 0x38(%r14),%r13
0x40008172: add $0x8,%r13
0x40008176: sub $0x8,%r13
0x4000817a: lea 0x28(%rip),%rcx # 0x400081a9
0x40008181: mov %rcx,0x0(%r13)
0x40008185: jmp *0x3(%r15)
almost the same. But after the last `jmp`, the asm is meaningful:
=> 0x40008230: mov $0x3e,%rbp
0x40008237: jmp *0x0(%r13)
It just jumps to the addr stored in `%sfp`. After the jump:
=> 0x400081a9: mov $0x1,%r10
0x400081b0: mov %r10,0x30(%r14)
0x400081b4: mov %rbp,0x28(%r14)
0x400081b8: mov %r9,0x50(%r14)
0x400081bc: mov %r13,0x38(%r14)
0x400081c0: add $0x8,%rsp
0x400081c4: movabs $0x5555555b6f1f,%rax
0x400081ce: jmp *%rax
the absolute addr is that of `S_return()`, so control returns back to C. Thus the process is like:
1. S_generic_invoke() ->
2. code for invoke_code_object, 1st jump->
3. another jump(to called Scheme function?)->
4. back to invoke_code_object + jump back to S_return()
So something's wrong in step 2, in the content of reg `s2`. But this value is obtained from `S_boot_read()`. I checked my Scheme backend, and there is no code for the garbage code above (which is disassembled as 2-byte compressed instructions, which I didn't implement; normal instructions are 4 bytes). Even the addr stored in `%sfp[0]` is right, it contains the addr of step 4... Help wanted😅 @akeep @cjfrisz
If I'm understanding this correctly, this is the first call into Scheme code from the C runtime. My guess is that you're either jumping to the address of the start of the closure record instead of the code itself (i.e., not adding `closure-code-disp` to the address) or there's otherwise a bad offset calculation.
That's just off the top of my head, so take that with a grain of salt. I hope that it turns out to be that simple. 😅
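One way to check a guess like this is to compare what the two jump instructions quoted above actually compute. On x86_64, `jmp *0x3(%r15)` loads an address from memory at `%r15+3` and jumps to the loaded value. On RISC-V, `jr 3(s2)` is `jalr` with a discarded link register: it adds the immediate to `s2`, clears the lowest bit, and jumps to that sum without reading memory. The Python sketch below models both; the addresses are made up, and whether this difference is the actual cause of the garbage instructions in the trace is only a possibility, not something the thread confirms.

```python
MASK64 = (1 << 64) - 1


def jalr_target(rs1, imm):
    """Target of RISC-V `jalr rd, imm(rs1)`: rs1 + imm with bit 0 cleared.
    No memory is read."""
    return (rs1 + imm) & MASK64 & ~1


def load64(memory, base, disp):
    """Model of a 64-bit load from address base + disp."""
    return memory[(base + disp) & MASK64]


# Illustrative values only: a tagged closure pointer (tag 5) whose code
# field, at displacement 3, holds the entry address of the code.
CLOSURE = 0x1005
CODE_ENTRY = 0x2000
memory = {CLOSURE + 3: CODE_ENTRY}

x86_target = load64(memory, CLOSURE, 3)                     # jmp *0x3(%r15)
riscv_direct = jalr_target(CLOSURE, 3)                      # jr 3(s2)
riscv_loaded = jalr_target(load64(memory, CLOSURE, 3), 0)   # ld t5,3(s2); jr t5

print(hex(x86_target), hex(riscv_direct), hex(riscv_loaded))
```

In this model the direct `jr 3(s2)` lands on the closure's code field itself (data, not instructions), while a load followed by `jr` reaches the same target as the x86_64 indirect jump.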
Firstly,
I see in other backends that the implementation of `asm-conditional-jump` uses a big macro to generate code to dispatch on different comparisons. However, in the RISC-V case there are no condition flags, so I chose a specific register (say `%cond`) for it, and in the assemblers of `<`, `u<`, `eq?`, `logtest`, `fl<`, etc., I manually set `%cond` to 1 if the condition is true, 0 otherwise. For example:
(define-instruction pred (logtest log!test)
[(op (x ur) (y ur))
(values '() `(asm ,info-cc-eq ,(asm-logtest (eq? op 'log!test) info-cc-eq) ,x ,y))])
and `asm-logtest`:
(define asm-logtest
  (lambda (i? info)
    (lambda (l1 l2 offset x y)
      (Trivit (x y)
        (values
          (emit and `(reg . ,%cond) x y
            (emit sltiu `(reg . ,%cond) `(reg . ,%cond) 1 ;; set less than immediate unsigned; if the last operand is 1, cond is set to 1 iff cond is 0
              (emit xori `(reg . ,%cond) `(reg . ,%cond) 1 '())))
          (let-values ([(l1 l2) (if i? (values l2 l1) (values l1 l2))])
            (asm-conditional-jump info l2 l1 offset)))))))
My assumption is that the newly set `%cond` is immediately used by a following `asm-conditional-jump`. From the compiled boot file it seems my hypothesis is right:
0x3ff7c0c6ec and s11,s1,s11 # a logtest: (not (zero? (and s1 x11)))
0x3ff7c0c6f0 xor t6,s11,s10
0x3ff7c0c6f4 seqz t6,t6
0x3ff7c0c6f8 beqz t6,0x3ff7c0c710
My understanding of the logic of the big macro in `asm-conditional-jump` is this:
;;normally:
;; [cond jump l1]
;; l2: # disp2 = 0
;; ...
;; l1:
;; ...
;;inverted:
;; [inverted cond jump l2]
;; l1: # disp1 = 0
;; ...
;; l2:
;; ...
;; generally:
;; [cond jump l1]
;; [jmp l2]
;; [other instructions]
;; l2:
;; ...
;; l1:
;; ...
Since the conditional branch only depends on whether `%cond` is 1 or 0, I'm using just `bne %cond, %zero, target` and `beq %cond, %zero, target` for the normal and inverted cases.
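The three layouts described in the comments above can be written down as a small decision function. This is a Python sketch of that understanding only, using the `%cond` convention from this comment; `conditional_jump`, `disp1` and `disp2` are illustrative names, and the limited reach of RISC-V conditional branches is not handled.

```python
def conditional_jump(disp1, disp2):
    """Choose a branch layout from the displacements to l1 (taken when
    %cond is 1) and l2 (taken when %cond is 0). A displacement of 0
    means that label immediately follows the branch."""
    if disp2 == 0:
        # Normal case: fall through into l2, branch to l1 when true.
        return ["bne %cond, %zero, l1"]
    if disp1 == 0:
        # Inverted case: fall through into l1, branch to l2 when false.
        return ["beq %cond, %zero, l2"]
    # General case: neither label follows directly, so branch to l1 and
    # then jump unconditionally to l2.
    return ["bne %cond, %zero, l1", "j l2"]


for disps in [(16, 0), (0, 16), (16, 32)]:
    print(disps, conditional_jump(*disps))
```

Only the general case costs a second instruction, which is why the layout is chosen from the displacements rather than always emitting both jumps.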
Secondly,
now the startup halts when registering foreign entries. I put a `printf()` in `S_foreign_entry()` and the output is:
foreign entry: (cs)sqrt
foreign entry: (cs)atan2
foreign entry: (cs)atan
Breakpoint 1, S_foreign_entry () at foreign.c:243
243 ptr tc = get_thread_context();
(gdb) c
Continuing.
foreign entry: (cs)sinh
Breakpoint 2, S_handle_nonprocedure_symbol () at schsig.c:437
437 ptr tc = get_thread_context();
(gdb) c
Continuing.
Call error: (1 3 $+ +)
[Inferior 1 (process 14653) exited with code 01]
... seems it's calling `+` but found out that it is not a closure, so the hand-coded `nonprocedure-code` called `handle-nonprocedure-symbol`. The code for `nonprocedure-code` is
=> 0x3ff7c0c6e0: ld s1,5(t4) # 5 is symbol-value-disp
0x3ff7c0c6e4: li s11,7 # load imm closure mask
0x3ff7c0c6e8: li s10,5 # closure type
0x3ff7c0c6ec: and s11,s1,s11 # 3 instr for type-check
0x3ff7c0c6f0: xor t6,s11,s10
0x3ff7c0c6f4: seqz t6,t6
0x3ff7c0c6f8: beqz t6,0x3ff7c0c710 # the inverted cond jump
0x3ff7c0c6fc: mv s4,s1
0x3ff7c0c700: ld s11,3(s1)
0x3ff7c0c704: sd s11,13(t4)
0x3ff7c0c708: ld t5,3(s4)
0x3ff7c0c70c: jr t5
0x3ff7c0c710: sd a7,56(s0) # store Scheme states and jump to handle-nonprocedure-symbol
0x3ff7c0c714: sd a6,48(s0)
...
Now that it can proceed this far and jump back and forth between C and Scheme, the linker is working fine and `asm-return-address` calculates the right addr. What could cause the problem?
@cjfrisz BTW, I put a `printf()` in the linker, in `riscv64_set_abs(void* address, uptr item)`, to print the `item` being relocated. In this way I get the value to be relocated and the addr of the instructions being relocated. Then in the following code, via gdb, the `lui, lui, addi, addi, slli, add` sequence produces the right value in `t4`, and the last `jr t5` jumps to the `nonprocedure-code` in the last comment
0x3ff7c2e894: lui t5,0x0 # 0x3ff7c9354b
0x3ff7c2e898: lui t4,0xf7c93
0x3ff7c2e89c: addi t5,t5,64
0x3ff7c2e8a0: addi t4,t4,1355
0x3ff7c2e8a4: slli t5,t5,0x20
0x3ff7c2e8a8: add t4,t4,t5
0x3ff7c2e8ac: ld s4,5(t4)
0x3ff7c2e8b0: li t3,3 # t3 is %ac0
0x3ff7c2e8b4: ld t5,13(t4) # 13 is symbol_pvalue_disp
0x3ff7c2e8b8: jr t5
I wonder how can the right addr give me something so strange.
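The arithmetic of the `lui, lui, addi, addi, slli, add` sequence above can be checked by hand or with a short model. The Python sketch below simulates the RV64 semantics (`lui` sign-extends its 32-bit result, `addi` sign-extends its 12-bit immediate) and also splits a 64-bit address into the four immediates. The helper names are made up for this example, and the splitting routine is one possible way to do it, not the code of `riscv64_set_abs`.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lui_addi(imm20, imm12):
    """Register value after `lui rd, imm20; addi rd, rd, imm12` on RV64."""
    return (sext(imm20 << 12, 32) + sext(imm12, 12)) & MASK64


def split32(word):
    """Split a 32-bit word into (imm20, imm12) for a lui/addi pair."""
    imm12 = sext(word, 12)
    imm20 = ((word - imm12) >> 12) & 0xFFFFF
    return imm20, imm12


def split_abs(addr):
    """Return (hi20, hi12, lo20, lo12) for the six-instruction sequence."""
    lo20, lo12 = split32(addr & 0xFFFFFFFF)
    low = lui_addi(lo20, lo12)
    # The low half may sign-extend into the upper bits, so the high half
    # is whatever is still missing after the low half has been formed.
    hi = ((addr - low) & MASK64) >> 32
    hi20, hi12 = split32(hi)
    return hi20, hi12, lo20, lo12


def materialize(hi20, hi12, lo20, lo12):
    """Simulate: lui t5; lui t4; addi t5; addi t4; slli t5,32; add t4,t4,t5."""
    t5 = lui_addi(hi20, hi12)
    t4 = lui_addi(lo20, lo12)
    t5 = (t5 << 32) & MASK64
    return (t4 + t5) & MASK64


print(hex(materialize(0x0, 64, 0xF7C93, 1355)))
print(split_abs(0x3FF7C9354B))
```

Feeding in the immediates from the disassembly reproduces the address in the trailing comment, so the sequence itself builds the intended value; the high half is 64 rather than 63 because the low half sign-extends to a negative number.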
It's been a little while since I dug around in the linker, so I'd need to study that code before I'd have any deep insight.
The only thing that sticks out to me is this: did you set `%cond` as a reserved register in `define-registers`, or is it in the list of `allocable` registers? If it's allocable, then you'll need to use `asm-kill` in any `define-instruction` that uses it. If it's reserved, it probably needs to be treated similarly to `%sfp` and `%ap`. I think the latter route may be kind of nasty in `cpnanopass.ss`, so I'd personally allocate an unspillable for each instruction that uses it.
I think I'm taking a bit of a wild swing, but I think it's possible to get the kind of behavior you're seeing if `%cond` isn't getting saved and restored properly.
The greeting finally appears:
zachary@debian-rv64 ~/C/r/bin (new-linker)> uname -a
Linux debian-rv64 5.17.0-1-riscv64 #1 SMP Debian 5.17.3-1 (2022-04-18) riscv64 GNU/Linux
zachary@debian-rv64 ~/C/r/bin (new-linker)> ./scheme -b ~/petite.boot ~/scheme.boot
Petite Chez Scheme Version 9.5.7
Copyright 1984-2021 Cisco Systems, Inc.
(+ 1 2 3 4)
10
(map (lambda (x) (* x x)) (iota 10))
Exception: illegal instruction. Some debugging context lost
(fl+ 1.5 1.5)
0.0
Yet bugs still exist, as the last two results above show.
That’s great! It looks like there are probably some issues with instruction encodings, plus maybe something not quite right in copying to and from floating point registers for floating point ops. Excellent job on getting the REPL to load!
It's working:
zachary@debian-rv64 ~> uname -a
Linux debian-rv64 5.17.0-2-riscv64 #1 SMP Debian 5.17.6-1 (2022-05-15) riscv64 GNU/Linux
zachary@debian-rv64 ~> ChezScheme/rv64le/bin/rv64le/scheme -b ./petite.boot scheme.boot
Petite Chez Scheme Version 9.5.7
Copyright 1984-2021 Cisco Systems, Inc.
> ((lambda (x)
((lambda (y)
((lambda (z) (printf "~a~a~a~n" x y z)) "s")) "e")) "y")
yes
> (fl+ 1.2 1.2)
2.4
> (fl+ 1.2 1.2 3.4 5.12345)
10.923449999999999
But I have no idea why: I just gave two machine-dependent registers aliases and used them in the assembler. It still cannot bootstrap; there's an error:
compiling cmacros.ss with output to cmacros.so
compiling ../nanopass/nanopass.ss with output to nanopass.so
compiling ../nanopass/nanopass/language.ss with output to nanopass/language.so
compiling ../nanopass/nanopass/helpers.ss with output to nanopass/helpers.so
compiling ../nanopass/nanopass/implementation-helpers.chezscheme.sls with output to nanopass/implementation-helpers.chezscheme.so
Exception in close-port: failed on #<binary output port nanopass.so>: bad file descriptor
It's bootstrapped (at least on my machine). Repo at: https://github.com/maoif/ChezScheme. Configure using ./configure -m=rv64le.
Thanks for your help!
Currently the FFI functionality is limited.
Please tell me what requirements need to be met for it to be merged into the main branch.
Thanks again, @maoif, for your work on the RISC-V backend — now merged and passing all tests that I've tried on both emulated and real hardware.
I'm trying to port CS to RISC-V. By now I've finished most of the primitives; here are some problems I encountered:
- (asm-enter) is simply (values) in arm32.ss and ppc32.ss, but in x86.ss and x86_64.ss it adjusts %sp. Why the difference?
- Comparison after locks: I noticed that after (%inline locked-decr!) and (%inline locked-incr!) there is info-condition-code, and (%inline lock!) is wrapped in an if-expression, so do I have to set the condition code in these three primitives' assembler section? But RISC-V does not have condition flags, so currently I'm using one specific register for condition flags: I write the result of all carry/ovfl and comparisons into this register, and conditional branches are all based on it. Don't know if it's OK?
- What does asm-kill do? It's just
- Do I have to worry about the length of the data to be stored when src is reg in asm-move?
- The pause primitive uses the pause instruction on x86, isync on ppc, and nothing on arm. On RISC-V, fence and fence.i seem to satisfy this?
- Since RISC-V does not have so many addressing modes, is it OK to simply use ur and some immediate in define-instruction and rely on coerce-opnd to handle all other types of input?
- Regarding addresses, I found that the backends use either a register or an immediate field, but not both. Is that always so?
- Regarding ffi, in x86_64.ss, (asm-foreign-call), we have fill-result-here? as the output of (result-fits-in-registers?), which works on the result type. But when it's #t, the 1st argument is stored on the stack, and from the context I suppose it's a pointer, because in (add-fill-result), after the c-call finishes, the return value is stored into memory addressed by the pointer. So now the 2nd argument is put into the 1st argument register. How is the C function going to deal with this? In (asm-foreign-callable), fill-result-here? becomes synthesize-first?, and the behavior is similar. I don't quite understand the logic behind this.
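On the condition-code question in the list above: since RISC-V has no flags register, an overflow check has to be computed into an ordinary register and then branched on. The Python sketch below models the usual idiom for a signed 64-bit add (the sum is smaller than one operand exactly when the other operand is negative, unless overflow occurred). It is an illustration of the general technique, not the code the backend in this thread emits; the function name and the cond variable (standing in for the %cond register) are assumptions.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def add_with_overflow_flag(a, b):
    """Return (sum, cond); cond is 1 if the signed 64-bit add overflowed."""
    total = (a + b) & MASK64                                  # add  sum, a, b
    b_negative = 1 if to_signed(b) < 0 else 0                 # slti t0, b, 0
    sum_less = 1 if to_signed(total) < to_signed(a) else 0    # slt  t1, sum, a
    cond = b_negative ^ sum_less                              # xor  cond, t0, t1
    return total, cond

# A later conditional branch would then test cond against zero
# (bnez cond, overflow_label in RISC-V terms).
print(add_with_overflow_flag(1, 2))
print(add_with_overflow_flag((1 << 63) - 1, 1)[1])
```

Because cond is a plain register in this scheme, every instruction that writes it has to be visible to the register allocator, which is the concern raised in the reply about %cond being reserved or killed.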