Closed sgreenhill closed 4 years ago
Dear Stewart,
I was looking for a good way to get the address of the top of the stack, but I didn't know that under 64 bit OS an argument could be placed at address that unaligned into 64 bits. I thank you for this observation.
Most recently, I made a commit that fixed incorrect calculation of the top of the stack under Windows with a custom entry point (see Mod/Lib/crt1.c), that was used to reduce the size of the binary executable file.
Please try to change in SYSTEM.c
void SYSTEM_INIT(INTEGER argc, void *argv)
{
SYSTEM_MainStackFrame = &argc;
to:
SYSTEM_MainStackFrame = __builtin_frame_address(0);
If you can suggest some more correct way to calculate the aligned address of top of stack, please let me know.
P.S. I have my own mini-mac, but it is currently under repair.
I'm not quite sure that the function __builtin_frame_address
is in Clang...
So I am now more inclined to calculate top of stack address for Linux in the same way as in voc.
But it is likely that this non-aligned stack problem may be present under Windows x64 too. Can we use &argv
instead of &argc
?
Of course, "top of stack" here is a very conditional concept. The address just must be higher than any local pointer placed on the stack. All this is required for the garbage collector.
Thanks Oleg!
I can confirm that both &argv
and __builtin_frame_address(0)
work fine under Clang.
Looks like &argc
is just an unlucky choice, since it is 4-bytes wide and is therefore not aligned on a multiple of the pointer size.
The real bug is in the statement WHILE sp # stack0 DO
which assumes assumes that the stack top and bottom are aligned on a multiple of the pointer size. While this might usually be true, it probably can't be guaranteed.
$cat test.c
#include <stdio.h>
int main(int argc, char ** argv) {
printf("argc: %p\n", &argc);
printf("argv: %p\n", &argv);
printf("__builtin_frame_address: %p\n", __builtin_frame_address(0));
}
$gcc test.c -o test
$./test
argc: 0x7ffee868682c
argv: 0x7ffee8686820
__builtin_frame_address: 0x7ffee8686830
Ah, I chose &argc
because it is higher on the stack than &argv
and closer to the result of __builtin_frame_address(0).
But this is probably not the case on all platforms.
IF uLT(stack0, sp) THEN inc := -inc END;
WHILE sp # stack0 DO
This code implies working on architectures where the stack grows up instead of down. This is why the inequality sp # stack0
comparison was chosen here instead of the unsigned comparison on more or less.
I'm still wondering what we can do with it elegantly.
P.S. I am open for new suggestions on Ofront+, so we can discuss these questions here, and feel free to e-mail me: allot { at } bk.ru I'm not as conservative as Norayr, and I eventually want to get a convenient tool for developing in the real world, and not a beautiful, but useless thing.
You can see how binding to WinAPI for Ofront+ is done now:
https://github.com/Oleg-N-Cher/XDev/blob/master/WinDev/Lib/Mod/WinAPI.Def
If this were needed for voc, it would require a lot of very routine work.
Also please look at these bindings:
https://github.com/Oleg-N-Cher/NesDev/blob/master/Lib/Mod/NesLib.Mod https://github.com/Oleg-N-Cher/XDev/blob/master/ZXDev/Lib/Mod/Tasks.Def
A variety of ways are shown here.
One option is to split the code into two loops: one which increments sp while less than stack0, and one which decrements sp while greater than stack0. It wouldn't be elegant, but would be safer and just as fast.
Alternatively, round both stack0 and sp to multiples of the increment. For example, assuming positive sp, and that pointers have an alignment of "inc": sp := sp - sp MOD inc; stack0 := stack0 - stack0 MOD inc; Do this before the loop, and sp must eventually hit stack0 with the current test.
Thanks for your details. I will contact you if I have any issues. For now I am doing some experiments with ofront+ and have ported some hand-written interfaces. It should be easy to migrate from the "code procedure" to "extern declaration" model. From there I may look at adapting my translator (H2O) to make some more extensive automatic translations. It would be good to eventually support the same functionality in VOC. It don't think it should be too hard, especially with your implementation as a reference. I will study the compiler and try to understand how it works.
Looking at my collection of interfaces, I have a version of WinApi from 2002 which eventually became part of the VisualOberon project by Tim Teulings. The ZXDev bindings are certainly interesting. The "Def" modules are Oberon interfaces to the Z80 assembley codes? Can the C compiler generate code that is compact enough to be useful?There's not much space on some of those old machines. It makes me feel a bit nostalgic. My first job was writing 6510 assembley for the C64.
Dear Stewart,
I've already made the commit, and now I'm thinking about it and I'm starting to doubt it.
All the entire mechanism for searching for local pointers on the stack is based on the fact that all pointers are necessarily aligned. If there is at least one non-aligned garbage collected pointer on the stack, this mechanism will be broken.
Therefore, stack0
and sp
are always already aligned, because they are calculated by getting address of a pointer to a function's argument on the stack. This means that the additional correction is not necessary.
What do you think of this?
P.S. I use the extension .Def to indicate that these modules are not code modules, but only definitions (foreign interfaces).
Old machines really are very limited, but I use Ofront+ for developing simple games for retro-computers. Of course, these games never reach the level of games developed directly in assembler, but this is also a kind of hobby, don't you think? :-)
I have tried using the SDCC and cc65 compilers with Ofront+, and this tandem generates satisfactory code quality.
Here are a few links to these games:
https://zx.oberon.org/dash.htm https://github.com/Oleg-N-Cher/Dash https://zx.oberon.org/durak.htm https://github.com/Oleg-N-Cher/DarkWoods https://zx.oberon2.ru/forum/viewtopic.php?f=5&t=4&start=10#p1332
I think the rule is that objects must be aligned at a multiple of their size. This prevents them from overlapping word boundaries, which would otherwise make fetches inefficient. In the experiment below it seems to be true for both Clang-10, and gcc-9, although spacing is sometimes weird, which I can't explain. So maybe rather than rounding it is sufficient to just take the address of a pointer variable?
If pointers were not so aligned, Heap_MarkStack would not work at all as it is.
Excellent! Boulderdash was a favourite of mine. Never played Dark Woods, but it looks fun :-)
$cat test.c
#include <stdio.h>
void test(int a, char * p1, char c, char d, char * p2, char e, int f, char *p3, char *p4) {
printf("a: %p\n", &a);
printf("p1: %p\n", &p1);
printf("c: %p\n", &c);
printf("d: %p\n", &d);
printf("p2: %p\n", &p2);
printf("e: %p\n", &e);
printf("f: %p\n", &f);
printf("p3: %p\n", &p3);
printf("p4: %p\n", &p4);
}
int main(int argc, char ** argv) {
test(1, "one", '2', '2', "two", '3', 3, "three", "four");
}
$gcc test.c -o test
$./test
a: 0x7ffeef68c75c
p1: 0x7ffeef68c750
c: 0x7ffeef68c74f
d: 0x7ffeef68c74e
p2: 0x7ffeef68c740
e: 0x7ffeef68c73f
f: 0x7ffeef68c790
p3: 0x7ffeef68c798
p4: 0x7ffeef68c7a0
$gcc-9 test.c -o test9
$./test9
a: 0x7ffeec3c078c
p1: 0x7ffeec3c0780
c: 0x7ffeec3c0788
d: 0x7ffeec3c077c
p2: 0x7ffeec3c0770
e: 0x7ffeec3c0778
f: 0x7ffeec3c07a0
p3: 0x7ffeec3c07a8
p4: 0x7ffeec3c07b0
Josef Templ wrote:
Dear Oleg,
sorry for my late answer. I do remember that I also encountered problems with getting the stack base when switching C compiler optimizations on, depending on the platform. In general, using the address of a variable within 'main' was the only portable way I have found and using &argv avoids the need for an additional local variable. Using gcc specific features was not an option for me. Now, if the C compiler does very aggressive optimizations, it may inline 'main' and if argv is then no longer a local variable of 'main' but for example a global variable it lives at a completely different address outside of the stack and it is useless for obtaining the stack base address. This must be avoided and switching off compiler optimizations for the main module usually helps, as far as I remember.
- Josef
Oleg, your call of SYSTEM_INIT() in the macro __INIT in the file SYSTEM.h has a bug. Compare it to the ofront version and you will see that an & operator is missing. Note that in ofront the 'address' of the argv argument is passed, not its value. This is expressed by the parameter name argvadr. That explains the wrong stack base address.
Dear Josef, there is no bug. I did it on purpose.
Ofront uses the argv variable address from the function main(argv, argv). I can't do it the same way, because for reducing size of executable I'm using a custom version of crt1.c, and the entry point replaced from main() to WinMain() that has not such argument argv. So I have to simulate argv using a function __getmainargs. Look at this source and you'll understand everything:
• https://github.com/Oleg-N-Cher/OfrontPlus/blob/master/Mod/Lib/crt1.c
So now Ofront+ uses address not from argv of the main() function, but from argv of the SYSTEM_INIT:
void SYSTEM_INIT(INTEGER argc, void *argv)
Of course, this address is somewhat deeper in the stack, but it will still work as SYSTEM_MainStackFrame
. It is an address of a pointer, so it is aligned. I'm sure this will be a safe solution, since SYSTEM_INIT is called very close to the top of the stack:
int main(int argc, char **argv)
{
__INIT(argc, argv);
...
We still have a way to use the function __builtin_frame_address(0)
to get this address.
I will be happy to hear your opinion.
I hope I've explained everything clearly? My custom argv is a variable not on the stack. Because of this, I had problems with the garbage collector, which passed about 2 megabytes of stack and called SIGSEGV - because GC passed the stack from the bottom and not to the top of the stack, but to the area of local variables in the heap. So my entry point doesn't have argv as an argument passed on the stack. And I have to find another way. And I found a very similar working method - to use &argv of the SYSTEM_INIT.
Here's what my situation looked like with taking argv
address where argv
was a variable on the heap, not on the stack:
void SYSTEM_INIT(INTEGER argc, void *argvadr)
{ int adr;
SYSTEM_MainStackFrame = argvadr; printf("argvadr = %d\n", (SYSTEM_ADRINT)argvadr);
SYSTEM_MainStackFrame = __builtin_frame_address(0);
printf("__builtin_frame_address = %d\n", (SYSTEM_ADRINT)__builtin_frame_address(0));
printf("&adr = %d\n", (SYSTEM_ADRINT)&adr);
By the way, for completeness: we could take the address of the local variable adr
declared in SYSTEM_INIT as SYSTEM_MainStackFrame
. But &adr
is even deeper on the stack than &argv
of SYSTEM_INIT
.
You must be very careful when selecting a stack base address because if you miss a single pointer candidate the garbage collector may crash your program. Therefore you must use an address outside of SYSTEM_INIT, i.e. from the caller of SYSTEM_INIT. Then all potential pointer candidates are covered. It is not sufficient to be somewhere close to the stack base, you must be there. You can use a local variable in your WinMain function if you don't have any parameter available.
What do you think about this decision?
int main(int argc, char **argv)
{
__INIT(argc, argv, __builtin_frame_address(0)); // <-- No risk of a missed pointer,
// because no one GC-pointer has been reserved in stack at this place, main() has not local variables at all.
// All pointers (as local variables and as arguments) will be reserved later.
void SYSTEM_INIT(INTEGER argc, void *argv, void *stackframe)
{
SYSTEM_MainStackFrame = stackframe; // MUST be aligned
SYSTEM_ArgCount = argc;
SYSTEM_ArgVector = argv;
By the way, what are the reasons to avoid using the function __builtin_frame_address
? Stewart found out that it works on macOS too.
I certainly wouldn't want to overcomplicate code generation with a call __INIT(argc, argv, __builtin_frame_address(0))
But what else can I do?
In principle, it can be written in SYSTEM.h as the macro:
#define __INIT(argc, argv) static void *m; SYSTEM_INIT(argc, argv, __builtin_frame_address(0))
This commit takes into account the expressed wishes. For UNIX systems, the address is taken in the old, proven way, by &argv of main(). And for WinMain entry point, we use the function __builtin_frame_address(0).
That sounds good to me. Are you using GCC on Windows, or does the Microsoft compiler also implement __builtin_frame_address?
Yes, I use MINGW.
I have just found out that using function __builtin_frame_address has not any dependencies on WinAPI libraries, from which I conclude that this function unwraps into several in-line machine code commands without external dependencies.
I found some links for an analog of the function __builtin_frame_address from Microsoft.
• https://stackoverflow.com/questions/1291446/visual-c-versions-of-gcc-functions • https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2008/s975zw7k(v=vs.90) • https://docs.microsoft.com/en-us/cpp/intrinsics/returnaddress?view=vs-2019
__builtin_frame_address is a GCC specific function or macro, as far as I know. That's why I would try to avoid it. The C standard dos not have such a feature. On some compiler it may not be available.
Well, in this case we use this function exclusively for a specific usage of the WinMain entry point.
If a compiler uses the main() entry point, the code with __builtin_frame_address will not be used. And as far as I understand, Ofront generally only uses the main() entry point.
I think I can close this issue.
Hi,
I'm just trying ofront+ for the first time, using the current github version. It works fine until it needs to expand the heap, and then segfaults. Programs using less than about 200K of heap work fine. Same program runs fine under VOC.
Easiest way to reproduce the problem is:
Compiled with:
where
$OCFLAGS
are the paths to the library directories. Same result if compiled (like the compiler itself) like this:gcc $OCFLAGS -m64 -g -fvisibility=hidden -fomit-frame-pointer -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables TestHeap2.c -o TestHeap2 -lOfront -dl
Looking at the trace below, I think the problem is in Heap.MarkStack:
The most likely way this can go wrong is if
inc
has the wrong sign, or if the stack frames are not perfectly aligned. This would allowsp
to move paststack0
without triggering the testsp # stack0
. Probably would be safer to do an unsigned comparison at this point. For example,uLE(sp,stack0)
oruGE
depending on the stack direction.From below, you can see that the stack is between
0x00007ffeefbff764
and0x00007ffeefbeb4f0
, and its testing a location "sp" outside this range:0x7ffeefc00000
. To me it looks suspicious that SYSTEM_MainStackFrame ends with a 4 when the alignment size is probably 8. Is it possible this value could be slightly wrong? Perhaps it should be forced to an alignment boundary?