BartmanAbyss / vscode-amiga-debug

One-stop Visual Studio Code Extension to compile, debug and profile Amiga C/C++ programs compiled by the bundled gcc 12.2 with the bundled WinUAE/FS-UAE.
GNU General Public License v3.0

Optimization settings for sample project would seem to be bad. #58

Closed rjobling closed 3 years ago

rjobling commented 3 years ago

I've been seeing some weird bugs that I can't attribute to any particular code, and it's very hard to determine exactly what is going on. It looks like an optimization problem that causes my copper-driven blitter lines to flicker.

I've been able to narrow the problem down to the use of -fwhole-program. If I turn that option off everything seems fine. While trying to investigate why this might be I found this documentation:

-fwhole-program Assume that the current compilation unit represents the whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in effect are optimized more aggressively by interprocedural optimizers. This option should not be used in combination with -flto. Instead relying on a linker plugin should provide safer and more precise information.

The documentation says you should not combine -fwhole-program with -flto, but that is what the sample makefile does, so I'm doing the same.
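To make the quoted documentation concrete, here is a minimal sketch of what -fwhole-program changes and how the `externally_visible` attribute opts a symbol out. The function names are hypothetical and not taken from the sample project; only the attribute itself is GCC's.

```c
/* Hypothetical names throughout; only the attribute is real GCC syntax. */

/* Under -fwhole-program GCC may treat this like a static function:
   it can be inlined into every caller and its symbol dropped. */
int scale_coord(int x)
{
    return x * 2;
}

/* externally_visible keeps the symbol public, for example for an
   entry point that hand-written assembly calls by name. */
__attribute__((externally_visible))
int draw_entry(int x)
{
    return scale_coord(x) + 1;
}
```

If code outside the compilation unit (such as an assembly routine) refers to a symbol that lacks the attribute, -fwhole-program is allowed to remove or rename it, which is one way the option can cause surprises.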

I'll try to investigate a little more about this but wondered if you know more about gcc compiler options than I do and had some thoughts on what might be going on, or if you have suggestions for further narrowing the problem down?

rjobling commented 3 years ago

This is where I found the information about -fwhole-program

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

It later states: "If the program does not require any symbols to be exported, it is possible to combine -flto and -fwhole-program..."

So it all seems quite murky, but I'm having problems and would like to know more so any help would be appreciated.

Not sure if the sample makefile should use both or not, my project isn't doing anything particularly different since it's just some demo stuff.

BartmanAbyss commented 3 years ago

Hi, I'd look elsewhere. These switches are needed, or ld doesn't find the correct LTO library. Try turning on more warnings. When GCC finds undefined behavior (and it finds it quite often due to LTO), it generates anything it likes (including TRAP #7 opcodes).

rjobling commented 3 years ago

I'm not sure what you mean. I turned off -fwhole-program and it compiles and runs without any problems. Are you saying that I can't expect that to always work? Certainly I would prefer to know what is the cause of my problems, so will keep looking.

rjobling commented 3 years ago

I've tried narrowing down the optimization switches. Turning off -ftree-loop-vectorize with -fno-tree-loop-vectorize seems to have fixed it. I notice the sample makefile has -fno-tree-loop-distribution. Maybe the two are related somehow?

BartmanAbyss commented 3 years ago

When not disabling tree-loop-distribution, the compiler sometimes splits a for loop into 2 for loops, making it slower on 68000. I think I had to disable tree-loop-vectorize somewhere in gcc_support.c. I'd rather not turn off LTO. That's by far the best optimization you can get, both in terms of speed and size. Can you maybe pinpoint the corrupt code in the disassembly? Or try the no-tree-loop-vectorize in specific functions via attribute (see gcc_support.c)?
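A minimal sketch of the per-function approach, using GCC's `optimize` function attribute. The function `fill_words` is hypothetical and only stands in for whichever routine looks miscompiled; the exact attributes used in gcc_support.c may differ.

```c
#include <stddef.h>

/* Hypothetical function. Only this function is compiled without
   loop vectorization; the rest of the file keeps the Makefile flags. */
__attribute__((optimize("no-tree-loop-vectorize")))
void fill_words(unsigned short *dst, unsigned short value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}
```

Moving the attribute from one suspect function to the next is a cheap way to bisect which loop is affected without changing the global optimization flags.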

rjobling commented 3 years ago

You are disabling tree-loop-distribute-patterns in the memcpy code and others like it.

Not sure what that does.

To be clear, I am finding that disabling either whole-program OR tree-loop-vectorize on its own fixes my issue. I suspect it has more to do with the latter.

Also I still have lto enabled in both cases and the running code seems mostly unaffected.

Anyway, I'll focus on tree-loop-vectorize and try to narrow it down, so I will know whether it's something I've done or a more general issue that would suggest adding the no-tree-loop-vectorize option to the sample makefile.

rjobling commented 3 years ago

Jeez, now I have found that I don't get the issue if I never turn on warpmode for my initialization.

rjobling commented 3 years ago

I've tried lots of things that have made a difference, but I still don't know why. I have debug asserts in my code; if I remove those, it works. I also have KPrintF calls, and if I just remove those, it works too. Since these are part of the allocation and initialization phase, and that is when I use warpmode, I'm starting to think my problem is somewhere in there.

I compared a working and a broken pair of .s output files and there are very few differences. One area that might be different is the use of memset, so I was going to try fiddling with the optimization options in there, but I don't know how to add to the existing attribute options. I do notice that removing them causes memset to recurse out of control.

Anyway, maybe there are some clues in all that and you have some suggestions on how I might proceed? I did turn on all the "undefined behavior" warnings, and I already have plenty more including -Wall and -Wextra. So I feel like the code is pretty solid. But clearly some quirk is tripping something up; I'm at a bit of a loss as to what, though.
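On the memset points: the runaway recursion is most likely GCC's loop-pattern recognition at work. With -ftree-loop-distribute-patterns enabled, a byte-fill loop can be replaced by a call to memset, and inside memset itself that means the function ends up calling itself. As for adding options, the `optimize` attribute accepts several strings, so further options can be appended to the existing one. A hedged sketch follows; the name `support_memset` is hypothetical and chosen only so the example does not clash with a hosted libc.

```c
#include <stddef.h>

/* In freestanding support code this would be memset itself.
   Two options are combined in one attribute: the first stops GCC
   from turning the loop back into a memset call, the second
   disables loop vectorization for this function only. */
__attribute__((optimize("no-tree-loop-distribute-patterns",
                        "no-tree-loop-vectorize")))
void *support_memset(void *dest, int val, size_t len)
{
    unsigned char *p = dest;
    while (len-- > 0)
        *p++ = (unsigned char)val;
    return dest;
}
```

The same options can also be written as one comma-separated string, `optimize("no-tree-loop-distribute-patterns,no-tree-loop-vectorize")`.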

rjobling commented 3 years ago

I am now suspecting that the problem might be due to my copper blits having one wait.

I tried all sorts of things and turned all the warnings for undefined behavior up and even changed some of the gcc support code to conform. None of that helped.

But random other things were helping, and now I've tried adding another wait and that is helping too. It might be coincidence. Maybe WinUAE doesn't exhibit the blitter wait bug, in which case I've just wasted more time. Anyway, thought I'd update here just in case.

rjobling commented 3 years ago

For what it's worth this turned out to be a combination of bad copper setup and inconsistent timing.

I didn't have a vbl wait before starting the DMA which could invert my front/back copper lists.
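For readers hitting the same problem, a vertical-blank style wait can be sketched roughly as below. This is not the code from my project or from the sample: the function names are hypothetical, and the beam register is passed in as a pointer so the sketch compiles on a host. On real hardware the pointer would be a 32-bit read of VPOSR/VHPOSR at 0xDFF004.

```c
#include <stdint.h>

/* Vertical beam position from a 32-bit read of VPOSR/VHPOSR:
   V8 is bit 16, V7-V0 are bits 15-8. */
uint32_t beam_line(uint32_t vposr_vhposr)
{
    return (vposr_vhposr & 0x1FF00u) >> 8;
}

/* Busy-wait until the beam enters `line`. On the Amiga, `beam`
   would be (volatile const uint32_t *)0xDFF004 (assumption: the
   caller supplies it). */
void wait_for_line(volatile const uint32_t *beam, uint32_t line)
{
    while (beam_line(*beam) == line) { }  /* leave the line if already on it */
    while (beam_line(*beam) != line) { }  /* wait for the next arrival */
}
```

Calling something like `wait_for_line` with a line near the bottom of the frame before installing the copper list pointers and enabling DMA means the first frame always starts from a known beam position, so the front/back lists cannot start out swapped.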

The other issue was that I was calling KPrintF after starting the DMA and before the first frame. This caused the timing to be different enough that the issue was not showing up in VSCode when it would show up in the vanilla WinUAE install.

During my problem seeking I ended up reconfiguring the makefile so it'll report all sorts of undefined behavior and I also updated the support code so it conforms.

This allowed me to turn off the -O1 change for KPrintF.

I've attached my updated support code and makefile if you are interested.

Makefile.txt gcc8_c_support.c.txt

BartmanAbyss commented 3 years ago

Thanks!