breakintoprogram / agon-vdp

Official AGON QUARK Firmware: ESP32 VDP
MIT License
76 stars 27 forks source link

mode 0 display corruption #75

Closed stevesims closed 1 year ago

stevesims commented 1 year ago

With the latest code update with the revised screen modes I'm seeing a display corruption in mode 0. Once the display has filled sufficiently to mean that the cursor has reached the bottom of the screen, only the upper half of the input line will get rendered. Further scrolling results in lines with their lower halves empty.

The following one-liner in BASIC will demonstrate:

MODE 0: FOR I = 0 TO 100: PRINT "Here is some text": NEXT

On further investigation, I can see the same corruption occurred in mode 3 before the new screen modes landed.

IMG_7117 Medium

breakintoprogram commented 1 year ago

Hi, thanks for the feedback. I'm unable to replicate here.

stevesims commented 1 year ago

Odd... I've not yet updated my BASIC and MOS to the latest versions, but I'm running the latest VDP code.

I'll try to update MOS and BASIC and get back to you

stevesims commented 1 year ago

OK, I've updated my BASIC to 1.05, and I'm still seeing the same display corruption.

I've not managed to update my MOS yet, so I'm running the 1.04 RC 1 release that was added to the repo on 10th July.

The issue I have is building a MOS.bin file suitable for use with the agon-flash tool. ZDSII seems to build the project just fine, but it doesn't produce a bin file, and there's nothing I can see in any of the documentation on how to accomplish that step.

breakintoprogram commented 1 year ago

Odd... It's unlikely that MOS or BASIC are at fault here, more than likely a VDP issue. I'll keep this card open and will keep an eye out for any more reported incidents.

stevesims commented 1 year ago

I've spent quite a while this morning looking into this a bit deeper, and have added a photo that illustrates what I'm seeing.

Now this is complex, so please bear with me. In the process of writing this comment I've gone thru a lot of combinations...

I had noticed this corruption when running my own audio-enhancements branch of the code with the DEBUG flag enabled in video.ini so that I can see output on the arduino serial monitor. My investigations however show this corruption is not limited to my branch (which contains no changes to the video code), or the presence of the DEBUG flag.

Running the current main branch, an initial run of this test works fine, Swap to mode 3 and then back to mode 0 though and I see the display corruption. Mode 3 itself works OK. From that point onwards mode 0 consistently displays corruption for me.

Enable the DEBUG flag on main, and for me this test consistently works fine in mode 0, but swapping to mode 3 causes trouble and I see the corruption. Swapping back to mode 0 and it's back to working OK.

Further observation shows that the screen mode is not actually always changing properly. This can be seen by looking at the text size, and whether corruption can be observed. If I change to mode 3 I will see the corruption; from there swapping from there to mode 1 does not result in me seeing smaller text - it remains the same size and the corruption remains. Swap to mode 0, and the corruption goes away and the text size changes to be smaller. From mode 0, swapping to mode 1 results in the text size staying the same, without the corruption.

I tried out a lot of different screen modes...

Finally, swapping to any/all of the double-buffered screen modes results in a corrupted display, but with a different manner of corruption. This is an example showing mode 129: IMG_7119 Medium

This is consistent whether I'm running with the DEBUG flag enabled or not.

julianregel commented 1 year ago

I've seen the display corruption (as you describe at the top of the issue) as well, but not something I could reproduce at will. However, having tried just now, I've been able to reproduce it.

My test program:

10 PRINT "HELLO"
20 GOTO 10

Steps I use to reproduce:

I've tried the same sequence by using modes 3, 4 and 5. The same problem and mode change behaviour occurs when running in mode 3.

I'm not sure if this is a clue, but mode 0 and mode 3 are both "big" in terms of memory: 640x480x4bpp and 640x240x8bpp both take 150K. Could we be hitting some memory issue?

With regards to the double buffering corruption, I've seen that also, but assume it's because the memory for frame buffers are initialised but not zeroed. In my testing, I select the double buffered mode, CLS, swap buffers, CLS and swap buffers back.

@stevesims, please could you take a look at the two BASIC programs here: https://github.com/julianregel/agonprograms/tree/main and confirm they both work as expected? Thanks.

stevesims commented 1 year ago

hey @julianregel - just tried out your programs. first gotcha was BASIC reporting "Bad program", which led me to discover that BASIC needs CRLF line endings, and the source as LF only. :grin: another gotcha is that it seems BASIC's GET/GET$ captures escape key presses, whereas a Beeb would drop out of the program... that's once I'm finished writing this comment I'll file some bug reports against the BASIC repo about those...

MODELIST seems to work OK. A couple of the screen modes don't stretch to the bottom of the screen (0 and 3).

DBMODETEST is kinda problematic. Here's what I saw: Mode 129: corrupt screen - swapping between buffers didn't result in any visible change. Mode 130: initial run works fine - can swap between buffers. Second run and the buffer swap doesn't seem to work - I end up with both lines of text visible. Mode 131: buffer swap seems to fail on initial run, as I see both lines of text. Mode 132: corrupt screen, but more colourful than mode 129 :grin: Mode 133: same behaviour as mode 130 Mode 134: same as 130 Mode 135: same as 131 Mode 136: black screen Mode 137: initial run works fine. Second run screen mode doesn't seem to change (text stays at mode 3 size), and both lines of text visible. Mode 138: same as mode 137 Mode 139: same as mode 137 Mode 140: black screen Mode 141: same as mode 137 Mode 142: same as mode 137 Mode 143: same as mode 137

Some further investigation and looking at the VDP serial monitor tends to show that the buffer swap will fail after the VDP has reported set_mode: error 2. It seems to do that on entering mode 3, as well as the above "second run" failures. Changing to mode 1 before performing another run fixes that, letting those subsequent runs work correctly (with buffer swapping).

NB I'm currently running my own custom VDP code, with DEBUG enabled, but the only changes are enhanced audio functionality, all the video handling stuff is identical to main. In principle the memory footprint on my branch should be very similar to main - I can't imagine it using up much more internal RAM, worst case maybe 1kb (I haven't done the maths and/or added in appropriate debugging statements to work that out tho).

stevesims commented 1 year ago

I was wrong about BASIC's handling of GET/GET$...

What I'm actually seeing is that on pressing escape, text output is no longer working. I don't see the escape reported, nor the BASIC prompt, altho the prompt is there. Attempts to PRINT fail to be visible, and a call to CLS results in no change. a MODE 1 call however will successfully change my screen mode.

stevesims commented 1 year ago

I've had a chance to investigate this issue a bit deeper, and now have a slightly better understanding of the VDP code that handles mode changes

attempting to change the screen mode causes the set_mode function to be called, which in turn calls change_mode which calls the underlying change_resolution. it's in change_resolution that calls to fab-gl/vdp-gl are made change the display, and in there you'll find a check for whether there was enough memory for the screen resolution. if there isn't, it returns 2 indicating that error, which bubbles up to the set_mode function.

set_mode is written to detect such errors, and will call change_mode with the last (successful) screen mode number, attempting to reset back to a "known good/working" mode.

unfortunately this doesn't actually revert because there's a check in change_mode that stops the call to change_resolution from happening if you're changing to what it thinks is the current screen mode. you'll actually be left in the "insufficient memory" screen mode (the VDP is left in kinda a half-way state).

The visual corruption happens because in such a mode the Canvas object created for the screen has a shorter height than expected (as there's not enough memory for the requested resolution), and that height is not a multiple of the font height; the text scroll routine causes the corruption. If the height happens to be a multiple of the font height you won't see any corruption, just a blank area at the bottom of the screen.

There's a few things to fix here.

Firstly the code that attempts to revert the screen mode needs fixing. I have an experimental fix that adds an optional force argument to the change_mode function, allowing the check after the cls() call in there to become:

    if (force || (mode != videoMode)) {

That will allow the screen mode to be restored to the last known working mode, as was the intent of the original code.

Secondly, and much more difficultly, we need to do something about memory usage so we can help ensure that all the screen modes work.

julianregel commented 1 year ago

That’s an excellent investigation @stevesims. Thanks for digging into it. I go on holiday for a week and you identify the source of the problem! I've added a comment to PR #80.

stevesims commented 1 year ago

Thanks @julianregel

My improved understanding is the result of a significant refactor I'm most of the way through of the codebase - splitting functionality out into separate files to help make things more manageable. I'll probably raise a PR with that refactor soon (maybe later today or tomorrow), altho that may take a little jiggering around as my starting point was my "audio enhancements" version of the code.

(I anticipate some further refactoring to come - making use of C++ a bit better, as I learn it, and hopefully improving memory usage.)

julianregel commented 1 year ago

Do you have any sense of how much free memory there is on the ESP32? I've certainly had 320x240 in 64 colours working with double buffering correctly, but as you've demonstrated, it is now problematic. I'm wondering if free memory is borderline and a few other additions to our respective code bases is crossing a threshold.

stevesims commented 1 year ago

I've not worked out yet how best to try to instrument the code to see how memory is being used - so far I'm just whacking in debug log statements coupled with calls to heap_caps_get_free_size. That and looking to see how much memory the Arduino build tools say is being used by global variables.

This stuff is currently somewhat outside of my skillset - the past couple of decades I've mostly been working on front-end web applications. Hopefully there's tooling out there of the same kind of sophistication as I used to use 20 years ago when did some desktop application development. 😀

stevesims commented 1 year ago

Subsequent to this, I have investigated this further, and discussed the matter at some length on the Agon Programmers Discord.

The underlying issue, as mentioned above, is to do with a lack of available internal RAM on the ESP32. Investigations show that this is exacerbated by the VDP code keeping hold of instances of all the possible VGA controllers it may need, which in turn will keep chunks of memory and not release that memory between screen mode changes.

The current main branch has a bug (fixed by #80) that will mean that attempts to change into a screen mode for which there's insufficient memory will result in you being left in that screen mode, (although the VDP code kinda thinks it's reverted the user back to the last working mode). As there's insufficient memory you get left with a chunk of blank rows at the bottom of the screen.

Before the new screen modes were added it took a little longer for this issue to kick in, essentially because of the initial screen mode the VDP starts up in. you could successfully swap into the old mode 3 (mode 0 in the new numbering), but if you then changed into mode 2 mode 3 would no longer operate correctly and you'd get blank rows at the bottom of the screen.

Additionally the display corruption that had been noted that appears when changing into a double-buffered screen mode, such as mode 129, is there because the initial visible buffer has not been cleared. The fix to that is to swap buffers and perform a CLS.

The full fix to the issues requires a different approach to be taken to managing the VGA controllers so that they don't stick around and occupy memory when they're not needed.

I have a fix PR on the Agon Console8 VDP codebase that addresses the memory usage issues that have been discussed above. (The fix from #80 has already been merged on the Console8.) Unfortunately the Console8 VDP codebase has now significantly diverged as it has been refactored quite significantly, so it's not a trivial matter to back-port that fix, and I don't currently have the time to do that.

Having said that, it shouldn't be too hard for someone to replicate the fix and apply it to this repository. The PR is relatively small, and most of the functions are either still identically named or retain very similar names.

stevesims commented 1 year ago

I believe this issue should now be resolved @breakintoprogram and @julianregel now that #88 has been merged.