ExistOS-Team / ExistOS-For-HP39GII

GNU General Public License v3.0

vram size #81

Open parisseb opened 2 years ago

parisseb commented 2 years ago

I see in the tables/scripts that data starts in RAM at 02000000 and ends at 0202a0d0 B __HEAP_START (that's 168K of data), where the heap starts, with a total vRAM section size of 5M. Since the physical RAM is only 512K, I guess vRAM means virtual RAM and RAM is swapped to flash. If my guess is correct, this would explain why ExistOS is much slower than the HP firmware, for symbolic computations at least. In the HP firmware, we paid a lot of attention to keeping all data/heap in physical RAM. Swapping occurs only for copying code for execution (read access only). I had to work hard to make giac's data usage as small as possible; if my recollection is correct, the total amount of giac data is less than 10K (that's why giac works on the Numworks with only 256K of RAM, or on the Casio fx with less than 512K of physical RAM).

So my question: is it possible to reduce data size and heap size so that everything fits in the physical RAM? I strongly believe it would be worth the effort.
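The sizes quoted in this comment follow directly from the symbol addresses. A minimal sketch of the arithmetic, assuming the addresses are hexadecimal and that "K" means 1024 bytes (the variable names are only illustrative):

```python
# Addresses quoted from the symbol table in the comment above (hexadecimal).
DATA_START = 0x02000000   # start of data in virtual RAM
HEAP_START = 0x0202A0D0   # __HEAP_START, where static data ends

PHYSICAL_RAM = 512 * 1024        # physical RAM stated in the comment
VRAM_SIZE = 5 * 1024 * 1024      # total vRAM section size stated in the comment

data_bytes = HEAP_START - DATA_START
data_kib = data_bytes / 1024

# Share of physical RAM taken by static data alone, before any heap use.
data_share_of_ram = data_bytes / PHYSICAL_RAM
# How much larger the virtual address space is than the RAM backing it.
overcommit = VRAM_SIZE / PHYSICAL_RAM

print(f"data: {data_bytes} bytes ({data_kib:.1f}K)")
print(f"share of physical RAM: {data_share_of_ram:.0%}")
print(f"vRAM / physical RAM: {overcommit:.0f}x")
```

Under these assumptions the static data alone takes roughly a third of the physical RAM, which is why shrinking it matters for keeping the working set unswapped.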

parisseb commented 2 years ago

Well, for KhiCAS it is certainly possible, but not for emu48... The changes I can see currently are: 1/ reduce VM_SYS_ROM_SIZE to 0 (I don't see anything related in the loader script), and also reduce VM_RAM_SIZE, perhaps to 2M; 2/ have a shared buffer for the display, since there are currently 4 buffers in the data section. This buffer could perhaps be in the VM_RAM_SIZE_NONE_FTL section (I don't know about the stack). Then we could have 268 pages + the display buffer always not cached. That's 40 pages more than the current config, a 15% increase. What do you think?

Repeerc commented 2 years ago

[Image: memory_layout]

This is the current memory layout of the system. The physical memory is more like a cache for flash memory. I think the content on the screen rarely changes while a calculation is being performed, so the display buffer is automatically swapped out of physical memory, which leaves more cache space for computing programs. (The system's display buffer is currently located in the heap of virtual memory.)

Repeerc commented 2 years ago

[Image: m2]

Repeerc commented 2 years ago

If the data used in the heap area is small and the program code execution size is also small (no more than 256KB combined), this will result in very little memory swapping.

parisseb commented 2 years ago

Thanks for your picture, very clear. It confirms there is nothing at 0x03000000 for VM_SYS_ROM_SIZE, and setting this constant to 0 will save a little more than 20K, because the vmmgr L1PTE and L2PTE arrays become smaller. We can save the same amount of RAM by reducing the virtual RAM size; I tried 2M (I don't know how much emu48 really needs; I'm afraid the ROM of the 39 must be loaded on the heap in addition to the RAM, so that's at least 1.25M). Both combined, I get a saving of 40K.

I can also see a gap of 2K left in RAM between page_save_rd_buf and page_save_wr_buf:

00019000 l O .bss 00000800 page_save_rd_buf
0001a000 l O .bss 00000800 page_save_wr_buf
0001a800 l O .bss 00000001 vmMgrInit

Do you know if there is any reason to have an alignment to 4096? Wouldn't 2048 be sufficient?

I also wonder if some parts of the OSLoader code are unused. The text size is 0x13f54 (81748 bytes), and this section contains string utilities that may be unused.

I also wonder if the RAM screen buffer must really be 32K to be sent to the LCD driver. Since only 2 bits are used per pixel, maybe we could use a RAM screen buffer of 8K and a temporary buffer of 8K and send the screen buffer to the LCD driver in 4 parts, saving 16K of RAM.
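The screen buffer proposal can be sketched as follows. This is only an illustration under stated assumptions: a 256x128 panel (which gives 32K at 8 bits per pixel), 4 pixels packed per byte with the leftmost pixel in the low bits, and a 0..3 to 0/85/170/255 level mapping. The names `expand_strip`, `refresh` and `send_to_lcd` are hypothetical and do not come from OSLoader.

```python
WIDTH, HEIGHT = 256, 128          # assumed panel size (32768 pixels)
STRIPS = 4                        # number of parts sent to the LCD

def framebuffer_bytes(bits_per_pixel: int) -> int:
    """Size of a packed full-screen buffer at the given depth."""
    return WIDTH * HEIGHT * bits_per_pixel // 8

def expand_strip(packed: bytes, strip: int) -> bytearray:
    """Expand one strip of a 2bpp buffer into 8bpp for the LCD.

    Assumed packing: 4 pixels per byte, leftmost pixel in the low bits.
    Assumed level mapping: 0..3 -> 0, 85, 170, 255.
    """
    strip_len = len(packed) // STRIPS
    out = bytearray()
    for byte in packed[strip * strip_len:(strip + 1) * strip_len]:
        for shift in (0, 2, 4, 6):
            out.append(((byte >> shift) & 0b11) * 85)
    return out

def refresh(packed: bytes, send_to_lcd) -> None:
    """Send the whole screen, one expanded strip at a time."""
    for strip in range(STRIPS):
        send_to_lcd(strip, expand_strip(packed, strip))

full_8bpp = framebuffer_bytes(8)                 # 32768 bytes
packed_2bpp = framebuffer_bytes(2)               # 8192 bytes
temp_strip = full_8bpp // STRIPS                 # 8192 bytes
saved = full_8bpp - (packed_2bpp + temp_strip)   # 16384 bytes
```

The trade-off is extra CPU work on every refresh in exchange for 16K of RAM that no longer needs to be resident or swapped.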

By the way, there is also a problem with the current HEAP_END definition: it assumes that the stack has a very small size, which is definitely not true for CAS computations, where recursive calls are the rule. It will not be a problem with a RAM size of 5M, because the heap will never collide with the stack during normal calculator computations, but we must take care of this if the virtual RAM size is decreased.

My concern about the screen RAM buffer is that during a CAS computation (like integrate(1/(x^4+1))), a lot of code will be loaded, so the RAM buffer will be swapped out, and this requires a write operation on the NAND flash. The same will happen for other read-write areas like the stack, the heap, or giac/khicas data. The problem is that the number of erase/write cycles of the NAND flash is limited to about 100 000 at best, and even if OSLoader takes care to write the swapfile at different positions on the NAND, this could affect the lifetime of the calculator. Just think of something interactive like the expression editor or trace mode on a function graph: if several erase/writes happen on the NAND each time you press a cursor key, you could well have, say, 500 or 1000 cycles by the end of the day and bad flash sectors after a couple of years.

That's why I suggested keeping the RAM screen buffer unswapped, and why I'm also studying how to reduce the number of flash writes caused by swapping. I will certainly add a recent improvement I made for the Casio for fixed-size allocations in a dedicated area. It might perhaps be allocated at 0x0007a000-0x0007e000, at the end of the physical RAM where the OSLoader heap/stack lives; OSLoader probably does not allocate much...
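The wear concern above can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes erases are spread perfectly evenly over a set of blocks, and the thread does not say how many blocks the swap file rotates over, so `blocks_in_rotation` is a free parameter. The function name is hypothetical.

```python
def lifetime_years(endurance_cycles: int,
                   erases_per_day: float,
                   blocks_in_rotation: int = 1) -> float:
    """Years until the rotated blocks reach their erase/write endurance.

    Assumes erases are spread evenly over `blocks_in_rotation` blocks
    (ideal wear leveling), which real firmware only approximates.
    """
    total_erases = endurance_cycles * blocks_in_rotation
    return total_erases / erases_per_day / 365

# Figures from the comment: 100 000 cycles at best, 500 to 1000 cycles a day.
same_block_heavy = lifetime_years(100_000, 1000)   # every erase hits one block
same_block_light = lifetime_years(100_000, 500)
rotated_heavy = lifetime_years(100_000, 1000, blocks_in_rotation=10)
```

The estimate scales linearly with the number of blocks in rotation and inversely with the daily write count, so both better wear leveling and fewer dirty-page evictions extend the lifetime.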

Repeerc commented 2 years ago

The display buffer size is 32KB because we set the screen to work in 256 levels of grayscale (8 bits per pixel). It may be useful for drawing fonts in other languages or for displaying complex graphics (2D or 3D), which is what we originally had in mind...

parisseb commented 2 years ago

If the hardware is able to display 256 levels of grayscale, then it's definitely the way to go. I asked because HP always advertised that there were only 4 levels of grayscale.

Repeerc commented 2 years ago

On the tiplanet forum you mentioned that calculating the integral "1/(x^4+1)" takes two to three times longer than on the unpublished firmware. I wonder if it has something to do with CPU frequency.

There is a switch on the Status page that makes the CPU downscale automatically. When it is turned on, the CPU basically runs at 80MHz (and also enters idle mode, pausing completely while waiting for interrupts). When it is turned off, the CPU runs at full speed, 392MHz. The calculation speed differs between these two modes.

[Images: test1, test2]

With Auto slow mode ON, it took 2.943s. After turning it off, it took 0.998s.
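One way to read these two timings is to compare the clock ratio with the time ratio. The sketch below fits a simple two-point model, time = cycles / frequency + fixed, which is an assumption of this example and not something measured in the thread; it also ignores the idle-mode behaviour mentioned above.

```python
FAST_MHZ, SLOW_MHZ = 392, 80          # CPU clocks quoted above
T_SLOW, T_FAST = 2.943, 0.998         # measured times in seconds

clock_ratio = FAST_MHZ / SLOW_MHZ     # 4.9
time_ratio = T_SLOW / T_FAST          # a little under 3

# Assumed model: T = mega_cycles / MHz + fixed_seconds.
# Two measurements give two equations, solved here for both unknowns.
mega_cycles = (T_SLOW - T_FAST) / (1 / SLOW_MHZ - 1 / FAST_MHZ)
fixed_seconds = T_FAST - mega_cycles / FAST_MHZ

print(f"clock ratio {clock_ratio:.2f}, time ratio {time_ratio:.2f}")
print(f"clock-independent part under this model: {fixed_seconds:.2f}s")
```

If the computation were purely CPU-bound, the two ratios would match. Under this model, a sizeable part of the run does not scale with the clock, which would be consistent with time spent on page swapping, although the model cannot confirm the cause.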

Repeerc commented 2 years ago

[Images: test3, test4]

This screen does work in 256 levels of grayscale. Perhaps this mode takes up too much memory, so HP did not choose it...

parisseb commented 2 years ago

That's indeed a possibility. Did you write the virtual memory manager?

Repeerc commented 2 years ago

[Image: vmmgr]

Yes, the virtual memory manager workflow is shown in the figure. We use a queue to manage these pages. The source code is located in VmMgr.

Actually, 0x03000000 for VM_SYS_ROM_SIZE is used to load some system extension programs (or user binary programs), but it is still in the testing stage (https://github.com/ExistOS-Team/ExistOS-App-demo).
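The queue-based page management described in this comment can be illustrated with a toy model. This is not the actual VmMgr code: it is a plain FIFO cache with a dirty flag per page, and the class and counter names are made up for the example. It shows why evicting read-only code pages costs no flash writes, while evicting modified data pages does.

```python
from collections import OrderedDict

class PageCache:
    """Toy FIFO page cache; NOT the actual VmMgr code."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()   # page number -> dirty flag, oldest first
        self.flash_reads = 0
        self.flash_writes = 0

    def access(self, page: int, write: bool = False) -> None:
        if page in self.pages:
            # Hit: no flash traffic; remember if the page was modified.
            self.pages[page] = self.pages[page] or write
            return
        if len(self.pages) >= self.capacity:
            _, dirty = self.pages.popitem(last=False)   # evict oldest page
            if dirty:
                self.flash_writes += 1   # modified page must be saved
        self.flash_reads += 1            # load the missing page
        self.pages[page] = write
```

With a working set that fits in the cache, repeated accesses cause no further flash traffic after the initial loads. Once the working set exceeds the capacity by even one page, a cyclic access pattern makes every access miss, and every evicted dirty page becomes a flash write.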

Repeerc commented 2 years ago

By the way, there is a question that has long puzzled me: why does the stock system firmware seem to get stuck so often?

parisseb commented 2 years ago

Then I cannot assume that SYS_ROM_SIZE can be set to 0... However, can we save the 2K in the gap between page_save_rd_buf and page_save_wr_buf by setting the alignment to 2048 instead of 4096?

00019000 l O .bss 00000800 page_save_rd_buf
0001a000 l O .bss 00000800 page_save_wr_buf
0001a800 l O .bss 00000001 vmMgrInit
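The size of the gap can be read off the listing by rounding the end of the first buffer up to each candidate alignment. A minimal sketch of that arithmetic follows; `align_up` is a generic helper written for this example, not a function from the project.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)

RD_BUF, BUF_SIZE = 0x00019000, 0x800   # page_save_rd_buf from the listing

end_of_rd = RD_BUF + BUF_SIZE               # 0x19800
wr_at_4096 = align_up(end_of_rd, 4096)      # 0x1a000, as in the listing
wr_at_2048 = align_up(end_of_rd, 2048)      # 0x19800, no padding
gap = wr_at_4096 - wr_at_2048               # 0x800 bytes

print(f"padding removed by 2048-byte alignment: {gap} bytes")
```

Whether 2048-byte alignment is actually safe depends on how the buffers are used by the hardware, which the listing alone does not tell us.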

What about using the physical RAM area 0x7a000 to 0x7e000 for storage?

For the stock firmware, I have no idea. I never looked at things not related to the giac port. HP asked me to rework the giac code in order to use the smallest possible data size (keep everything possible in ROM), and that was a challenge because a lot of C++ structs/classes were allocated dynamically. A lot of the defines that you find in config.h were created in order to do that. I was frustrated to see this unreachable on the 39gii, but it was later very useful for porting KhiCAS to the Casio Prizm, the Numworks and later the monochrome Casio Graph 35eii. And now back to the 39gii!

BTW, a quick fix for displaying menus in KhiCAS: change MINI_OVER to 0 in porting.h.

parisseb commented 2 years ago

With Auto slow mode ON, it took 2.943s. After turning it off, it took 0.998s.

With the recent enhancements I just made to allocations inside KhiCAS (ALLOCSMALL in kgen.cc), I get 1.16s and 0.63s, perhaps a good illustration of write swapping. We could probably get better timings with a little less memory for general page swapping and a reserved area for these small allocations (currently 16K). I'm also thinking of trying a user-modifiable mode for a 1 bit per pixel display inside KhiCAS (while 8 bits is useful for graphics, you don't really need grayscale for CAS computations, and this would reduce the potential number of flash writes by a factor of 8).
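The idea of serving small fixed-size allocations from a dedicated area can be sketched generically. The code below is not the ALLOCSMALL implementation from kgen.cc, which is not shown in this thread. It is a plain free-list pool, and the 16-byte slot size is an assumption chosen for illustration (only the 16K area size comes from the comment).

```python
class FixedPool:
    """Fixed-size slot allocator over one contiguous area (illustrative)."""

    def __init__(self, area_bytes: int, slot_bytes: int):
        self.slot_bytes = slot_bytes
        self.memory = bytearray(area_bytes)
        # Free list of slot offsets, reversed so the lowest offset pops first.
        self.free = list(range(0, area_bytes - slot_bytes + 1, slot_bytes))[::-1]

    def alloc(self):
        """Return the offset of a free slot, or None if the pool is full."""
        return self.free.pop() if self.free else None

    def release(self, offset: int) -> None:
        """Return a slot to the pool so the next alloc can reuse it."""
        self.free.append(offset)

pool = FixedPool(area_bytes=16 * 1024, slot_bytes=16)  # slot size assumed
```

Because freed slots are reused immediately, short-lived small objects keep touching the same few addresses. If that area stays resident in physical RAM, these allocations would not dirty swappable pages, which is one plausible reading of the timing improvement reported above.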