ZipCPU / arrowzip

A ZipCPU based demonstration of the MAX1000 FPGA board

Where to continue? #3

Closed NeuerUser closed 5 years ago

NeuerUser commented 5 years ago

Hi Dan

So, I have the design running on the MAX1000 now. I am, however, not sure if it works correctly, as I don't know the whole thing at all. Here are my observations so far:

Supervisor Registers
sR0 : 0x00000000  sR1 : 0x00000000  sR2 : 0x00000000  sR3 : 0x00000000
sR4 : 0x00000000  sR5 : 0x00000000  sR6 : 0x00000000  sR7 : 0x00000000
sR8 : 0x00000000  sR9 : 0x00000000  sR10: 0x00000000  sR11: 0x00000000
sR12: 0x00000000  sSP : 0x00000000  sCC : Z           sPC : 0x0060000c

User Registers
uR0 : 0x00000000  uR1 : 0x00000000  uR2 : 0x00000000  uR3 : 0x00000000
uR4 : 0x00000000  uR5 : 0x00000000  uR6 : 0x00000000  uR7 : 0x00000000
uR8 : 0x00000000  uR9 : 0x00000000  uR10: 0x00000000  uR11: 0x00000000
uR12: 0x00000000  uSP : 0x00000000  uCC :             uPC : 0x00000000

                                  >00000000 (Bus Err)
0x00600014 0x00000000 SUB $0,R0    00000004 (Bus Err)
0x00600010 0x00000000 SUB $0,R0    00000008 (Bus Err)
0x0060000c 0x00000000 SUB $0,R0    0000000c (Bus Err)
0x00600008 0x00000000 SUB $0,R0    00000010 (Bus Err)

Any idea, how best to go forward? I still don't know this well enough...

ZipCPU commented 5 years ago

Looks like your software components are out of sync with the rest of the design. Try running "make" from the root directory again. That should fix both the TIMER and the VERSION issue. Once the VERSION issue is fixed, zipload should work again.

As for the timer, try writing to the "BUSTIMER" first. This is a basic count-down timer that will accept any value. If you wait too long before querying it, it will set an interrupt and then read zero again. You can check for the interrupt by reading from the programmable interrupt controller, "PIC". Check out the ZipCPU spec for the documentation on how both the BUSTIMER and the PIC are supposed to work.

The "TIMER" is a BCD based seconds timer. Hence, if you write 0x0130 to it, it should count down a minute and a half before setting the RTC interrupt in the PIC. This is also a good test of whether the design is up and running, although I'm not sure I'd ever try writing 0x12345678 to it ;)
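To illustrate the BCD encoding described above, here is a minimal Python sketch that converts a duration in seconds to a timer word and back. It assumes the low byte holds BCD seconds and the next byte holds BCD minutes, which is consistent with 0x0130 meaning a minute and a half; the function names are hypothetical and any wider fields of the real register are not modelled.

```python
def _to_bcd(value: int) -> int:
    """Pack a two-digit decimal value (0-99) into one BCD byte."""
    return ((value // 10) << 4) | (value % 10)


def _from_bcd(byte: int) -> int:
    """Unpack one BCD byte, rejecting nibbles above 9."""
    high, low = byte >> 4, byte & 0xF
    if high > 9 or low > 9:
        raise ValueError(f"0x{byte:02x} is not valid BCD")
    return high * 10 + low


def seconds_to_timer_word(total_seconds: int) -> int:
    """Encode a duration as BCD MMSS (assumed layout), e.g. 90 s -> 0x0130."""
    if not 0 <= total_seconds < 100 * 60:
        raise ValueError("duration must fit in two BCD minute digits")
    minutes, seconds = divmod(total_seconds, 60)
    return (_to_bcd(minutes) << 8) | _to_bcd(seconds)


def timer_word_to_seconds(word: int) -> int:
    """Decode a 16-bit BCD MMSS word (assumed layout) back into seconds."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("only a 16-bit MMSS word is modelled here")
    minutes = _from_bcd((word >> 8) & 0xFF)
    seconds = _from_bcd(word & 0xFF)
    if seconds > 59:
        raise ValueError("seconds field must be 00-59")
    return minutes * 60 + seconds


if __name__ == "__main__":
    # 1 minute 30 seconds, the example value from the comment above
    print(f"0x{seconds_to_timer_word(90):04x}")
```

The decoder rejects words with non-decimal nibbles, which is one reason an arbitrary hex pattern is not a meaningful value for a BCD timer.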

NeuerUser commented 5 years ago

Good, a new make helped, indeed. Still not all working, but definitely looking better!!!

NeuerUser commented 5 years ago

btw, what also works very nicely is the SPIO register :) Can control the LEDs nicely.

Major problem seems to be the memory. That is probably the RAM, I guess.

ZipCPU commented 5 years ago

I'm also expecting problems with the flash controller. I've got a brand new flash controller, and ... it's still going through some growing pains.

NeuerUser commented 5 years ago

I tried the dumpflash program. It reported:

$ ./dumpflash 
Before starting, nread = 0
VERSION: 20190305

READ-COMPLETE
The read was accomplished in 19 bytes over the UART

That's a bit short (19 bytes), but the file is actually 8MB long. It only contains 0x00s, but that could be correct, as the flash is probably not initialized.

ZipCPU commented 5 years ago

What to do next: do a git pull and rebuild the project. Load it onto the device and run netuart, giving it the name of the /dev/ttyUSBx file associated with the device. While netuart is running, run zipload, passing it as arguments "-r" and the location of your built cputest file (it should be in the sw/board directory).

cd sw/host; ./zipload -r ../board/cputest

While it's loading, fire up a telnet to the console port,

telnet localhost 6956

Once the design loads, you should see a CPU test run across the screen.

It isn't perfect. I'm getting a bit of clunkiness when erasing the flash, but if zipload fails you should then be able to rerun it a second time and it will work.

Alternatively, you can run this design in simulation by running,

cd sim/verilated; ./main_tb ../../sw/board/cputest

Other than taking longer, both should do the same thing. Further, if you give "main_tb" a -d argument as well as a filename to run, it will generate a trace of the entire encounter.

Dan

NeuerUser commented 5 years ago

Wow! You are a genius! It's working!!!

Absolutely great! Now I have something to play with and to try to understand how the SoC works in detail. (I'll probably start with the memory structure, e.g. RAM, FLASH, SDRAM(?), etc., and how the CPU is loaded and started.)

Btw, the cputest takes about 2 seconds. It's pretty fast. The loading beforehand takes rather long. It seems that zipload first reads the flash to see if it already has the correct content (taking about a minute), and if not, flashes the right content (taking about 10 min). Would it also be possible to load the program only to RAM? Or does the CPU do XIP on the flash? (so much to learn...)

Anyway, there are still some files missing for someone to simply check out the repo and compile it in Quartus. I made a small patch. If you want, you can include it. It adds the missing files and removes unneeded files. Ah, and it also changes the hostname for the connection to "localhost", as I guess that most people will not run it on a PC with the name "jericho", but probably locally.

Thanks again for this great project!

quartus-missing-files.patch.txt

ZipCPU commented 5 years ago

The CPU is currently running in XIP mode from the flash and fetching its instructions from there on an as needed basis. That means that the instructions themselves are kept on the flash, and read from the flash just before execution. You can read about the fetch routine currently in use here.

There wasn't room on the device to use the DMA to load the instructions into memory, or to fully pipeline the CPU and use the I-cache. However, if you adjust the linker script, you can adjust how much code is placed into RAM in the first place. Further, if you swap to the RAM-only linker script (bkram.ld), you should be able to run without the flash at all.

While getting the flash controller up and running, I had an 8kB buffer in use for my logic analyzer as applied to the flash. This 8kB buffer was so large, the CPU block RAM usage was dropped down to 1kB--lower than it needed to be. In order to get the CPU test to run with this bare minimum of RAM, I had to put all the code on the flash in order to get it to fit. Only the memory data structures are copied into RAM now as a result, none of the code. Now that the flash is working, the block RAM memory has been expanded back to 32kB. To run from memory instead of loading the flash, just use the bkram.ld linker script generated by AutoFPGA.

As for the CPU test being fast .... the multiply test feels painfully slow when running from simulation. ;)

Yes, loading is painfully slow. Some of that is due to the flash being so slow in general; some of it is due to the light-weight debugging bus used by this project. (Yes, I am actually sending 88 bits across the interface just to write an 8-bit value to the flash, which is very inefficient.) Normally I'm using an interface with a bit more capability, but that one wouldn't fit on this chip. Perhaps the way to fix that would be to write a small ZipCPU program into block RAM, then to write a piece of compressed memory to block RAM next to the CPU program. The ZipCPU could then decompress the memory segment, read from the flash, determine if the flash needs to be erased, erase the flash, program it, and then verify it. I've wanted to build a program to do this for some time but ... haven't yet had the opportunity.
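To put the 88-bits-per-byte figure in perspective, here is a small Python sketch of the overhead arithmetic. Only the 88-bit and 8-bit numbers come from the comment above; the function names and the link rate passed in are hypothetical, and the estimate ignores flash erase and program delays entirely.

```python
LINK_BITS_PER_FLASH_BYTE = 88  # figure quoted in the comment above
PAYLOAD_BITS_PER_FLASH_BYTE = 8


def overhead_factor() -> float:
    """Link bits sent per payload bit actually written to the flash."""
    return LINK_BITS_PER_FLASH_BYTE / PAYLOAD_BITS_PER_FLASH_BYTE


def link_bits(payload_bytes: int) -> int:
    """Total bits crossing the debugging bus to write payload_bytes."""
    return payload_bytes * LINK_BITS_PER_FLASH_BYTE


def transfer_seconds(payload_bytes: int, link_bits_per_second: float) -> float:
    """Lower bound on transfer time at an assumed, hypothetical link rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return link_bits(payload_bytes) / link_bits_per_second


if __name__ == "__main__":
    print(f"overhead factor: {overhead_factor():.0f}x")
```

Under these assumptions every byte written costs eleven bytes' worth of link traffic, so the achievable write rate is at most one eleventh of whatever the link itself can carry.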

BTW, thanks for the hostname catch! I run from "jericho" here (my own CPU) so that I can interact with the device from anywhere on my local network. The ports.h file is supposed to reference localhost, which would work on all networks, but ... I must've missed that. Thank you!

My next steps with this project now include: 1) adding and testing the SDRAM controller, and 2) figuring out the MEMS device and integrating that into the project!

NeuerUser commented 5 years ago

Just wondering, as you mention the restricted size of the chip several times: the design currently uses ~5050 LEs (63%). Is the DMA or the faster wb_bridge so big that they wouldn't fit?

Btw. there are two sister boards for the MAX1000: 1.) a bigger MAX1000 with a 16k LE FPGA and 32MB flash and 2.) the Cyc1000, which uses a different FPGA with 25k LEs. I guess, it should be pretty simple to port to these sister boards.

ZipCPU commented 5 years ago

Now that the SDRAM is a working part of the design, it now uses about 5800/8000 LEs. I'm not going to place the ZipSystem, containing the DMA, onto the device so that I can leave logic room available for whatever a user might wish to do with it. Judging from my ORCONF 2017 presentation, I would expect 1k-2k more elements to fully pipeline the CPU, add in the performance counters, DMA, etc.
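As a quick check on the logic budget described above, the following Python sketch works through the utilization numbers. The 5800 and 8000 LE figures and the 1k-2k estimate come from the comment; treating the estimate as a simple addition on top of the current usage is an assumption, and the names are hypothetical.

```python
TOTAL_LES = 8000                 # device capacity as quoted above
USED_LES = 5800                  # usage with the SDRAM controller included
EXTRA_LES_RANGE = (1000, 2000)   # estimate for pipelining, counters, DMA


def utilization(used_les: int, total_les: int = TOTAL_LES) -> float:
    """Fraction of the device's logic elements in use."""
    return used_les / total_les


def projected_utilization() -> tuple:
    """Utilization range if the estimated extra logic were simply added."""
    return tuple(utilization(USED_LES + extra) for extra in EXTRA_LES_RANGE)


if __name__ == "__main__":
    low, high = projected_utilization()
    print(f"now: {utilization(USED_LES):.1%}, projected: {low:.1%} to {high:.1%}")
```

Under that assumption the fuller CPU configuration would still fit, but it would leave little room for user logic, which matches the reasoning given for leaving it out.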

I still want to install the MEMS sensor and play with that--that one is still on my list of TODO's.