boochow / micropython-raspberrypi

bare metal Raspberry Pi Zero / Zero W port of MicroPython
MIT License

VC4 baremetal? #23

Open fanoush opened 5 years ago

fanoush commented 5 years ago

Hi boochow, this is great, thank you.

Did you think about going even lower and making it VC4 bare metal instead of ARM bare metal? There is a GCC compiler for the VC4 VPU, and accessing peripherals should be very similar. There is an example VPU firmware, https://github.com/christinaa/rpi-open-firmware, with a working serial console; it enables SDRAM, turns on the ARM core and tries to boot Linux there.

Then it would not need the extra bootcode.bin and other closed stuff, would boot faster, and could even draw less power if you don't enable the ARM core at all. Maybe it could even fit into the 128KB L2 SRAM where bootcode.bin is initially loaded, so even the 512MB SDRAM could stay off for some use cases (or maybe not, 128KB may be too small for micropython; OTOH microbit has 128+16KB). The disadvantage of bare metal VC4 is not having easy HDMI output and all those VC4 mailbox API calls, but the ARM core is free to do other stuff.

boochow commented 5 years ago

@fanoush, thank you for the information. I am interested in programming VC4, and one plan on my to-do list is to make a module that uses VC4 as a GPGPU to enable running machine-learning applications on MicroPython (on the ARM core).

I had not considered MicroPython on VC4, but I think it is an exciting idea because it can realize an interactive GPU programming environment.

I'm not yet familiar with the functions of VC4, but I believe providing some basic functions as modules is important for making a useful port of MicroPython. Do you have any ideas about modules/functions to embed in MicroPython on VC4?

fanoush commented 5 years ago

because it can realize an interactive GPU programming environment

By this you probably mean the QPU cores, which are part of VC4 and handle the 3D acceleration. They are documented in VideoCoreIV-AG100-R.pdf, linked here: https://en.wikipedia.org/wiki/VideoCore#Linux_support

What I mean for micropython is the VPU, which is the main general-purpose CPU of VC4. The ARM core is just an additional (and optional) core added to BCM2835; the main CPU of VideoCore 4 is the VPU. It has general-purpose registers, can run code just like the ARM core, and is the one that starts at power-on, loads bootcode.bin and runs it. The ARM is just a coprocessor that gets powered on later and runs the Linux kernel, while the VPU continues to run its own operating system, ThreadX, and handles HDMI video, camera, H.264 decoding, sound, etc.

VPU documentation was not released, but most of it was figured out and is documented here: https://github.com/hermanhermitage/videocoreiv/wiki/VideoCore-IV-Programmers-Manual There is a GCC compiler for the VPU here https://github.com/itszor/vc4-toolchain and example firmware using this compiler here https://github.com/christinaa/rpi-open-firmware

The VPU can access the same set of peripheral registers (UART, SD card, SPI, I2C) as the ARM core, so when using a C compiler it is almost the same as programming for ARM.

Personally, I had this compiler working, recompiled rpi-open-firmware, and it worked. So it just needs porting something like micropython and writing drivers for the hardware, just like you are already doing for ARM.

EDIT: see e.g. the code here https://github.com/christinaa/rpi-open-firmware/blob/master/romstage.c#L130 which gets compiled into a custom bootcode.bin, runs on the VPU and has a working UART
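
To illustrate the point that the peripheral registers are shared, here is a minimal sketch of a polled UART transmit routine that takes the register base as a parameter, so the same C could be built for either core. It assumes the PL011 UART layout (data register at offset 0x00, flag register at 0x18, TXFF in bit 5). The names `uart_putc` and `uart_puts` are hypothetical and are not taken from rpi-open-firmware, which may well use different helpers.

```c
#include <stdint.h>

/* PL011 register offsets, expressed as 32-bit word indices (assumed UART). */
#define UART_DR      (0x00 / 4)   /* data register */
#define UART_FR      (0x18 / 4)   /* flag register */
#define UART_FR_TXFF (1u << 5)    /* transmit FIFO full */

/* Hypothetical helper: send one byte through the UART whose registers
 * start at `uart`. The caller supplies the base address that is valid
 * for the core the code runs on (VPU or ARM). */
static void uart_putc(volatile uint32_t *uart, char c)
{
    while (uart[UART_FR] & UART_FR_TXFF) {
        /* busy-wait until the transmit FIFO has room */
    }
    uart[UART_DR] = (uint32_t)(unsigned char)c;
}

/* Hypothetical helper: send a NUL-terminated string byte by byte. */
static void uart_puts(volatile uint32_t *uart, const char *s)
{
    while (*s) {
        uart_putc(uart, *s++);
    }
}
```

On a host machine the routine can be exercised against a plain array standing in for the register block; on hardware the pointer would be the UART base as seen from the chosen core.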

boochow commented 5 years ago

Thank you, now I completely get what you mean. It’s very interesting! I’ll read the documents you mentioned and try it this weekend.

fanoush commented 5 years ago

Thank you for considering it. You may hit a few more obstacles than with ARM, as fewer people went this way, but most pieces should already be there and you are already trying to fill in the missing stuff. BTW those documents released by Broadcom are now here: https://www.broadcom.com/support/download-search/?pg=Legacy+Products&pf=Legacy+Products&pn=BCM21553&pa=All The PDF is about QPUs, but the Android driver has a few gold nuggets, as it also has VPU assembly code in brcm_usrlib/dag/vmcsx/vcfw/rtos/none/vciv and all register descriptions in brcm_usrlib/dag/vmcsx/vcinclude/bcm2708_chip or e.g. brcm_usrlib/dag/vmcsx/vcinclude/hardware_vc4.h

As for rpi-open-firmware, you may also look into earlier versions if it looks complicated, as some C++ abstractions were added later (maybe to test C++ compiler runtime support?). Also, as a start you may just use it as is to load your bare metal micropython for ARM a bit quicker, but when taking over the hardware, why not start from the beginning - the VPU? :-)

And BTW the VPU looks like it is actually dual core, so you get two ~300 MHz VPUs, but it is a bit of a mystery to me what the second core runs at startup; maybe it is stopped until you start it? Interrupts are set for both (?) https://github.com/christinaa/rpi-open-firmware/blob/master/romstage.c#L136

Feel free to ignore all this if it distracts you from the goal of making bare metal micropython run easily, but it is really exciting, as you say, so it would be nice if someone finally tried to use all this VC4 stuff :-) I was already thinking about this (porting Espruino or micropython to VC4) but always got distracted by something else (the last thing being Bluetooth LE and fitness trackers and BLE hardware based on Nordic nRF5x, Dialog 14585, TI CC2541 and Telink TLSR8266). Also, recently I found snek https://github.com/keith-packard/snek which is even smaller, so it may fit fully into bootcode.bin/VC4 L2 cache better, but it is not mature yet and currently has only a float datatype for basic values, so for VC4 with SDRAM full micropython is better.

fanoush commented 5 years ago

Also, I am not sure what your development environment is, but I did all this with a Pi 3 and a Pi Zero attached over serial. Copying each bootcode.bin to a microSD card was not fun, so I was thinking about a small bootloader that could load the rest of my code over serial (I did not find the gcc-teststub branch mentioned in the vc4-toolchain README, which does exactly this, or maybe it was not available at that time?). Nowadays there is rpiboot, so one can load bootcode.bin via OTG USB when the Zero is powered on without an SD card. On a Pi with usbbootgui there is now even a dialog that pops up, and one could perhaps send one's own bootcode.bin to the booting Zero via the "custom application" choice, see https://www.raspberrypi.org/blog/gpio-expander/

Anyway, if you are using a Pi as your main computer, I have already compiled the VC4 compiler for Raspbian and can upload it somewhere if you are interested. It took a couple of hours to build directly on the Pi.

And BTW Herman has a nice chronological set of links here https://github.com/hermanhermitage/videocoreiv#videocore-iv-community-and-resources

boochow commented 5 years ago

I built the toolchain and rpi-open-firmware, and they work. Thank you very much for your help. I generally use a JTAG interface, openocd, gdb and Ubuntu to load and debug my code on the RPi. However, according to the VC4 toolchain wiki they do not seem to be supported on VC4 yet, so I will have to debug with printf() and the UART. Anyway, I'm going to try porting MicroPython to VC4! It may be a long way though.

fanoush commented 5 years ago

Wow, great. I hope you can use mostly the same code you already have, possibly just with a different IO base address. Having the same source code build for either ARM or VC4 with just a different makefile flag would be cool (I think rpi-open-firmware has some common headers usable from both ARM and VC4 code).
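
A rough sketch of the "one source, two targets" idea: the peripheral base is chosen at compile time and every driver address is derived from it. The flag name `TARGET_VC4` and the helper `periph_addr` are made up for illustration and do not come from rpi-open-firmware. The base values assume BCM2835 as on the Pi Zero, where peripherals sit at bus address 0x7E000000 and appear at 0x20000000 in the ARM physical map; other Pi models use a different ARM-side base.

```c
#include <stdint.h>

/* TARGET_VC4 is a hypothetical flag, e.g. passed as -DTARGET_VC4 by the makefile. */
#ifdef TARGET_VC4
#define PERIPH_BASE 0x7E000000u   /* bus address, as used on the VPU side */
#else
#define PERIPH_BASE 0x20000000u   /* ARM physical address on BCM2835 (Pi Zero) */
#endif

#define GPIO_OFFSET  0x00200000u  /* GPIO block */
#define UART0_OFFSET 0x00201000u  /* PL011 UART0 */

/* Hypothetical helper: absolute address of a peripheral register block. */
static inline uint32_t periph_addr(uint32_t offset)
{
    return PERIPH_BASE + offset;
}
```

Driver code would then refer only to `periph_addr(...)`, so switching targets is a matter of adding or dropping the flag in the makefile.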

As for VC4 JTAG, I only know it uses different pins than ARM JTAG. On the first Pi it even had a populated pin header, P2 https://upload.wikimedia.org/wikipedia/commons/9/90/Front_of_Raspberry_Pi.jpg On the Zero it is not populated but still available https://www.raspberrypi.org/forums/viewtopic.php?t=209151 The pinout of P2 is described in the schematics https://www.raspberrypi.org/app/uploads/2012/04/Raspberry-Pi-Schematics-R1.0.pdf

But anyway, most probably it will not work with openocd out of the box. I guess I will at least try it to see what the JTAG scan prints out :-)

mfp20 commented 3 years ago

I don't know much about micropython internals, but based on a quick search, it looks like it is not designed for multiprocessing. So there's no point in moving it to VPUs+QPUs. Micropython is good on the single-core ARM.

EDIT: it may be useful to use the bootcode.bin and start.elf from https://github.com/librerpi/rpi-open-firmware instead of the official ones, so that they can be developed further at a later stage to take advantage of the other two general-purpose cores (VPU) and the vector cores (QPU).