LiteX 2020 roadmap and new LiteX SoC class

enjoy-digital commented 4 years ago

LiteX has evolved a lot and the current SoCCore, SoCSDRAM inheritated from MiSoC have evolved quite a bit in the last years and could be greatly improved now that we have a better idea of the SoCs we want to be able build with LiteX and the features we want to support.

A new refreshed LiteXSoC class could be created and could be flexible enough to: In the short term:

[ ] allow SoCCore/SoCSDRAM/SoCZynq to inherit from it for retro-compatibility with existing designs.
[x] support different types of main SoC bus (Wishbone, AXI, etc...) with different data-widths.
[x] support automatic bridging between bus types and data-width adaptation when adding a bus Master or Slave to the main bus (ex: AXI 64-bit to Wishbone 32-bit, Wishbone 32-bit to AXI 32-bit, etc...)
[x] support AXI-Lite ROM/SRAM/Interconnect when main bus is AXI (requires adding AXI-Lite ROM, SRAM, Interconnect components to LiteX).
[x] support mixed dynamic/static CSRs regions allocation (some CSR regions could be user-defined or reserved by the CPU (and then static), the others could dynamically allocated).
[x] support AXI-Lite to CSRs direct bridging when the main bus is AXI. (we are currently doing AXI->Wishbone->CSR which is not efficient).
[ ] support mixed dynamic/static CSRs allocation within a CSR region. (It's currently difficult to use static CSRs regions since CSRs can move within a module when a new CSR is added).
[x] support mixed dynamic/static IRQs numbering (some IRQs could be user-defined or reserved by the CPU (and then static), the others could dynamically allocated).
[x] support mixed dynamic/static Memory regions allocation (some Memory regions could be user-defined or reserved by the CPU (and then static), the others could dynamically allocated).
[x] support properties on Memory regions: IO/Cached/Linker, allow the CPU (or main Master of the bus) to define the IO/Cached regions and checks regions properties when adding a new peripheral to the memory map.
[ ] ease multiple instances of a similar component (SDRAM, Ethernet, etc...): It's already possible, but requires workarounds.
[x] ease instance of standard SoC components/cores: we currently have redundancies in the targets when adding an Ethernet, SDRAM core and as prototyped in Linux-on-LiteX-Vexriscv (https://github.com/litex-hub/linux-on-litex-vexriscv/blob/master/soc_linux.py) we could probably have add_ethernet, add_sdram, ... methods to easily add standard components. With only a few parameters, the method would be able to find the right core/primitive/implementation to use and then simplify integration and reduce redundancy.
[ ] improve the .csv/.json exports of the SoC informations to allow external tools to easily reuse this.

In the long term:

[x] add logging of all the steps of the SoC elaboration to ease SoC creation/debug. (ex: information on the CPU (type, data width, parameters), main bus (type, data width), about all the automatic operations happening during the SoC creation (bridge additions, data-width adaptation, automatic allocation of CSR regions/IRQ/Memory regions, PLL parameters computations, etc... so that the user can be aware of this more easily: looking at the generated csr.h/soc.v or just receive an error message is often to limited to really understand what is going on).
[ ] add AMP/SMP support for multi-CPU.
[ ] add multi-SoC support with NoC interconnection between SoCs.
etc...

Also, even if using Migen vs nMigen is not really limiting LiteX for now, we should see if it's possible to switch easily between Migen and nMigen compat mode to generate the verilog (would allow testing progressively nMigen while still allowing Migen use) and then see if it's interesting to start creating new cores in nMigen or/and adapt existing ones.

I'll start prototyping this in the next weeks/months and that would be great to discuss or/and have feedback/ideas from the main contributors/users: @mithro, @xobs, @gsomlo, @mateusz-holenko, @jersey99, @bunnie, @gregdavill or anyone having ideas on the direction LiteX should take.

Edit: Feedbacks/Suggestions:

@xobs:

[ ] Using CSRs as a bus by themselves. We had some people wanting to put a 6502 on Fomu, and being able to use CSRs natively would be a neat trick.
[x] BIOS on SPI. Currently this is very clunky, as litex appears to be designed around the assumption that instruction 0 must be in a ROM somewhere in a region called "rom", and must have a program called "bios.elf". This is the case for most large FPGAs, but for smaller ICE40 boards being able to boot directly from XIP SPI would be nice.
[ ] Documented Caching. There appears to be a cache that we're seeing on Betrusted (L2 cache?) that we're not seeing on Fomu. As such, accesses are bursty on Betrusted but super slow on Fomu. It'd be nice to know how that's done, and how to enable it on Fomu.
[ ] Nicer IP Library. I'm still not sure how to handle this one. Over time, we've been building up a library of IP modules that tend to live in various repositories. These are self-contained files, usually Modules, such as a random number generator, touchpad, or RGB LED.

@daveshah1:

[x] it would definitely be good to have the litedram/eth init code in the same repo as the cores by putting them all in one repo or making the BIOS more modular.
[x] Apropos external CSR bus, it would be nice to support systems where the main processor is actually a hard core (Zynq etc) and LiteX just provides peripherals.

mithro commented 4 years ago

If we are going to do some major changes, we should probably think about multiple CPU cores too.

enjoy-digital commented 4 years ago

@mithro: creating this new LiteXSoC class is also a way to refresh the API (we could easily have retro-compatibility in SoCCore/SoCSDRAM that will be based on LiteXSoC but will re-expose the old API). Could you list here the API changes you wanted to do in https://github.com/enjoy-digital/litex/pull/369? I agree for the multi-CPU, this is listed in long-term, but we should make sure the infrastructure will be there for that.

mithro commented 4 years ago

I think we can take a lot of inspiration from the approach that linux-on-litex-vexriscv repository takes;

def SoCLinux(soc_cls, **kwargs):
    class _SoCLinux(soc_cls):
        csr_map = {**soc_cls.csr_map, **{
            "ctrl":       0,
            "uart":       2,
            "timer0":     3,
        }}
        interrupt_map = {**soc_cls.interrupt_map, **{
            "uart":       0,
            "timer0":     1,
        }}
        mem_map = {**soc_cls.mem_map, **{
            "emulator_ram": 0x20000000,
            "ethmac":       0xb0000000,
            "spiflash":     0xd0000000,
            "csr":          0xf0000000,
        }}

        def __init__(self, cpu_variant="linux", **kwargs):
            soc_cls.__init__(self,
                cpu_type       = "vexriscv",
                cpu_variant    = cpu_variant,
                uart_baudrate  = 1e6,
                max_sdram_size = 0x10000000, # Limit mapped SDRAM to 256MB for now
                **kwargs)

            # machine mode emulator ram
            self.submodules.emulator_ram = wishbone.SRAM(0x4000)
            self.register_mem("emulator_ram", self.mem_map["emulator_ram"], self.emulator_ram.bus, 0x4000)

        def add_spi_flash(self):
            # TODO: add spiflash1x support
            spiflash_pads = self.platform.request("spiflash4x")
            self.submodules.spiflash = SpiFlash(
                spiflash_pads,
                dummy=11,
                div=2,
                with_bitbang=True,
                endianness=self.cpu.endianness)
            self.spiflash.add_clk_primitive(self.platform.device)
            self.add_wb_slave(mem_decoder(self.mem_map["spiflash"]), self.spiflash.bus)
            self.add_memory_region("spiflash", self.mem_map["spiflash"], 0x1000000)
            self.add_csr("spiflash")

        def add_leds(self):
            self.submodules.leds = GPIOOut(Cat(platform_request_all(self.platform, "user_led")))
            self.add_csr("leds")

        def add_rgb_led(self):
            rgb_led_pads = self.platform.request("rgb_led", 0)
            for n in "rgb":
                setattr(self.submodules, "rgb_led_{}0".format(n), PWM(getattr(rgb_led_pads, n)))
                self.add_csr("rgb_led_{}0".format(n))

        def add_switches(self):
            self.submodules.switches = GPIOOut(Cat(platform_request_all(self.platform, "user_sw")))
            self.add_csr("switches")

        def add_spi(self, data_width, spi_clk_freq):
            spi_pads = self.platform.request("spi")
            self.add_csr("spi")
            self.submodules.spi = SPIMaster(spi_pads, data_width, self.clk_freq, spi_clk_freq)

        def add_i2c(self):
            self.submodules.i2c0 = I2CMaster(self.platform.request("i2c", 0))
            self.add_csr("i2c0")

        def add_xadc(self):
            self.submodules.xadc = XADC()
            self.add_csr("xadc")

        def add_framebuffer(self, video_settings):
            platform = self.platform
            assert platform.device[:4] == "xc7a"
            dram_port = self.sdram.crossbar.get_port(
                mode="read",
                data_width=32,
                clock_domain="pix",
                reverse=True)
            framebuffer = VideoOut(
                device=platform.device,
                pads=platform.request("hdmi_out"),
                dram_port=dram_port)
            self.submodules.framebuffer = framebuffer
            self.add_csr("framebuffer")

            framebuffer.driver.clocking.cd_pix.clk.attr.add("keep")
            framebuffer.driver.clocking.cd_pix5x.clk.attr.add("keep")
            platform.add_period_constraint(framebuffer.driver.clocking.cd_pix.clk, 1e9/video_settings["pix_clk"])
            platform.add_period_constraint(framebuffer.driver.clocking.cd_pix5x.clk, 1e9/(5*video_settings["pix_clk"]))
            platform.add_false_path_constraints(
                self.crg.cd_sys.clk,
                framebuffer.driver.clocking.cd_pix.clk,
                framebuffer.driver.clocking.cd_pix5x.clk)

            self.add_constant("litevideo_pix_clk", video_settings["pix_clk"])
            self.add_constant("litevideo_h_active", video_settings["h-active"])
            self.add_constant("litevideo_h_blanking", video_settings["h-blanking"])
            self.add_constant("litevideo_h_sync", video_settings["h-sync"])
            self.add_constant("litevideo_h_front_porch", video_settings["h-front-porch"])
            self.add_constant("litevideo_v_active", video_settings["v-active"])
            self.add_constant("litevideo_v_blanking", video_settings["v-blanking"])
            self.add_constant("litevideo_v_sync", video_settings["v-sync"])
            self.add_constant("litevideo_v_front_porch", video_settings["v-front-porch"])

        def add_icap_bitstream(self):
            self.submodules.icap_bit = ICAPBitstream();
            self.add_csr("icap_bit")

        def configure_ethernet(self, local_ip, remote_ip):
            local_ip = local_ip.split(".")
            remote_ip = remote_ip.split(".")

            self.add_constant("LOCALIP1", int(local_ip[0]))
            self.add_constant("LOCALIP2", int(local_ip[1]))
            self.add_constant("LOCALIP3", int(local_ip[2]))
            self.add_constant("LOCALIP4", int(local_ip[3]))

            self.add_constant("REMOTEIP1", int(remote_ip[0]))
            self.add_constant("REMOTEIP2", int(remote_ip[1]))
            self.add_constant("REMOTEIP3", int(remote_ip[2]))
            self.add_constant("REMOTEIP4", int(remote_ip[3]))

        def configure_boot(self):
            if hasattr(self, "spiflash"):
                self.add_constant("FLASH_BOOT_ADDRESS", 0x00400000)

        def generate_dts(self, board_name):
            json = os.path.join("build", board_name, "csr.json")
            dts = os.path.join("build", board_name, "{}.dts".format(board_name))
            subprocess.check_call(
                "./json2dts.py {} > {}".format(json, dts), shell=True)

        def compile_dts(self, board_name):
            dts = os.path.join("build", board_name, "{}.dts".format(board_name))
            dtb = os.path.join("buildroot", "rv32.dtb")
            subprocess.check_call(
                "dtc -O dtb -o {} {}".format(dtb, dts), shell=True)

        def compile_emulator(self, board_name):
            os.environ["BOARD"] = board_name
            subprocess.check_call("cd emulator && make", shell=True)

    return _SoCLinux(**kwargs)

mithro commented 4 years ago

I think there is also some confusion between;

Software running on the device - Zephyr, MicroPython, Linux, etc.
Features required for the software, IE Linux requires the emulator.bin and Linux capable CPU. Zephyr requires ecall, etc.
Peripherals available on a board.
Storage on the board -- IE how the spiflash is laid out...

I wonder if there should be something in-between platforms and targets or maybe platforms should be more advance? The platforms should probably more aggressively inherit from things like XilinxSeries7?

mithro commented 4 years ago

Some random prototype code....

class XilinxSeries7Platform:
  SPI_DUMMY_BITS = None
  SPI_FLASH_SIZE = None

  def add_spi_flash(self, soc, origin, pads, with_bitbang=True):
    soc.submodules.spiflash = SpiFlash(
        spiflash_pads,
        dummy=self.SPI_DUMMY_BITS,
        div=2,
        with_bitbang=with_bitbang,
        endianness=soc.cpu.endianness)
    soc.spiflash.add_clk_primitive(self.device)
    soc.add_wb_slave(mem_decoder(origin), self.spiflash.bus)
    soc.add_memory_region("spiflash", origin, self.spiflash.size)
    soc.add_csr("spiflash")

  def add_ethernet_phy_mii(self, soc):
    ...

class Arty(XilinxSeries7Platform):
  SPI_DUMMY_BITS = 11
  SPI_DUMMY_BITS = 128*MiB

  def add_spi_flash(self, soc, origin, with_bitbang=True):
    spiflash_pads = self.request("spiflash4x")
    XilinxSeries7Platform.add_spi_flash(self, soc, origin, spiflash_pads, with_bitbang)

  def add_ethernet_phy(self, soc, origin):
    self.add_mii(self, soc)
    ...

class LinuxSoC:
  def __init__(self, platform):
    self.add_cpu(variant="linux")

    self.add_csr("ctrl", csr_id=0)
    self.add_csr("uart", csr_id=2)
    self.add_csr("timer0", csr_id=3)

    self.add_memory(type="sram", size=0x8000)
    self.add_memory(type="sram", origin=0x4000000, contents="emulator.bin") # Size is automatically determined to be big enough to fit emulator.bin

    sdram_module = platform.add_sdram(self)
    sdram_origin, sdram_size = self.add_memory(type=sdram_module.bus, size=max(256*MiB, sdram_module.size))
    dma_region = 0x10000
    self.add_linker_region("main_ram", origin=sdram_origin, size=sdram_size-dma_region)
    self.add_linker_region("dma_ram", origin=sdram_origin-dma_region, size=dma_region)

    flash_module = platform.add_spiflash(self)
    if flash_module:
      flash_module.add_region("gateware", size=platform.bitstream_size)
      flash_module.add_region("rom", size=BIOS_MAX_SIZE, contents=self.get_bios)
      flash_module.add_region("dts", size=0x100, contents=self.generate_dts)
      flash_module.add_region("kernel", size=KERNEL_MAX_SIZE, contents="vmlinuz")
      flash_module.add_region("rootfs.cpio", contents="rootfs.cpio")
    else:
      self.add_memory("rom", contents=self.get_bios) # Size is automatically determined

    platform.add_ethernet(self)
    platform.add_video(self)

enjoy-digital commented 4 years ago

Thanks @mithro, that's very similar to what i want to explore. I also think the platforms should provide more information to ease peripherals integration, but not sure the platform should do the integration itself. In the first time, i'll focus on the base infrastructure : Bus/CSR/IRQ/Mem/CPU and then we'll see what is the best to do to ease integration of the standard peripherals.

mithro commented 4 years ago

I would still like to support specifying which CPU architecture independent of the platform or software used. IE Instead of add_cpu('vexriscv', variant='linux') it should be something like add_cpu(variant='linux') and then the CPU architecture is provided in some other way...

mithro commented 4 years ago

I was pondering something like the following;

class SoCBuilder:
  def add_csr(self, name, csr_id=None):
    # Support "hard cpu proxy" as CPU type?
    ...

  def add_irq(self, name, irq_id=None):
    ...

  def add_memory_region(self, name, irq_id=None):
    ...

  def add_cpu(self, arch=None, variant=None, parameters={}):
    ...

  def add_bus(self, name='default', bus_type='wishbone'):
    ...

  def add_peripheral(self, name, obj, bus_name='default', csr_id=None, irq_id=None, mem_base=None):
    setattr(self.soc.submodule, name, obj)
    obj = getattr(self.soc, name)
    if isinstance(obj, AutoCSR):
      self.soc.add_csr(name)
    if isinstance(obj, AutoIRQ):
      self.soc.add_irq(name)
    # If this peripheral has a memory region, slave it to the correct bus
    if isinstance(obj, AutoMemory):
      self.soc.add_bus_slave(obj.bus)
      # Check that the memory_region is found on the given bus location...
      self.soc.add_memory_region(name, origin=mem_base, size=obj.bus.mem_size)

xobs commented 4 years ago

Some wild things I'd like to see:

Using CSRs as a bus by themselves. We had some people wanting to put a 6502 on Fomu, and being able to use CSRs natively would be a neat trick.
BIOS on SPI. Currently this is very clunky, as litex appears to be designed around the assumption that instruction 0 must be in a ROM somewhere in a region called "rom", and must have a program called "bios.elf". This is the case for most large FPGAs, but for smaller ICE40 boards being able to boot directly from XIP SPI would be nice.
Documented Caching. There appears to be a cache that we're seeing on Betrusted (L2 cache?) that we're not seeing on Fomu. As such, accesses are bursty on Betrusted but super slow on Fomu. It'd be nice to know how that's done, and how to enable it on Fomu.
Nicer IP Library. I'm still not sure how to handle this one. Over time, we've been building up a library of IP modules that tend to live in various repositories. These are self-contained files, usually Modules, such as a random number generator, touchpad, or RGB LED.

mithro commented 4 years ago

I was pondering if we should separate the builder infrastructure from the cores?

Maybe litex just provides things like the SocBuilder and we have other modules like; litex-core-vexriscv, litex-core-mor1kx, litex-peripheral-rgb, etc...

These can then be distributed in Python modules and installed via pip (pip also supports installing directly from a github repository and even locking to a specific git revision or git branch...).

If we then use litex-hub as a common location for anything we consider an "okay quality" core?

mithro commented 4 years ago

https://packaging.python.org/guides/packaging-namespace-packages/

enjoy-digital commented 4 years ago

@mithro: i think we are in phase for the SoC builder, here is what i started prototyping:

#!/usr/bin/env python3

import logging

from migen import *

# LiteXSoC -----------------------------------------------------------------------------------------

class LiteXSoC(Module):
    supported_bus_type          = ["wishbone", "axi"]
    supported_bus_data_width    = [32, 64]
    supported_bus_address_width = [32]
    supported_csr_data_width    = [8, 16, 32]
    supported_csr_address_width = [14, 15]
    supported_csr_alignments    = [32, 64]
    def __init__(self, platform, clk_freq,
        bus_type          = "wishbone",
        bus_data_width    = 32,
        bus_address_width = 32,
        csr_data_width    = 8,
        csr_address_width = 14,
        csr_alignment     = 32):

        # SoC creation -----------------------------------------------------------------------------
        logging.info("Creating LiteX SoC for {} FPGA Device.".format(platform.device))
        self.platform = platform
        self.clk_freq = clk_freq

        # Bus check / creation ---------------------------------------------------------------------
        if bus_type not in self.supported_bus_type:
            logging.error("Unsupported bus type: {} supporteds: {:s}".format(
                bus_type, ", ".join(self.supported_bus_type)))
            raise Exception # FIXME
        if bus_data_width not in self.supported_bus_data_width:
            logging.error("Unsupported bus data_width: {} supporteds: {:s}".format(
                bus_data_width, ", ".join(str(x) for x in self.supported_bus_data_width)))
            raise Exception # FIXME
        logging.info("Using {:d}-bit {} bus, {}GiB addressable.".format(
            bus_data_width, bus_type, 2**bus_address_width/2**30))

        self.bus_type          = bus_type
        self.bus_address_width = bus_address_width
        self.bus_masters       = []
        self.bus_slaves        = []
        self.bus_regions       = {}

        # CSR bus check / creation -----------------------------------------------------------------
        if csr_data_width not in self.supported_csr_data_width:
            logging.error("Unsupported CSR bus data_width: {} supporteds: {:s}".format(
                bus_data_width, ", ".join(str(x) for x in self.supported_csr_data_width)))
            raise Exception # FIXME
        if csr_address_width not in self.supported_csr_address_width:
            logging.error("Unsupported CSR bus address_width: {} supporteds: {:s}".format(
                bus_address_width, ", ".join(str(x) for x in self.supported_csr_address_width)))
            raise Exception # FIXME
        logging.info("Using {:d}-bit CSR bus, {}KiB addressable.".format(
            csr_data_width, 2**csr_address_width//2**10))

        self.csr_data_width    = csr_data_width
        self.csr_address_width = csr_data_width
        self.csr_masters       = []
        self.csr_regions       = {}

        # IRQ check / creation ---------------------------------------------------------------------
        self.irqs = {}

    # SoC base elements ----------------------------------------------------------------------------

    def add_bus_master(self, bus):
        pass

    def add_bus_slave(self, bus, size, origin):
        pass

    def add_csr(self, name, csr_id=None):
        pass

    def add_irq(self, name, irq_id=None):
        pass

    def add_mem(self, name, size, origin, type):
        pass

    # SoC core peripherals -------------------------------------------------------------------------

    def add_ram(self, name, size, origin):
        pass

    def add_rom(self, name, size, origin, contents):
        pass

    def add_cpu(self, cpu_type, cpu_variant):
        pass

    # SoC standard peripherals ---------------------------------------------------------------------

    def add_uart(self, name, phy, pads):
        pass

    def add_sdram(self, pads):
        pass

    def add_ethernet(self, name, pads):
        pass

    def add_spi_flash(self, name, pads):
        pass

# SoC instance -------------------------------------------------------------------------------------

from litex.boards.platforms import arty

logging.basicConfig(level=logging.INFO)

soc = LiteXSoC(arty.Platform(), int(100e6), bus_type="wishbone", bus_data_width=32)

Thanks @xobs for the feedback.

enjoy-digital commented 4 years ago

@mithro: i think it still makes sense to have the CPU integration and standard cores in LiteX and specific or complex cores as external modules. Some users were already wondering why LiteX + LiteDRAM + LiteEth, etc... were not in a single repository so maybe it's not a good idea to fragment things too much. The current compromise does not seem to bad on that point.

daveshah1 commented 4 years ago

On that note, it would definitely be good to have the litedram/eth init code in the same repo as the cores - either by putting them all in one repo or making the BIOS more modular.

Apropos external CSR bus, it would be nice to support systems where the main processor is actually a hard core (Zynq etc) and LiteX just provides peripherals.

enjoy-digital commented 4 years ago

@daveshah1: thanks for the feedback. It makes sense and has indeed been reported a few times.

For the case when the processor is external, this is already possible for Zynq (SoCZynq where the SoC CPU is the PS7) or over PCIe (in most of LitePCIe design the SoC CPU is the Host's CPU) but it could be easier with better integration/explanation. I'll make sure those cases are also covered.

mithro commented 4 years ago

I was pondering the "types" of things that exist in a SoC. They are;

CPU
Bus
Peripheral
Memory
Caches (?)

Buses have;

one or more masters
one or more slaves

A bus can be a slave to another bus using an adapter.

Both CPUs and Peripherals can be a bus masters.

A peripheral is connected to a bus and can provide;

control status registers
memory region
irq?

A memory is really just a dumb peripheral which doesn't have any CSRs or IRQs.

enjoy-digital commented 4 years ago

For now i think we can limit the SoC to a Module that:

has a main Bus.
has a CSR bus and collection mechanism.
has an IRQ signal and collection mechanism (optional).
has submodules (with internal CSRs, IRQs).

The main Bus has initially no restriction (except its address space) and no region mapped. The user can optionally provide a initial bus_map that will fix some regions for specific submodules. Bus masters will then optionally provide restrictions to the Bus (Cached/IO region, etc...) and Bus slaves will allocate a region of the Bus that satisfies the requirements (automatic/user-provided origin, size, type, etc...).

The CSR bus is a slave of the main Bus and automatic collection is done on submodules. The user can optionally provide an initial csr_map that will fix CSR numbering of specific submodules.

The IRQ signal is optional, it is just a global signal of the SoC where IRQs of the submodules can be mapped, manually by the user or reserved by submodules. This IRQ signal is connected by the user to the main master of the SoC (generally the CPU(s)). The user can optionally provide an initial irq_map that will fix IRQ numbering of specific submodules.

From that: A CPU is a Module that has one or more Bus masters, and through them introduces restrictions on the Bus mapping (IO/Cached regions). It also can also have CSR (as others submodules) and will generally receive the IRQ signal.

A Peripheral is a submodule with its own CSR/IRQs and connected as a slave to the main Bus.

A Memory is a Peripheral without CSR/IRQs.

A Ether/Uart/SPIBone is a submodule and connected as a master of the main Bus (but not introducing restrictions).

Etc...

So from this simple concept, we should already be able to cover all our actual use-cases. I just need a bit of time to think/prototype/evaluate this.

In the future we could complexify things if needed and consider Peripherals as sub-SoCs with their own Bus and propagate Bus restrictions/allocations through add_bus_master/add_bus_slave methods but i don't think we should consider that for now.

mithro commented 4 years ago

For DMA or debug peripherals (like the wishbone bridge) Peripherals they need to be bus masters right.

I would say that the methods around Ethernet, UART, SPIBone are just "syntactic sugar" for quickly adding a peripheral through the normal Peripheral method?

enjoy-digital commented 4 years ago

Yes exactly, And maybe we could have a SoC class that really handles what i was describing in https://github.com/enjoy-digital/litex/issues/372#issuecomment-581165140 and a LiteXSoC class that inherits from SoC and provides the methods to add the standard components of a LiteX SoC (Ethernet, UART, etc...). In the future, these methods could even be provided directly by the cores, ex: if you use LiteEth in your SoC add_liteeth_mac, add_liteeth_etherbone, etc... would become available.

gsomlo commented 4 years ago

Not much to add in terms of "big picture" ideas, but how about standardizing on 32bit csr-data-width, at least for anything intended to run Linux?

As recent experience indicates, attempting to support variable csr_data_width results in suboptimal (and hard to optimize) accessors. This doesn't even make sense in the LiteX bios proper, which knows the (constant) value of csr_data_width :)

It would simplify things by a lot if we didn't have to deal with variable csr_data_width in any linux drivers either, and could just assume it's a constant. In that case, it'd be nice if it were 32 instead of 8 :)

Just a half baked idea for now, interested in what everyone else thinks about it...

bunnie commented 4 years ago

We recently did a set of side-by-side builds for betrusted-soc (7-series) and betrusted-ec (ICE40) and in both cases going to 32-bit CSRs improved resource utilization and improved critical path; and of course, code execution is faster since now 32-bit accesses are single-cycle.

However, we tend to consolidate all our CSRs into fewer, wider registers using the "Fields" aggregator mechanism. Designers who prefer to allocate single bit-width CSRs for each status or control signal might have a different outcome. That being said, I am also in the 32-bit CSR fan club, but by no means does it need to be "the standard" so long as it's just as easy as passing an argument to change the bit width to the one that best suits our design methodology.

enjoy-digital commented 4 years ago

Thanks @gsomlo, @bunnie for the feedback. For the story, csr_data_width was initially fixed to 8-bit in MiSoC to reduce resource utilization and improve timings and i needed 32-bit CSR for LitePCie based systems where the Host is the main CPU and 8-bit CSR were not efficient (latency over PCIe), so added generic support for csr_data_width to the gateware. But mostly two configurations are used: 8-bit and 32-bit.

As you are saying @gsomlo: Maybe we could restrict csr_data_width usage: 32-bit recommended as default (and mandatory for OSes) and 8-bit supported (but only supported for bare-metal application). I don't think we need to support any other csr_data_width (and i don't think anyone else is using it :)) so the new SoC builder could only accept 8-bit and 32-bit.

bunnie commented 4 years ago

I think if I were to put some pipe dream things out there, the things I'd wish for are:

More tooling targeted toward simulation, and the interchange/modularization of test benches.
It maybe prohibitive to test full chip configurations, so being able to flag variants of a design for targeted simulations without creating a full separate design would be nice. This is a total pipe dream, but just putting it out there.
Better integration for litescope to capture test vectors from live hardware that can be played back later on for regression and CI integration. Could we even do a loop of wishbone-util to e.g. verilator? Making good test vectors is hard.
Reproducible builds / better support or repo organization for version pinning. I now have projects in "support" mode using Litex, complete with users who want exactly one bug fixed, without affecting the other 'bugs' they've come to depend on. I'm not sure this is a solvable problem given the way Python and Github work, but for example, even now I can't figure out why I can't get a "clean" litex tree after pulling master, it complains about borked submodules in subdirectories I've never even heard of or touched. It's probably something I did wrong, but maybe just some helper scripts so those of us with not as good git-fu can follow along with the community's best practices.
A generic grammar to annotate timing constraints (especially on I/O), that could be translated cross-platform, and re-used between projects. I know this is not going to happen anytime soon, but small steps toward this might be possible in the short term.
Tunneling wishbone to debug peripherals via a remote host is pretty powerful, but I feel like the methodology is not well documented and has also changed a lot in the past year (or maybe I'm just living under a rock). But maybe pulling more of the tooling required to do this into the core SoC options can help standardize and promote the methodology.

Again, I think most of the items on the list are too ambitious to do in a single revision, but small steps toward this I think could help make Litex more suitable for projects with long term support requirements and/or projects eventually targeting silicon foundries where you just can't get enough simulation before tape-out.

jersey99 commented 4 years ago

Throwing in my 2 cents:

I am 100% on-board with strides towards reproducible builds, and agree with @bunnie. I would go as far as generating bitfiles (Docker/Nix + unit tests + simulation tests + generate bitfiles with vendor tools (license permitting) mounted into the container). It's a step towards the dream of Hardware In Loop testing. I believe this will help catch that regression before it happens, or maybe help in fixing it sooner, as the pipeline is setup to build bitfiles on demand. This should help us move forward without looking back too often.
I wish there was a single Litex repository (at least Litex += litedram + liteeth + litescope if not all). I feel like this would not only help with cleaner installs, but also help making PRs from around the Litex ecosystem a lot smoother. In addition to this we could potentially get rid of some redundant code between repositories. It can be argued that this potentially raises the barrier of entry for a beginner, but I think code organization and documentation should help with that.
Fix * imports. I understand this was a "design" choice by migen, and looks like it maybe creeping into nmigen. But from a Litex point of view I feel like this can be totally avoided. It muddles readability, and also makes linters and me unhappy.
Perhaps push some of the cool Litex features upstream to migen. To be specific, Litex streams are awesome, and I think it maybe worth the effort to integrate them into migen.
I sometimes tend to use SoC instantiations without a CPU and with either (UART/Etherbone) wishbone bridges. It took me a while to wrap my head around how to do this at the beginning. But I bring this up because we are discussing quite a bit about a standardized SoC instantiation, and didn't want this to be left out. :)

enjoy-digital commented 4 years ago

Thanks @jersey99 for the feedback, I understand and agree with most of the points.

Regarding the LiteX/LiteX's Cores ecosystem, i understand your points, but each of these Core is composed of: PHYs, Core, Frontends, Examples, Generator (for people how don't want to use LiteX for the integration), Test suite, Travis-CI etc... and i find that more convenient to manage each of these in separate repositories (and also think that is what made them possible). For these cores, having their own identify is also interesting to promote it to people who don't want to really want to use (n)Migen/LiteX (at least for now :)) but are just interested by the core itself generated as verilog. LiteDRAM/LiteSATA/LitePCIe/LiteEth are for example used by several projects integrated as a pre-generated verilog core.

But i agree that we should avoid as much as possible code redundancy between cores (there is still be some redundancy, but no that much i think) and probably create new abstractions in LiteX to simplify the cores (as we discussed together IOs abstraction would be useful for LiteEth/LiteDRAM).

mithro commented 4 years ago

I do think we should really invest in figuring out how to move SoCSDRAM and DDR init code out of LiteX and into litedram. Same with the Ethernet stuff and liteEth. There is kind of a weird circular dependency at the moment.

mithro commented 4 years ago

On the topics of streams, I believe they actually originated in Migen but where removed upstream? @enjoy-digital Do I recall that history correctly?

mithro commented 4 years ago

@enjoy-digital It's probably a good idea for us to have a BridgeSoC base class in LiteX somewhere too? The bridge stuff is super powerful and under used.

mithro commented 4 years ago

@jersey99 - You might like LiteX-buildenv if you want a single unified repository.

mithro commented 4 years ago

@enjoy-digital - A CSR data width of "compressed" (IE 8bit) and "native" (same width as the host bus) seems like what should be supported?

xobs commented 4 years ago

@mithro - If the goal of CSRs is to ease routing, then I don't see why they should be limited to any particular size. The point of CSRs is that you can define them to be any logical width, and they'll get split up according to the physical CSR width.

For example, if you had an AES key, you could do self.aes_key = CSRStorage(256) and depending on the csr width you could get four 64-bit CSRs, eight 32-bit CSRs, or 32 8-bit CSRs. Or if you had some weird number like 9-bit CSRs, you'd get 29 of those.

enjoy-digital commented 4 years ago

@mithro, @jersey99: for the streams, it originated in Migen with the Dataflow but it was something i always found too complex to use and that was trying to to all at once and has never been really used. For the streams in LiteX, i just based it on the Record, added valid/ready/first/last signals (to support flow-control/packets) and then built a collection of base stream elements i had a need for in designs built with LiteX: FIFOs/Buffer/Pipe/Mux/Demux/Converter/Gearbox/Monitoring/Packetizer/Depacketizer, etc... These bases elements are heavily used by the cores/designs and have been designed as part of LiteX.

enjoy-digital commented 4 years ago

I'm prototyping a new SoC builder in new_soc branch: Now when doing (for example): lxsim --with-sdram --with-ethernet instead of nothing, we get:

INFO:SoC:        __   _ __      _  __  
INFO:SoC:       / /  (_) /____ | |/_/  
INFO:SoC:      / /__/ / __/ -_)>  <    
INFO:SoC:     /____/_/\__/\__/_/|_|  
INFO:SoC:  Build your hardware, easily!
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoC:Creating new SoC... (2020-02-06 19:07:14)
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoCBus:Creating new Bus Handler...
INFO:SoCBus:32-bit wishbone Bus, 4.0GiB Address Space.
INFO:SoCBus:Adding reserved Regions...
INFO:SoCBus:Bus Handler created.
INFO:SoCCSR:Creating new CSR Handler...
INFO:SoCCSR:8-bit CSR Bus, 16.0KiB Address Space, 2048B Paging (Up to 32 Locations).

INFO:SoCCSR:Adding reserved CSRs...
INFO:SoCCSR:sdram CSR added at Location 8.
INFO:SoCCSR:l2_cache CSR added at Location 9.
INFO:SoCCSR:CSR Bus Handler created.
INFO:SoCIRQ:Creating new SoC IRQ Handler...
INFO:SoCIRQ:IRQ Handler (up to 32 Locations).
INFO:SoCIRQ:Adding reserved IRQs...
INFO:SoCIRQ:IRQ Handler created.
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoC:Initial SoC:
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoC:32-bit wishbone Bus, 4.0GiB Address Space.
INFO:SoC:8-bit CSR Bus, 16.0KiB Address Space, 2048B Paging (Up to 32 Locations).
CSR Locations: (2)
- sdram               : 8
- l2_cache            : 9
INFO:SoC:IRQ Handler (up to 32 Locations).
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoCCSR:ctrl CSR allocated at Location 0.
INFO:SoCBus:master0 added as Bus Master.
INFO:SoCBus:master1 added as Bus Master.
INFO:SoCCSR:cpu CSR allocated at Location 1.
INFO:SoCBus:rom Region added Origin: 0x00000000, Size: 0x00008000, Cached: True.
INFO:SoCBus:rom added as Bus Slave.
INFO:SoCBus:sram Region added Origin: 0x01000000, Size: 0x00001000, Cached: True.
INFO:SoCBus:sram added as Bus Slave.
INFO:SoCCSR:identifier_mem CSR allocated at Location 2.
INFO:SoCCSR:timer0 CSR allocated at Location 3.
INFO:SoCIRQ:timer0 IRQ allocated at Location 0.
INFO:SoCBus:csr Region added Origin: 0x82000000, Size: 0x00010000, Cached: True.
INFO:SoCBus:csr added as Bus Slave.
INFO:SoCCSR:uart CSR allocated at Location 4.
INFO:SoCIRQ:uart IRQ allocated at Location 1.
INFO:SoCBus:main_ram Region added Origin: 0x40000000, Size: 0x04000000, Cached: True.
INFO:SoCBus:main_ram added as Bus Slave.
INFO:SoCCSR:ethphy CSR allocated at Location 5.
INFO:SoCBus:ethmac Region added Origin: 0xb0000000, Size: 0x00002000, Cached: False.
INFO:SoCBus:ethmac added as Bus Slave.
INFO:SoCCSR:ethmac CSR allocated at Location 6.
INFO:SoCIRQ:ethmac IRQ allocated at Location 2.
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoC:Finalized SoC:
INFO:SoC:--------------------------------------------------------------------------------
INFO:SoC:32-bit wishbone Bus, 4.0GiB Address Space.
Bus Regions: (5)
rom                 : Origin: 0x00000000, Size: 0x00008000, Cached: True
sram                : Origin: 0x01000000, Size: 0x00001000, Cached: True
csr                 : Origin: 0x82000000, Size: 0x00010000, Cached: True
main_ram            : Origin: 0x40000000, Size: 0x04000000, Cached: True
ethmac              : Origin: 0xb0000000, Size: 0x00002000, Cached: False
Bus Masters: (2)
- master0
- master1
Bus Slaves: (5)
- rom
- sram
- csr
- main_ram
- ethmac
INFO:SoC:8-bit CSR Bus, 16.0KiB Address Space, 2048B Paging (Up to 32 Locations).
CSR Locations: (9)
- sdram               : 8
- l2_cache            : 9
- ctrl                : 0
- cpu                 : 1
- identifier_mem      : 2
- timer0              : 3
- uart                : 4
- ethphy              : 5
- ethmac              : 6
INFO:SoC:IRQ Handler (up to 32 Locations).
IRQs Locations:
- timer0              : 0
- uart                : 1
- ethmac              : 2
INFO:SoC:--------------------------------------------------------------------------------

This should ease understanding how the SoC is elaborated and gives directly the final Memory/CSR/IRQ mapping without having to look at the generated files. @mithro, @gsomlo, @xobs @bunnie any feedback? are some information missing? If you want to test, you could just switch to new_soc branch and try to build one of your design.

jersey99 commented 4 years ago

@enjoy-digital Another CSR related comment (potentially tangential). Would it make sense to compress and store the CSR map inside the bitfile at a known location? Once the format is known, a client could read the map from the location and get all the information. This I have seen helps in avoiding to manually maintain a map file.

gsomlo commented 4 years ago

On Thu, Feb 06, 2020 at 10:42:52AM -0800, Vamsi Vytla wrote:

Would it make sense to compress and store the CSR map inside the bitfile at a known location?

To me, that sounds like something best accomplished by embedding a DTB blob into the bios -- that way it'd be standard and could be consumed by downstream things such as e.g. BBL, the Linux kernel, etc.

@enjoy-digital, this could be another entry for the wishlist :)

jersey99 commented 4 years ago

@gsomlo I would also like to be able to read it in absence of a CPU :)

enjoy-digital commented 4 years ago

@jersey99, @gsomlo: in Linux-on-LiteX-Vexriscv, the CSRs are allocated dynamically during the gateware elaboration and a DTB is passed to the kernel. This is working fine.

mithro commented 4 years ago

@enjoy-digital Can you link to the lxsim.py file?

enjoy-digital commented 4 years ago

@mithro: that's https://github.com/enjoy-digital/litex/blob/master/litex/tools/litex_sim.py (it's also exposed as lxsim: https://github.com/enjoy-digital/litex/blob/master/setup.py#L48

GuzTech commented 4 years ago

If I might give my humble opinion, I would like to ask for supporting nMigen. While I know that it is not "stable" yet, the support for formal verification is what is the most compelling reason for using it to me. I know that miform exists, but I don't know how well it works.

If nMigen would be supported, then I (and I hope others) would be compelled to actually go and check the lite IPs and write formal verification for them. While tests have been written for the lite IPs, and they have been battle-tested, I find that (correct) formal verification is the only tool that can guarantee that everything functions correctly. And this of course goes for every piece of IP, so I'm not trying to pick on your IPs specifically :)

I feel that LiteX + nMigen would be a very good ecosystem in which I have all the benefits of LiteX making my life much easier, and I have a modern language which gives me and others confidence in that I can formally verify my hardware.

It's probably a lot of hard work, but I just wanted to give my opinion. Keep up the great work!

enjoy-digital commented 4 years ago

Thanks @GuzTech for sharing your opinion. Using/supporting nMigen is indeed part of the roadmap, at least in compat mode to first evaluate the work it would be to do the real switch to nMigen:

Also, even if using Migen vs nMigen is not really limiting LiteX for now, we should see if it's possible to switch easily between Migen and nMigen compat mode to generate the verilog (would allow testing progressively nMigen while still allowing Migen use) and then see if it's interesting to start creating new cores in nMigen or/and adapt existing ones.

But even in compat mode, verifying all the actual codebase with nMigen still requires some work and the few people that are working on LiteX have not been able to focus on that yet. We are currently improving tests of the different cores and simplifying things and this work will be useful for nMigen compat tests and porting.

I agree with the fact that having formal verification would be great, but this is something complementary to other testing methods, not something you necessarily want/can apply for everything. It can be useful to test critical modules of a design (FIFO, Interfaces, CPU, etc...) but less useful or not applicable when testing complex cores or full systems. So that's not the only tool that can guarantee that everything functions correctly, it's more an additional tool that can easily cover cases that are difficult to cover with functional simulation, similarly to cases that are covered easily by functional simulation but are difficult to cover with formal methods. You always have to be practical when testing/verifying a design, formal is sometimes the right tool, sometimes not.

For now, a way to use nMigen cores with LiteX is to export the nMigen design to verilog and instantiate it in the system built with LiteX. This is what is currently done with Minerva CPU: https://github.com/enjoy-digital/litex/blob/master/litex/soc/cores/cpu/minerva/core.py#L91

We could expect the situation to improve in the future, but it could take a bit of time.

GuzTech commented 4 years ago

Thank you for the reply. I'm sure it is a lot of work to change LiteX so that it uses nMigen, so I completely understand if it takes time to do it. In the meantime the trick with the Minerva CPU is actually really useful! I want to make clear though, that I didn't mean that you guys must formally verify your IP, but that it would be easier for contributors to write formal proofs for (parts of) your IP if it was written in nMigen :) But I know that a rewrite or checking compat mode is not easy and would take a lot of time.

Theoretically speaking, only FV can prove that a design is correct, because no matter how good your tests are, you are human and you can miss several crucial tests that would make the design fail. And yes, formal proofs can also miss things because it's humans that write them :) But an SMT solver can and will find things we would never think of given that the proofs are correct.

Practically speaking though, I understand what you mean. Writing proofs can be very difficult and SMT solvers taking forever to prove something is not practical, so in practice it is an additional tool to extensive functional tests. Still, when I see what @ZipCPU accomplishes, I see that FV can be hard, but it does not have to be as hard as one might think.

And that's my €0.02 :)

enjoy-digital commented 4 years ago

Indeed we understand each others. I just wanted to emphasis that formal verification is not the solution to everything, it's a nice tool, that can have very interesting additional value when used on cases that are difficult to cover with functional simulation but easier to cover with formal. I clearly see cases in the different cores where it would use useful to exercise a bit of formal verification, but for now we are working on the others priorities :) But yeah, if you want to combine the benefits of nMigen and LiteX, using a verilog instance of a nMigen core could be a good tradeoff.

enjoy-digital commented 3 years ago

We are now in 2021 and can probably close this. Most of the planned work here has been done (thanks for all the contributions) and separate issues have been created to track the improvements listed here that haven't been yet done. I'll also try to create a 2021 roadmap soon.

enjoy-digital / litex

LiteX 2020 roadmap and new LiteX SoC class #372