enjoy-digital / litex

Build your hardware, easily!

Create a custom SoC with 2 cores and a custom communication #1240

Open jwfaye opened 2 years ago

jwfaye commented 2 years ago

Hello everyone,

I am interested in creating a SoC with 2 cores and a custom communication link between them (UART or something else). But reading the BaseSoC implementation, I see no obvious way to do it. I thought about creating a SoC module with 2 core submodules. Can someone help me? Every idea is welcome. The idea I have in mind is shown in the attached image. Thank you.

Jo

[Image: Distributed-Memory-Architecture]

enjoy-digital commented 2 years ago

Hi @jwfaye,

This architecture could be created but is not currently supported by the SoC integration framework. The best way to create such an architecture would probably be to create a SoC with cpu_type=None; this will provide you with a SoC skeleton with the main bus interconnect/CSR bus that you could extend with your CPUs and peripherals. As a first step, the main bus of the SoC could probably be used as the communication network (each CPU would be a master on this bus), and the CPUs could communicate through a mailbox or shared memory.
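The mailbox idea can be illustrated with a small software model. This is only a sketch under stated assumptions: `SharedRam`, `mailbox_send`, `mailbox_receive` and the flag/payload layout are hypothetical names chosen for illustration, not part of LiteX, and a Python `bytearray` stands in for the RAM that both CPUs would reach over the main bus.

```python
import struct

MAILBOX_BASE = 0x0  # hypothetical offset of the mailbox inside the shared RAM
FLAG_EMPTY = 0
FLAG_FULL = 1


class SharedRam:
    """Software stand-in for a small RAM that both CPUs can reach over the bus."""

    def __init__(self, size_bytes):
        self.mem = bytearray(size_bytes)

    def write32(self, addr, value):
        struct.pack_into("<I", self.mem, addr, value & 0xFFFFFFFF)

    def read32(self, addr):
        return struct.unpack_from("<I", self.mem, addr)[0]


def mailbox_send(ram, value):
    """Producer side: return False if the previous message was not consumed yet."""
    if ram.read32(MAILBOX_BASE) == FLAG_FULL:
        return False
    ram.write32(MAILBOX_BASE + 4, value)  # write the payload first
    ram.write32(MAILBOX_BASE, FLAG_FULL)  # then publish it by setting the flag
    return True


def mailbox_receive(ram):
    """Consumer side: return the payload, or None if the mailbox is empty."""
    if ram.read32(MAILBOX_BASE) != FLAG_FULL:
        return None
    value = ram.read32(MAILBOX_BASE + 4)
    ram.write32(MAILBOX_BASE, FLAG_EMPTY)  # hand the slot back to the producer
    return value
```

On real hardware the same two-word layout would be accessed from each firmware through plain memory reads and writes; writing the payload before the flag is what keeps the consumer from reading a half-written message.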

The CPUs are regular Modules, so you can add them as other Modules with for ex: self.submodules.cpu0 = VexRiscv(platform, variant="standard"). With the CPU added to your design, you'll then have to do the integration manually, similarly to what is done in add_cpu.

Another option could be to create a LiteX SoC per CPU/memory. For this you could use a generator for the SoC (similar to the generators we have in LitePCIe/LiteEth/etc.) and reintegrate the result in a LiteX SoC without a CPU.

These are just some initial ideas.

jwfaye commented 2 years ago

Hello, I am back as there is some work in progress. The idea retained is the last one: creating a generator for a CPU and instantiating it as a submodule in a higher-level SoC. But I have a question: what is the purpose of the do_finalize() method in the generation? Here is some code and the main idea: https://github.com/ridope/multi-riscv-p2p/blob/master/twoCores/base.py. It is not working, so any help is welcome.

enjoy-digital commented 2 years ago

Hello @jwfaye,

In fact, for the idea with the generator, the goal is to reuse the existing LiteX core (and avoid external duplication) and just have a generator that will produce Verilog, which would then be re-imported in a SoC without a CPU. I'll try to prototype this soon. Otherwise, do_finalize() is used to generate logic once all the elements of the SoC have been added/defined; this allows adding logic like interconnects, the CSR bus, etc. For your issue, it seems the CPUBlocks are not added as submodules, which is an issue (but I am not sure it is the only one).

I'll work on a generator proof of concept in the next few days (since I also have another need for it) and will try to provide an example.
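The role of do_finalize() described above can be pictured with a toy model in plain Python. This is not Migen's or LiteX's implementation; `ToyModule`, `ToyBusMaster` and `ToySoC` are hypothetical classes that only mimic the pattern: logic depending on the full list of components is deferred to a hook that runs once, and a module that was never registered as a submodule is never finalized.

```python
class ToyModule:
    """Toy stand-in for a hardware module with a one-shot finalization step."""

    def __init__(self):
        self.submodules = []
        self.finalized = False

    def add_submodule(self, module):
        if self.finalized:
            raise RuntimeError("cannot add a submodule after finalization")
        self.submodules.append(module)
        return module

    def do_finalize(self):
        """Hook for subclasses: build logic that needs the complete design."""

    def finalize(self):
        if self.finalized:
            return  # finalization runs only once
        self.finalized = True
        for module in self.submodules:
            module.finalize()
        self.do_finalize()


class ToyBusMaster(ToyModule):
    def __init__(self, name):
        super().__init__()
        self.name = name


class ToySoC(ToyModule):
    def __init__(self):
        super().__init__()
        self.masters = []
        self.interconnect = None  # unknown until every master has been added

    def add_master(self, master):
        self.add_submodule(master)
        self.masters.append(master.name)

    def do_finalize(self):
        # Only here is the complete list of bus masters known.
        self.interconnect = tuple(self.masters)


soc = ToySoC()
soc.add_master(ToyBusMaster("cpu0"))
soc.add_master(ToyBusMaster("cpu1"))
orphan = ToyBusMaster("cpu2")  # created but never added as a submodule
soc.finalize()
print(soc.interconnect, orphan.finalized)
```

In this model the orphan module is ignored by finalization, which is the kind of symptom to look for when blocks are instantiated without being attached to their parent.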

enjoy-digital commented 2 years ago

Hello @jwfaye,

I've been working on the LiteX standalone SoC generator for which a first version is available at: https://github.com/enjoy-digital/litex/blob/master/litex/tools/litex_soc_gen.py

This should simplify your work and is probably a better approach that should decouple things and avoid code duplication. You can find a very simple AMP SoC with FemtoRV and FireV CPUs able to exchange data through an SRAM here: https://github.com/enjoy-digital/litex_soc_gen_test/blob/master/digilent_arty.py

And some diagrams/demo shared on Twitter: https://twitter.com/enjoy_digital/status/1524024988982030339.

jwfaye commented 2 years ago

Hello @enjoy-digital, thank you very much for the support, it helps a lot. I am trying to adapt the code for the d10lite board. I have a minor issue: the litex_soc_gen command is not found. I updated the repo to the latest version but cannot run the command. I checked the litex_term command and it runs correctly. Is there something I forgot to do?

jwfaye commented 2 years ago

    # Generate standalone SoC.
    os.system("{} --cpu-type=firev --bus-standard=wishbone --sys-clk-freq=50e6 --name=firev_soc --build".format(os.path.join(soc_gen_root, "litex_soc_gen.py")))

It fixes the problem. But I got errors at synthesis:

    Error (12006): Node instance "BUFG" instantiates undefined entity "BUFG". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2589
    Error (12006): Node instance "FDCE" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2788
    Error (12006): Node instance "FDCE_1" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2796
    Error (12006): Node instance "FDCE_2" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2804
    Error (12006): Node instance "FDCE_3" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2812
    Error (12006): Node instance "FDCE_4" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2820
    Error (12006): Node instance "FDCE_5" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2828
    Error (12006): Node instance "FDCE_6" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2836
    Error (12006): Node instance "FDCE_7" instantiates undefined entity "FDCE". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2844
    Error (12006): Node instance "PLLE2_ADV" instantiates undefined entity "PLLE2_ADV". Ensure that required library paths are specified correctly, define the specified entity, or change the instantiation. If this entity represents Intel FPGA or third-party IP, generate the synthesis files for the IP. File: /home/jofaye/Documents/work/computer_architecture/amp/build/amp_d10lite/gateware/amp_d10lite.v Line: 2862
    Error: Quartus Prime Analysis & Synthesis was unsuccessful. 10 errors, 287 warnings
    Error: Peak virtual memory: 1459 megabytes
    Error: Processing ended: Fri May 13 17:07:37 2022
    Error: Elapsed time: 00:00:28
    Error: Total CPU time (on all processors): 00:00:39

I will figure it out later.

enjoy-digital commented 2 years ago

@jwfaye: The issue was probably that you need to run --install again to install the console script.

For your error, it seems the code is generated for a Xilinx FPGA; I'll have a look soon.
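A quick way to confirm this kind of diagnosis is to list which entities Quartus reports as undefined and compare them against known Xilinx primitive names. The following is a minimal sketch, assuming the log text is available as a string; the helper names are hypothetical and the primitive set only contains the three names that appear in the log above.

```python
import re

# Xilinx 7-Series primitive names seen in the Quartus log; extend as needed.
XILINX_PRIMITIVES = {"BUFG", "FDCE", "PLLE2_ADV"}

UNDEFINED_ENTITY_RE = re.compile(r'instantiates undefined entity "([^"]+)"')


def undefined_entities(log_text):
    """Count how many times each undefined entity is reported in the log."""
    counts = {}
    for name in UNDEFINED_ENTITY_RE.findall(log_text):
        counts[name] = counts.get(name, 0) + 1
    return counts


def looks_like_xilinx_leftover(log_text):
    """True if at least one undefined entity is a known Xilinx primitive."""
    return any(name in XILINX_PRIMITIVES for name in undefined_entities(log_text))


sample_log = (
    'Error (12006): Node instance "BUFG" instantiates undefined entity "BUFG". '
    'Error (12006): Node instance "FDCE" instantiates undefined entity "FDCE". '
    'Error (12006): Node instance "FDCE_1" instantiates undefined entity "FDCE". '
)
print(undefined_entities(sample_log))
```

If the check is positive, the usual suspects are vendor-specific blocks such as the clock/reset generator, which have to be swapped for the target vendor's equivalents.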

jwfaye commented 2 years ago

The --install indeed allowed me to run the command. Thank you.

Which code is generated for a Xilinx FPGA? I changed the toolchain to Quartus. Here is my code; I do not see what I've done wrong: test_d10lite.txt

enjoy-digital commented 2 years ago

@jwfaye: When adapting the design from Digilent Arty, you kept the CRG that uses 7-Series primitives which will not work.

Using this seems to allow the build:

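# Imports this snippet relies on (assumed to match the LiteX DE10-Lite target).
from migen import *
from litex.soc.cores.clock import Max10PLL

# CRG ----------------------------------------------------------------------------------------------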

class _CRG(Module):
    def __init__(self, platform, sys_clk_freq, with_rst=False):
        self.rst = Signal()
        self.clock_domains.cd_sys = ClockDomain()

        # # #

        # Clk / Rst
        clk50 = platform.request("clk50")

        # PLL
        self.submodules.pll = pll = Max10PLL(speedgrade="-7")
        self.comb += pll.reset.eq(self.rst)
        pll.register_clkin(clk50, 50e6)
        pll.create_clkout(self.cd_sys,    sys_clk_freq)
        platform.add_false_path_constraints(self.cd_sys.clk, pll.clkin) # Ignore sys_clk to pll.clkin path created by SoC's rst.

jwfaye commented 2 years ago

Thank you, I didn't see that. The synthesis now completes without errors. I have one last question. For booting Linux on this architecture, the same images as the SMP architecture can be used?

enjoy-digital commented 2 years ago

The LiteX-SoC-Generator only integrates the base components of the SoC. To boot Linux, you'll still have to add a DRAM in the top-level SoC and make sure to map it similarly to what is done in the Linux-on-LiteX-VexRiscv project.

Dolu1990 commented 2 years ago

For booting Linux on this architecture, the same images as the SMP architecture can be used?

I think so, as https://github.com/litex-hub/linux-on-litex-vexriscv/blob/master/buildroot/board/litex_vexriscv/linux.config#L11 is turned on.

jwfaye commented 2 years ago

Hello! Thank you @Dolu1990 and @enjoy-digital for the answers.

I got a new one lol. I naively used the demo firmware to load code onto my architecture... The terminal gets stuck, as shown in the image (stucked_terminal). Is there a specific modification to perform?

enjoy-digital commented 2 years ago

With the demo I provided, nothing is probably mapped to 0x4000_0000. You can try to add --integrated-main-ram-size=0x10000 to the SoC generation.
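As a sketch of how the extra option could be added to the generation step, the command can be assembled as an argument list instead of a formatted shell string. Only flags that already appear in this thread are used; `build_soc_gen_command` and `generate` are hypothetical helper names, and the script path is an assumption to adapt to your checkout.

```python
import subprocess
import sys


def build_soc_gen_command(script_path, name, cpu_type,
                          sys_clk_freq="50e6", main_ram_size=None):
    """Assemble the argument list for one standalone SoC generation."""
    cmd = [
        sys.executable, script_path,
        f"--cpu-type={cpu_type}",
        "--bus-standard=wishbone",
        f"--sys-clk-freq={sys_clk_freq}",
        f"--name={name}",
    ]
    if main_ram_size is not None:
        # Optional integrated main RAM, as suggested in the comment above.
        cmd.append(f"--integrated-main-ram-size={main_ram_size:#x}")
    cmd.append("--build")
    return cmd


def generate(cmd):
    """Run the generator and raise if it exits with an error."""
    subprocess.run(cmd, check=True)


cmd = build_soc_gen_command("litex_soc_gen.py", "firev_soc", "firev",
                            main_ram_size=0x10000)
print(" ".join(cmd[1:]))
```

Passing a list to `subprocess.run` with `check=True` avoids shell quoting issues and makes a failed generation stop the build script instead of being silently ignored.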

jwfaye commented 2 years ago

Hello, yes indeed, I didn't specify the integration of RAM. I am now trying to build custom firmware. I wonder how I can point to each core's internal memory in the linker file. I understand that the two cores have the same memory map. The shared memory can be addressed simply, since it is a distinct memory. But I would like to place some pieces of code on core1, some others on core2, and the shared data in the shared memory. For now, by modifying the linker file provided in the demo firmware, I think I address only one core. Do you have some ideas?

jwfaye commented 2 years ago

The first idea I have is to have two executables, one for each core. My question with this method is: will the litex_term tool support opening two terminals, to load the first program into the FireV and then switch to the FemtoRV and load the second program in another terminal? Does it sound like a good idea to you?

enjoy-digital commented 2 years ago

Hello @jwfaye,

You can use LiteX's linker file as a base, and I would recommend creating two firmware images (each with its own specific linker file). For simplicity, at first maybe just try to add two separate UARTs to your hardware. I could then provide more info on how to tunnel the UARTs over a single UART or Ethernet.

jwfaye commented 2 years ago

Hello @enjoy-digital,

I went for the idea of two firmware images, one for each CPU, each with a dedicated linker file. For now things do not work, as the synthesis does not complete because I am using 102% of the memory. I designed the SoC so that I have 32 KB for each memory area (RAM, ROM, shared_RAM), which gives 192 KB. If I add the size of the shared RAM (4 KB), I reach 196 KB out of the 204 KB of internal memory available on the de10lite. So I do not understand where I instantiate the additional memory. Even if I reduce the size of each region, the design still exceeds the available memory by the same 2%. I am working on this issue by adding the memory regions one at a time.
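The budget above can be written out explicitly. This sketch assumes the 192 KB figure means two cores with three 32 KB regions each, which is an interpretation of the numbers quoted, not something stated outright; `total_kb` is a hypothetical helper.

```python
def total_kb(cores, regions_per_core, region_kb, shared_kb):
    """Memory requested by the design, in KB, under the assumptions above."""
    return cores * regions_per_core * region_kb + shared_kb


AVAILABLE_KB = 204  # internal memory figure quoted for the de10lite

used_kb = total_kb(cores=2, regions_per_core=3, region_kb=32, shared_kb=4)
utilisation = 100 * used_kb / AVAILABLE_KB
print(f"{used_kb} KB of {AVAILABLE_KB} KB ({utilisation:.1f}%)")
```

Under these assumptions the requested memories stay below the limit, so the reported 102% would have to come from memory not counted in this sum, or from how the toolchain maps memories onto physical RAM blocks; both are guesses that the fitter's resource report should confirm or rule out.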

I also have a question about the mmap_m size. The generated "region.ld" file shows that it has a size of 0x1000_0000. Shouldn't it correspond to the size of the shared RAM? Here is the content of the region.ld file:

MEMORY {
    rom : ORIGIN = 0x00000000, LENGTH = 0x00008000
    sram : ORIGIN = 0x01000000, LENGTH = 0x00008000
    main_ram : ORIGIN = 0x40000000, LENGTH = 0x00008000
    mmap_m : ORIGIN = 0xa0000000, LENGTH = 0x10000000
    csr : ORIGIN = 0x82000000, LENGTH = 0x00010000
}

jwfaye commented 2 years ago

The first problem mentioned was solved by specifying only the integrated main RAM size! I do not really understand why the errors occurred, as the memory sizes provided now are bigger than the previous ones. Here is the content of the "region.ld" file:

MEMORY {
    rom : ORIGIN = 0x00000000, LENGTH = 0x00020000
    sram : ORIGIN = 0x01000000, LENGTH = 0x00002000
    main_ram : ORIGIN = 0x40000000, LENGTH = 0x00004000
    mmap_m : ORIGIN = 0xa0000000, LENGTH = 0x10000000
    csr : ORIGIN = 0x82000000, LENGTH = 0x00010000
}

This is the same for the two standalone SoCs.
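When comparing the region files of the two standalone SoCs, it can help to parse them rather than read them by eye. Below is a minimal sketch, assuming the file follows the `name : ORIGIN = 0x..., LENGTH = 0x...` layout quoted above; `parse_regions` and `find_overlaps` are hypothetical helpers, not LiteX functions.

```python
import re

REGION_RE = re.compile(
    r"(\w+)\s*:\s*ORIGIN\s*=\s*(0x[0-9a-fA-F]+)\s*,\s*LENGTH\s*=\s*(0x[0-9a-fA-F]+)"
)


def parse_regions(text):
    """Map each region name to an (origin, length) pair of integers."""
    return {
        name: (int(origin, 16), int(length, 16))
        for name, origin, length in REGION_RE.findall(text)
    }


def find_overlaps(regions):
    """Return pairs of region names whose address ranges intersect."""
    ordered = sorted(regions.items(), key=lambda item: item[1][0])
    overlaps = []
    for (name_a, (start_a, len_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        if start_a + len_a > start_b:
            overlaps.append((name_a, name_b))
    return overlaps


REGIONS_LD = """
MEMORY {
    rom : ORIGIN = 0x00000000, LENGTH = 0x00020000
    sram : ORIGIN = 0x01000000, LENGTH = 0x00002000
    main_ram : ORIGIN = 0x40000000, LENGTH = 0x00004000
    mmap_m : ORIGIN = 0xa0000000, LENGTH = 0x10000000
    csr : ORIGIN = 0x82000000, LENGTH = 0x00010000
}
"""

regions = parse_regions(REGIONS_LD)
for name, (origin, length) in regions.items():
    print(f"{name:10s} 0x{origin:08x} .. 0x{origin + length - 1:08x}")
```

Printing the end address of each region makes it easy to see how much address space an entry such as mmap_m spans compared with the memory actually placed behind it.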

jwfaye commented 2 years ago

So I managed to make it work with two separate UARTs, and also with the mux routing to one UART or the other.

jwfaye commented 2 years ago

I am closing this issue as I think it is solved. I will soon open another one for my other issues lol. Thank you a lot. Joseph

amr-25 commented 1 year ago

Hi, I am working on a similar problem. I need a two-core design (no caches) in which core1 can write remotely to the other core's (core2) memory through some connection. Each core will run its own code on separate data, and the cores will share data through messages (writing to the other core's memory), with no Linux. I believe this is doable with the solution provided above, right? I just need to replace the AMP with an SMP of 2 cores (VexRiscv CPUs with no caches).

jwfaye commented 1 year ago

Hello, I managed to make the two cores communicate through a scratchpad memory. Please check this repository: https://github.com/jwfaye/AES_SVM_On_AMP. We had two independent applications, one running on each core, communicating through the scratchpad memory. For the topology description you can check this repository: https://github.com/jwfaye/Asymetric-Multi-Processing. Let me know if you have any questions.

mithro commented 1 year ago

On the topic of AMP,

adz0612 commented 1 year ago

@jwfaye

Hello, I was trying to set the 2 cores up as mentioned in your repository. I tried running the build_platform.py script.

I got the following error:

  File "./amp.py", line 346, in <module>
    main()
  File "./amp.py", line 314, in main
    soc = BaseSoC(
  File "./amp.py", line 98, in __init__
    uart_mux_pads =[platform.request("serial", 0), platform.request("serial", 1)]
  File "/home/aditya/Desktop/2CoreInterconnect/litex/litex/build/generic_platform.py", line 345, in request
    return self.constraint_manager.request(*args, **kwargs)
  File "/home/aditya/Desktop/2CoreInterconnect/litex/litex/build/generic_platform.py", line 209, in request
    resource = _lookup(self.available, name, number, loose)
  File "/home/aditya/Desktop/2CoreInterconnect/litex/litex/build/generic_platform.py", line 99, in _lookup
    raise ConstraintError("Resource not found: {}:{}".format(name, number))
litex.build.generic_platform.ConstraintError: Resource not found: serial:1

Any idea on how to tackle this? Thanks!

jwfaye commented 1 year ago

Hello adz0612,

It is because I was using two serial ports on the Terasic de10lite board from Altera. I modified the associated platform.py to add another serial port (serial 1) using the GPIO pins. I think you'll have to do the same if you are using the same board. If you are not using the same board, you'll have to use the serial port on your board; if there is only one, you'll need to add another serial port. Or use the mux by setting the mux argument while building the platform.

adz0612 commented 1 year ago

@jwfaye Hey, I'm not using a board; I was trying to simulate. So, can this be simulated using Verilator?

jwfaye commented 1 year ago

Hello, sorry @adz0612. I wrote the code to target a board. I am not yet an expert, but I would recommend modifying the script and the SoC code in Asymetric-Multi-Processing to make it work in simulation. I do not know how to do it for now. If anyone can help, it would be useful.

amr-25 commented 1 year ago

Hello @jwfaye,

You can use LiteX's linker file as a base, and I would recommend creating two firmware images (each with its own specific linker file). For simplicity, at first maybe just try to add two separate UARTs to your hardware. I could then provide more info on how to tunnel the UARTs over a single UART or Ethernet.

So I managed to make it work with two separate UARTs, and also with the mux routing to one UART or the other.

So, digilent_arty.py helped me to program the artya7-35 with 2 VexRiscv cores; however, I believe the a7-35 board supports one UART. I see uart_mux being defined in the script. How can I switch between the two UARTs and upload the applications separately to each core, or have multiple UARTs for multiple cores over a single UART? Thanks

jwfaye commented 1 year ago

Hello @amr-25 ,

Sorry for the delay. The mux was implemented for the basic tests, to see if I could write to and read from the shared memory. To load the code onto the cores separately, I used two different UARTs, one for each core. I have never used the artya7-35, but if it has some GPIO pins, you can implement another UART. To do so, you will need to modify the digilent_arty.py file and add a second serial port using GPIO pins. Tell me if you manage to do it.