RFC: Split LiteX CPU cores into their own Python modules

mithro commented 4 years ago

With the increasing number of CPU cores supported by LiteX, cloning the LiteX repository means cloning a large number of submodules which can get very big.

I think we should convert each of the supported LiteX CPUs into their own Python packages.

litex/soc/cores/cpu becomes a "namespace module" which other Python packages can provide modules under. See https://packaging.python.org/guides/packaging-namespace-packages/

The CPUs can be split into;

[ ] litex-cpu-lm32
[ ] litex-cpu-mor1k
[ ] litex-cpu-vexriscv
[ ] litex-cpu-blackparrot
[ ] litex-cpu-microwatt
[ ] litex-cpu-chiselwatt
[ ] litex-cpu-rocket
[ ] litex-cpu-picorv32
[ ] litex-cpu-minerva

This allows projects which are only using a single CPU like VexRISCV to only pay the cost of that CPU.

The setup process then becomes;

git clone git+ssh://github.com/enjoy-digital/litex.git
(cd litex; python setup.py install)
git clone git+ssh://github.com/enjoy-digital/litex-cpu-vexriscv.git
(cd litex-cpu-vexriscv; python setup.py install)

People can also submodule the repository in using the normal method of;

git submodule add third_party/litex git+ssh://github.com/enjoy-digital/litex.git
(cd third_party/litex; python setup.py -e install)
git submodule add third_party/litex-cpu-vexriscv git+ssh://github.com/enjoy-digital/litex-cpu-vexriscv.git
(cd third_party/litex-cpu-vexriscv; python setup.py -e install)

Or specify a Python requirements file using;

-e https://github.com/enjoy-digital/litex.git#egg=litex
-e https://github.com/enjoy-digital/litex-cpu-vexriscv.git#egg=litex-cpu-vexriscv

We can easily set up Travis CI so that any push to the LiteX repository causes each of the CPU packages to be tested in combination with the based LiteX.

As VexRISCV is currently the most popular CPU to use with LiteX, we could also make VexRISCV the "default" CPU that is included with LiteX by default and just move the other CPU cores into their own modules. I personally don't think we should preference VexRISCV like that, but it could be a possible compromise?

Thoughts?

mithro commented 4 years ago

FYI - @xobs, @gsomlo, @enjoy-digital

enjoy-digital commented 4 years ago

@mithro: i was also thinking about that when seeing the number of submodules now included, but was thinking doing something a bit different: keep the LiteX specific integration code for a CPU in LiteX (so mainly the core.py and software) but only get the CPU code when really needed if not already downloaded (a bit similar to how lxbuildenv is handling dependency).

So i agree with the fact that the CPUs should now easily pluggable but maintaining each CPU separately is going to complicate more maintenance. So maybe a good compromise is to keep code in LiteX for the main supported CPUs, add a tool to download the verilog/vhdl/nmigen code at buildtime, and allow use of external CPUs? This way when cloning LiteX, no CPU code would be downloaded.

mithro commented 4 years ago

Please don't develop yet another system for downloading things, there is a lot of complexity hidden there.

The two well tested ways to do this are;

Python PIP
Git Submodules

enjoy-digital commented 4 years ago

Sure, but what about avoiding the initial --recursive clone on LiteX and just add a tool that allow easy initialization of the submodules 1) manually from command line 2) or automatically when required by the SoC if not already initialized? This way we don't change too much things and still rely on Git Submodules. Why not having a more sophisticated thing later when CPUs will really be pluggable, but for now we could probably have something that is easier and that reduces the required maintenance.

ewenmcneill commented 4 years ago

FTR, I've also found sometimes that chasing down all the submodules pins that have changed when updating can be non-trivial (especially if there's a non-trivial change in the upstream CPU verilog git repo structure, eg I had one case a few weeks ago where the previously pinned commit just seemed to be gone). So I'd also be in favour of not fetching all CPU cores by default.

Still using git submodules, but making the "CPU verilog" git submodules be "dynamically added by user" (or user tooling), rather than a baked in part of the litex repo, as suggested by @enjoy-digital in https://github.com/enjoy-digital/litex/issues/394#issuecomment-589259195 seems like a reasonable approach to me.

With something like "litex fetch $CPU_TYPE" as a wrapper for git submodule add ... with known names/URLs at known locations (if required) and then fetching/updating to a specific version (ie, equivalent to the existing "baked in" pins, but in a config file of the litex fetch function rather than in the git repo metadata directly).

Ewen

mithro commented 4 years ago

The problem is there are other submodules which need to be cloned via --recursive.

gsomlo commented 4 years ago

I've silently (until now :) ) been in the "We've already got too many separate python repositories already, and I wish all IP cores -- litesdcard, liteeth, etc. -- were subdirectories of the main LiteX project".

As such, having more of them is a net negative for me. I also dislike pip, because it always ends up trying to fight my distro package manager, and because once adopted, upstream project maintainers end up recommending it as the production installation method for the project, again blowing off any attempt to "be nice" to downstream distro packaging concerns. (For the record, this is not a LiteX gripe in particular, just an explanation of why I instinctively resent pip :) )

I generally have no problem doing a git submodule update --recursive as needed, and in my experience, disk is cheap, so I've not yet felt any sort of discomfort due to the way things are set up right now. I'm also a bit of a "digital hoarder", so "grab everything, sort it out later offline" is my default instinctive behavior...

I can live with demand-loaded submodules if absolutely necessary (and the added complexity is justified because it somehow makes life tolerable for a lot of others). Heck, I can live with anything the rest of the LiteX community decides -- While being "conservative" w.r.t. my tools and work environment, I also try to be low maintenance and fit in with the established crowd's preferences.

Just figured I'd throw in my $0.02 from the peanut gallery, now that we're collecting opinions :)

Cheers, --Gabriel

enjoy-digital commented 4 years ago

Thanks @ewenmcneill, @gsomlo for the feedback. @gsomlo, i agree we should avoid fragmentation when possible, for the cores, that's a compromise i find interesting for the project and useful as a maintainer as i tried to explain in https://github.com/enjoy-digital/litex/issues/372#issuecomment-582284720, but i can understand your point of view.

@mithro: which submodule needs to be cloned with --recursive? Even if needed, the compromise i suggest will still allow this.

mithro commented 4 years ago

Stuff under https://github.com/enjoy-digital/litex/tree/master/litex/soc/software

mithro commented 4 years ago

How about the following; (a) We split apart litex from the CPUs cores. (b) We create a litex-everything repository which submodules all the litex stuff together into one repository. (c) We set up some simple CI on LiteX-everything which checks some basic stuff works together. (d) We set up an auto-updating bot which will automatically update the submodules in the litex-everything repository to the latest version (assuming the CI on the combined versions goes green)?

This way, people who want to just clone a single repo and get everything together are happy and people who want to download parts are also happy?

I also have a git-local-submodules python script (based on the original horrific shell scripts) which even makes it possible for CI to work well with people's personal forks of the submodules (when running on CI it looks for https://github.com/<user>/<submodule> before falling back to the original submodule location.

It also means that we have a good record from the litex-everything repo and it's CI records of exactly what revisions of each of the modules is known to work correctly together.

ewenmcneill commented 4 years ago

How about the following; (a) We split apart litex from the CPUs cores. (b) We create a litex-everything repository which submodules all the litex stuff together into one repository. {...}

@mithro To me that sounds like a good combination: more flexibility in what gets checked out (especially from third party upstreams which might move out of sync with the general litex changes) and a "I want it all" single top level repo for convenience of those that do not want a la carte checkouts.

Ewen

enjoy-digital commented 4 years ago

@mithro: i'm not comfortable with more fragmentation, LiteX is a whole and the current fragmentation has been done to avoid maintaining complex cores in LiteX directly and because there were some specialization involved. That's true that there is still some inter-dependency, but i'm working on that with the new SoC class.

Here is the current description i can do of LiteX:

LiteX is a (n)Migen based Core/SoC builder that provides the infrastructure to easily create Cores/SoCs (with or without CPU). The common components of a SoC are provided directly: Buses and Streams (Wishbone, AXI, Avalon-ST), Interconnect, Common cores (RAM, ROM, Timer, UART, etc...), CPU wrappers/integration, etc... and SoC creation capabilities can be greatly extended with the ecosystem of LiteX cores (DRAM, PCIe, Ethernet, SATA, etc...) than can be integrated/simulated/build easily with LiteX. It also provides build backends for open-source and vendors toolchains.

If we fragment too much, it will just loose its identify. I've worked hard to try to create something i think has a coherence but that still moves/evolves/improves while we are figuring out things. If we go in your direction, i think it's going to become something too difficult to apprehend and i'm not sure i'll be happy to continue maintaining it. So I agree with the main idea, but i think we should solve things differently:

Here the issue is not really the CPU support (which is really not that much code but part of LiteX integration), it's the fact we are recommending using git clone in --recursive mode while we now have too much submodules and most of them won't be used directly by users. I did an attempt at solving this by just improving a bit the litex_setup.py script already used: https://github.com/enjoy-digital/litex/pull/395

With this, we can easily have minimai/customized/full LiteX:

Minimal LiteX install with all the cores, but no CPU support:

$ ./litex_setup.py --init --install

Minimal LiteX install with all the cores, with VexRiscv support:

$ ./litex_setup.py --init --install
$ ./litex_setup.py --submodule-init=litex_compiler_rt
$ ./litex_setup.py --submodule-init=litex_cpu_vexriscv

Full LiteX install with all the cores and support for all CPUs:

$ ./litex_setup.py --init --install --submodule-init=all

With this litex-everything is just the last configuration and the one that would be used by Travis-CI.

In litex_setup.py, we could also make the git clone of the LiteX's cores optional, like what we are doing for the submodules if we really want to have something minimal. We could also create some default configurations (ex: provides compiler_rt/tapcfg/vexriscv submodules, liteeth/litedram/litescope cores by default) and also extends the cores that can be installed manually with a list of compatible cores (ex: add valentyusb).

So this seems both simple and flexible. @mithro @ewenmcneill @gsomlo can you give your feedback?

xobs commented 4 years ago

I think these issues are inherent with git submodules, and I don't know what the solution is. I like the suggestion put forward by @mithro to split things into multiple disparate repositories.

As an example of the problem, circuitpython depends on a lot of submodules. They recommend doing a git clone --recursive to pull them all, but that means that to build a RISC-V port I need to download a bunch of stuff including 82 megabytes of stm32lib and 13 megabytes of NRF stuff.

As you say, this is becoming an issue in litex as well. I've checked, and the three biggest offenders are microwatt, vexriscv, and rocket, which together are using over 200 MB. I'm not using any of these in my project, except core.py and __init.py__ from vexriscv, but because of the way the repositories are organized I have to grab 40+ MB of vexriscv source.

As @mithro also noted, we can't get around this by partially initializing submodules, because we'd need to keep track of everything that we would need to initialize. For example, litex requires you to initialize litex/soc/software/compiler_rt, and sometimes litex/build/sim/core/modules/ethernet/tapcfg. However, litex itself may merely be a module in a larger dependency tree, so that tree needs to know to ignore everything under litex except what litex initializes itself.

mithro commented 4 years ago

@enjoy-digital Moving the CPU submodules to third_party gave me any idea which I think will make everyone happy as it;

[x] Removes the need for submodules or git clone --recursive.
[x] Allows you to modify the whole LiteX ecosystem together and gives a unified "known good working together" revision number which covers the entire ecosystem.
[x] Keeps the large submodules separate from the very small Python code.
[x] Needs no scripting support, including no litex-setup.py script or similar.
[x] Compatible with Conda, Python virtualenv, system packaging and manual set up.
[x] Allows people to use just part of the LiteX ecosystem.

It'll take me an hour to put a quick prototype together. I'll post here when I have something.

enjoy-digital commented 4 years ago

@mithro: interesting!

mithro commented 4 years ago

@enjoy-digital - I now have a prototype. The two repositories are;

I have tried to explain the basic concepts in the Developing section of the README.md file.

A couple of the concepts are;

Use requirements.txt files with directories inside your current repository.
Turn the current heavy submodules into litex-data-XXXX proper Python packages and use Python namespace packages to make the packages provide a litex.data.XXX module which can be used to find the data files.
Use git subtrees to make it very easy to manage both a single monorepo called litex which includes all the litex-ecosystem and individual liteXXX git repositories.
Use pip requirements.txt file for specifying dependencies -- specifying things in the local repository.

In my opinion it comes out very clean and extends really nicely for people to use in individual projects -- see the README.md file at https://github.com/mithro/litex-all.

I haven't finished testing things yet, but I'm pretty sure this will work....

mithro commented 4 years ago

$ python -c 'import litex.data.vexriscv as m; print(m.DATA_LOCATION)'
/home/tansell/github/mithro/temp/litex/env/src/litex-data-vexriscv/litex/data/vexriscv/verilog
$ ls -l /home/tansell/github/mithro/temp/litex/env/src/litex-data-vexriscv/litex/data/vexriscv/verilog
total 2688
-rw-r--r-- 1 tansell primarygroup    326 Feb 21 11:50 build.sbt
drwxr-xr-x 3 tansell primarygroup   4096 Feb 21 11:50 ext
-rw-r--r-- 1 tansell primarygroup   2091 Feb 21 11:50 Makefile
drwxr-xr-x 2 tansell primarygroup   4096 Feb 21 11:50 project
-rw-r--r-- 1 tansell primarygroup   2825 Feb 21 11:50 README.md
drwxr-xr-x 3 tansell primarygroup   4096 Feb 21 11:50 src
-rw-r--r-- 1 tansell primarygroup 241391 Feb 21 11:50 VexRiscv_Debug.v
-rw-r--r-- 1 tansell primarygroup    202 Feb 21 11:50 VexRiscv_Debug.yaml
-rw-r--r-- 1 tansell primarygroup 249669 Feb 21 11:50 VexRiscv_FullDebug.v
-rw-r--r-- 1 tansell primarygroup    202 Feb 21 11:50 VexRiscv_FullDebug.yaml
-rw-r--r-- 1 tansell primarygroup 241217 Feb 21 11:50 VexRiscv_Full.v
-rw-r--r-- 1 tansell primarygroup    143 Feb 21 11:50 VexRiscv_Full.yaml
-rw-r--r-- 1 tansell primarygroup 327733 Feb 21 11:50 VexRiscv_LinuxDebug.v
-rw-r--r-- 1 tansell primarygroup    202 Feb 21 11:50 VexRiscv_LinuxDebug.yaml
-rw-r--r-- 1 tansell primarygroup 309565 Feb 21 11:50 VexRiscv_LinuxNoDspFmax.v
-rw-r--r-- 1 tansell primarygroup 319889 Feb 21 11:50 VexRiscv_Linux.v
-rw-r--r-- 1 tansell primarygroup    143 Feb 21 11:50 VexRiscv_Linux.yaml
-rw-r--r-- 1 tansell primarygroup 208065 Feb 21 11:50 VexRiscv_LiteDebug.v
-rw-r--r-- 1 tansell primarygroup    202 Feb 21 11:50 VexRiscv_LiteDebug.yaml
-rw-r--r-- 1 tansell primarygroup 199778 Feb 21 11:50 VexRiscv_Lite.v
-rw-r--r-- 1 tansell primarygroup    143 Feb 21 11:50 VexRiscv_Lite.yaml
-rw-r--r-- 1 tansell primarygroup 173950 Feb 21 11:50 VexRiscv_MinDebug.v
-rw-r--r-- 1 tansell primarygroup     59 Feb 21 11:50 VexRiscv_MinDebug.yaml
-rw-r--r-- 1 tansell primarygroup 165964 Feb 21 11:50 VexRiscv_Min.v
-rw-r--r-- 1 tansell primarygroup      3 Feb 21 11:50 VexRiscv_Min.yaml
-rw-r--r-- 1 tansell primarygroup 232923 Feb 21 11:50 VexRiscv.v
-rw-r--r-- 1 tansell primarygroup    143 Feb 21 11:50 VexRiscv.yaml

mithro commented 4 years ago

$ pip install -r requirements/vexriscv.txt
Obtaining file:///home/tansell/github/mithro/temp/litex/litex-core (from -r requirements/base.txt (line 2))
Obtaining file:///home/tansell/github/mithro/temp/litex/litedram (from -r requirements/base.txt (line 3))
Obtaining file:///home/tansell/github/mithro/temp/litex/liteeth (from -r requirements/base.txt (line 4))
Obtaining file:///home/tansell/github/mithro/temp/litex/litesata (from -r requirements/base.txt (line 5))
Obtaining litex-data-vexriscv from git+https://github.com/mithro/litex-data-vexriscv.git#egg=litex-data-vexriscv (from -r requirements/vexriscv.txt (line 2))
  Updating ./env/src/litex-data-vexriscv clone
  Running command git fetch -q --tags
  Running command git reset --hard -q f266e40fc8335d565c8d561451c87e3fc1537f40
Obtaining migen from git+https://github.com/m-labs/migen.git#egg=migen (from -r requirements/base.txt (line 1))
  Updating ./env/src/migen clone
  Running command git fetch -q --tags
  Running command git reset --hard -q e2e6c726c9c34209cd326d0a80df63668285a378
Requirement already satisfied: pyserial in ./env/lib/python3.7/site-packages (from litex==0.2.dev0->-r requirements/base.txt (line 2)) (3.4)
Requirement already satisfied: pyyaml in ./env/lib/python3.7/site-packages (from litedram==0.2.dev0->-r requirements/base.txt (line 3)) (5.3)
Requirement already satisfied: colorama in ./env/lib/python3.7/site-packages (from migen->-r requirements/base.txt (line 1)) (0.4.3)
Installing collected packages: litex-data-vexriscv, migen, litex, litedram, liteeth, litesata
  Attempting uninstall: litex-data-vexriscv
    Found existing installation: litex-data-vexriscv 0.0.-
    Uninstalling litex-data-vexriscv-0.0.-:
      Successfully uninstalled litex-data-vexriscv-0.0.-
  Running setup.py develop for litex-data-vexriscv
  Attempting uninstall: migen
    Found existing installation: migen 0.9.2
    Uninstalling migen-0.9.2:
      Successfully uninstalled migen-0.9.2
  Running setup.py develop for migen
  Attempting uninstall: litex
    Found existing installation: litex 0.2.dev0
    Uninstalling litex-0.2.dev0:
      Successfully uninstalled litex-0.2.dev0
  Running setup.py develop for litex
  Attempting uninstall: litedram
    Found existing installation: litedram 0.2.dev0
    Uninstalling litedram-0.2.dev0:
      Successfully uninstalled litedram-0.2.dev0
  Running setup.py develop for litedram
  Attempting uninstall: liteeth
    Found existing installation: liteeth 0.2.dev0
    Uninstalling liteeth-0.2.dev0:
      Successfully uninstalled liteeth-0.2.dev0
  Running setup.py develop for liteeth
  Attempting uninstall: litesata
    Found existing installation: litesata 0.2.dev0
    Uninstalling litesata-0.2.dev0:
      Successfully uninstalled litesata-0.2.dev0
  Running setup.py develop for litesata
Successfully installed litedram liteeth litesata litex litex-data-vexriscv migen

mithro commented 4 years ago

/home/tansell/github/mithro/temp/litex/litedram              is litedram   from https://github.com/enjoy-digital/litedram
/home/tansell/github/mithro/temp/litex/liteeth               is liteeth    from https://github.com/enjoy-digital/liteeth
/home/tansell/github/mithro/temp/litex/litesata              is litesata   from https://github.com/enjoy-digital/litesata
/home/tansell/github/mithro/temp/litex/litex-core            is litex      from https://github.com/enjoy-digital/litex

enjoy-digital commented 4 years ago

@mithro: thanks for the prototype. With this, is it possible to keep the LiteX repository organisation exactly similar to the actual one (ie i'm not sure i want to use a litex-all repository myself, buf if it can be useful for others, i'm ok having one in litex-hub) and just replace the current git submodules with the python package equivalent? If so that could be fine for me.

Also from what i understand all git submodules would be replaced with python packages? So why do we have instruction to update submodules in the README?:

“ Make sure you have all the submodules
git submodule update --init --recursive

mithro commented 4 years ago

@enjoy-digital My plan was that litex would be renamed to litex-core and the litex-all repository would become litex.

I left the two non-CPU related submodules, see below;

 bd557ff00d8fe2473fcf346e36c96d004e94b8ca litex-core/litex/build/sim/core/modules/ethernet/tapcfg (heads/master)
 81fb4f00c2cfe13814765968e09931ffa93b5138 litex-core/litex/soc/software/compiler_rt (81fb4f00c)

litex-core/litex/build/sim/core/modules/ethernet/tapcfg could be converted to a python module without much hassle and I guess that compiler_rt could be converted to litex-data-software-compiler-rt or something.

@enjoy-digital - One thing you might not have realized is that in the litex-all repository you can edit files anywhere inside the repo and commit them together as one patch just like if it was a single repository. There is no managing where each module points too or dealing with submodules being out of sync. This means you can do something like rename a module, update all the users of that module and commit it all together as a single commit! git-subtree push will magically split the combined patch apart when pushing it to each of the individual repositories. This should make refactoring significantly easier!

enjoy-digital commented 4 years ago

@mithro: i'm not that familiar with git subtrees, but for the user, LiteX would then be seen as a single repository, so that's interesting. While keeping the current organization, i would put the LiteX Cores in litex/soc/cores:

litedram
liteeth
litepcie
...
cpu/
bitbang.py
...

The cores are pretty small in footprint (less than 20MB i think for the total), so i don't think it make senses to select which one to have or not, we could just have all LiteX cores as subtrees.

And then manage the current submodules as python packages as you suggest. Not sure we should have a litex-all repository then? Just keep current code organization, add the subtrees for the LIteX cores, add the requirements for the python packages?

enjoy-digital commented 4 years ago

@mithro: yes i realize that subtrees could be very convenient for commits. For litex-core/litex-all, i would just keep litexand as i suggest above put the cores in litex/soc/core where the others cores are located. What do you think?

mithro commented 4 years ago

LiteDRAM / LiteEth have a good enough reputation outside of the LiteX ecosystem I suggest they should stay as their own modules. Including then directly into LiteX might stop people thinking they can be used outside of that ecosystem?

enjoy-digital commented 4 years ago

I think that's fine if: It's still maintained in as a separated repository (but included in LiteX as subtrees for users) and if we are able to keep the import: from litedram import xxyy instead offrom litex.soc.litedram import xxyy.

mithro commented 4 years ago

My plan was that your current LiteX repo becomes litex-core and then the litex repo is the one with all the litex related subtrees contained in it.

mithro commented 4 years ago

FYI - With this subtree system you can't really have the top level be a python module with setup.py

mithro commented 4 years ago

Plus having all the import cores be their own directory at the top level is actually a really nice summary of the whole ecosystem...

enjoy-digital commented 4 years ago

@mithro: i think this is going in the good direction, i'll think about that more over the weekend.

ewenmcneill commented 4 years ago

To reduce notifications/keep things in one place, I'm trying to reply to both @enjoy-digital's submodule design and @mithro's litex-all design in one comment. It's kinda long :-)

It seems to me we currently have three designs:

Existing design, everything is (recursive) static submodules, recursive checkout of everything encouraged
A (recursive) dynamicly-added submodule design, with a front end tool, where "features I want to use" are enabled by request (@enjoy-digital)
A clever redesign mixing submodules, pip requirements, and git subtrees (@mithro).

First up, I think I like both of the new variations more than I like the existing static submodules/recursive checkout approach (which seems to have accumulated a lot of recursive dependencies now, and clearly there'll be more). FTR, it's the complexity/number of dependencies that bother me, especially trying to git pull update, more than the checked out data size. The data size is non-trivial, and I have 3+ fully copies of litex already: litex directly, in litex-buildenv, in fomu-workshop, but even with three copies it's still not a gigantic amount of data (by modern standards).

Secondly it feels to me that litex is used as a git submodule of other things at least as much as its being used directly, ie it's "a library"/"a base" more than an end target. So integration into the checkout/build flow of those projects building on litex seems to be a key requirement.

Some more specific comments:

In #395, I like the general idea, but not really the CLI syntax: the CLI syntax feels both too verbose ("submodule-init") and overly specific to the underlying technology. I'd rather see a more abstracted syntax, where the implementation could be changed later and the syntax would still make sense (and I'm inherently more in the Unix philosophy of shorter command lines). Perhaps something like ./litex_setup.py --feature=eth --feature=dram / ./litex_setup.py --cpu=vexriscv, etc (and it'd be nice to call the tool something shorter/easier to type, eg litex, and support short args too).
In #395, the install option targets should be higher level features ("I want to be able to do this") rather than individual parts, ie one level more abstracted. And/or have higher level targets along side the lower level targets, where the higher level targets install/do more than one thing. At least as far as the naming is concerned. This too gives more flexibility to change the implementation later. (Hence --cpu=... above, rather than litex_cpu_...)
In @mithro's litex-all design, I think the use of pip requirements, recursively, is a promising approach, but I worry about users installing "user wide" or "system wide" and ending up with things scattered all over the place, including potentially starting with multiple versions of python (and thus different install directories). My feeling is that design would probably only work well in a virtualenv / conda setup, which potentially impacts on what downstream users of litex would need to do (versus git submodule which keeps everything inside the development tree, without further steps), and might mean they have to help their users with learning virtualenvs. (litex-buildenv effectively uses a virtualenv so that's okay, but IIRC fomu-workshop doesn't at present.)
In passing, pip install --user doesn't work inside a virtualenv (tooling refuses it, it's not in the virtualenv's python path). For anyone else trying, I think you want pip install -r requirements/vexriscv.txt; with that it seemed to work for me.
Using a venv also results in the checked out source being in, eg, venv/src/... which might be a bit harder for users to locate if they want to do development work on the specific component and test it in the context of larger litex (eg, working on integration / API / ... features). Thinking out loud, possibly giving the venv a more "look here" name, eg, modules or features or something might encourage users to look? modules/src is an obvious place to look for things, venv/src not so much.
I'm also a wee bit concerned (for downstream users) of mixing submodule pinning at a specific "known working" revision and pip requirements (recursively) which may or may not pin to a specific revision. It seems like it might be more difficult to reliably pin "the litex dependency" at a "tested version combination" than it is right now. That could possibly be addressed by providing some way to create a "lock" file (ie, pip freeze style) which let the downstream project say "have pip install precisely these versions of all our dependencies/our litex dependencies". (But is still a workflow change for downstreams.) I'm not sure how, eg, pip freeze interacts with installing direct from git repos, but presumably that works.
Finally, the submodule approach supports going back and forth in versions, eg, when doing a git bisect to figure out which combination of things last worked. We might need to think about how, eg, git bisect could be reliably done when using a "recursive pip requirements" approach. Can pip reliably downgrade installed modules?

Ewen

ewenmcneill commented 4 years ago

FTR, looks like pip freeze(at least in a virtualenv) will pin precise versions from git repos as well:

(venv) ewen@parthenon:/src/fpga/test/mithro-litex-all-test$ pip freeze
colorama==0.4.3
-e git+https://github.com/mithro/litex-all.git@e0e1f125a5ebaf56f06a6537ccb5891b5d7da06f#egg=litedram&subdirectory=litedram
-e git+https://github.com/mithro/litex-all.git@e0e1f125a5ebaf56f06a6537ccb5891b5d7da06f#egg=liteeth&subdirectory=liteeth
-e git+https://github.com/mithro/litex-all.git@e0e1f125a5ebaf56f06a6537ccb5891b5d7da06f#egg=litesata&subdirectory=litesata
-e git+https://github.com/mithro/litex-all.git@e0e1f125a5ebaf56f06a6537ccb5891b5d7da06f#egg=litex&subdirectory=litex-core
-e git+https://github.com/mithro/litex-data-vexriscv.git@f266e40fc8335d565c8d561451c87e3fc1537f40#egg=litex_data_vexriscv
-e git+https://github.com/m-labs/migen.git@e2e6c726c9c34209cd326d0a80df63668285a378#egg=migen
pkg-resources==0.0.0
pyserial==3.4
PyYAML==5.3
(venv) ewen@parthenon:/src/fpga/test/mithro-litex-all-test$

which might be part of solving both the downstream "this is the tested revision to build on" and maybe also git bisect issues (if the pip freeze "tested with" was maintained as part of the repo in some requirements format).

Although we might need something that filtered out "end user environment" dependencies, as freezing on specific PyYAML versions that some developer had installed, for instance, might just be unhelpful/overly inflexible.

Ewen

mithro commented 4 years ago

@ewenmcneill -- Couple of comments.

pip install --user doesn't work in a virtualenv as it is asking you to install it globally for your current user -- which is incompatible with a virtualenv.

Almost all the important dependencies are install from the current repository and don't end up in env/src.

All the important dependencies are installed using --editable mode, which means that if you change the source (say by changing what you have currently checked out) it should reference the correct version. This should work significantly more reliably than the git submodule update X; cd X; python setup.py develop method.

The local dependencies are pulled into the current repository using git subtree which has the same version guarantees as git submodules (specific git commits, etc) without all the hassle.

External users can continue to use the more familiar git submodule flow if they wish.

mithro commented 4 years ago

@ewenmcneill What do you have in your /src/fpga/test/mithro-litex-all-test repository?

ewenmcneill commented 4 years ago

@ewenmcneill What do you have in your /src/fpga/test/mithro-litex-all-test repository?

It's a checkout of https://github.com/mithro/litex-all; the directory is just named like that so I don't forget that what it is is likely to go away, and wonder in 6 months time what it was and why the upstream is gone :-)

FTR, this is what ends up in .git/config after doing a checkout, virtualenv, and the vexriscv requirements example. (With pip installed modules as listed in https://github.com/enjoy-digital/litex/issues/394#issuecomment-589866058)

Ewen

ewen@parthenon:/src/fpga/test/mithro-litex-all-test$ cat .git/config 
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/mithro/litex-all.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
[submodule "litex/build/sim/core/modules/ethernet/tapcfg"]
    active = true
    url = https://github.com/enjoy-digital/tapcfg
[submodule "litex/soc/software/compiler_rt"]
    active = true
    url = https://git.llvm.org/git/compiler-rt
ewen@parthenon:/src/fpga/test/mithro-litex-all-test$

ewenmcneill commented 4 years ago

pip install --user doesn't work in a virtualenv as it is asking you to install it globally for your current user -- which is incompatible with a virtualenv.

Indeed. However there's a cut'n'paste issue in https://github.com/mithro/litex-all/blame/master/README.md#L56 (it is says --user it should say -r in the virtualenv example). Hence the comment in passing if someone else is trying to follow those instructions.

Almost all the important dependencies are install from the current repository and don't end up in env/src.

Cool, that should help. If it's just "third party CPU verilog" like dependencies that are installed hidden away, then it shouldn't matter too much except for a few third party upstream authors testing in combination (who presumably know their way around python virtualenvs) I noticed liteeth/litedram/... at the top level, but wasn't sure if that was the plan for all "internal" dependencies.

All the important dependencies are installed using --editable mode, which means that if you change the source (say by changing what you have currently checked out) it should reference the correct version.

Yes, this --editable is definitely a significant win for developers. (And subtrees are also a bit more helpful for this than submodules, due to being designed later.)

Thanks for all the comments @mithro!

Ewen

mithro commented 4 years ago

I fixed up the README a little.

Can you tell me what is in your env/ and env/src directory? This is what I have;

$ ls env/
bin  include  lib  share  src
$ ls env/src
litex-data-vexriscv  migen  pip-delete-this-directory.txt

ewenmcneill commented 4 years ago

@mithro Looks the same to me:

ewen@parthenon:/src/fpga/test/mithro-litex-all-test$ ls venv/
bin  include  lib  share  src
ewen@parthenon:/src/fpga/test/mithro-litex-all-test$ ls venv/src/
litex-data-vexriscv  migen  pip-delete-this-directory.txt
ewen@parthenon:/src/fpga/test/mithro-litex-all-test$

But since I'd only installed one thing, I hadn't looked far enough to figure out where other dependencies/modules would be installed. If "the ones you want to edit" are all easily visible, and "the ones you probably don't want to touch" are hidden away, that's a good balance.

Ewen

enjoy-digital commented 4 years ago

@mithro: I think your solution is very good for deployment but i've been thinking about the development, in i'm not sure subtrees will be convenient for me, since i often need to switch branch on different repository or get back to a specific commit to find regressions, and i'm not sure that easy with subtrees.

But that's just the way i use the project and we could have something similar to what you are suggesting:

In LiteX (https://github.com/enjoy-digital/litex) only keep tapcfg/compiler_rt submodules.
Create litex-data-xxyy repositories in LiteX-Hub for the data of the CPU.
Use theses data-package in LiteX to replace the submodules. The installation of theses data-package in LiteX could be done with a simple setup script or with the requirements, but nothing will be installed by default. So for use-cases like @xobs with Fomu, where Vexriscv is used but no using the verilog from litex-data-vexriscv, minimal install will be enough (since the CPU wrappers will still be provided by the minimal install).

In LiteX-Hub, instead of a litex-all repository, i would maybe call it litex-ecosystem. This would be an easy way to setup LiteX Ecosystem where LiteX and all the cores are provided as subtrees and data-packages could be easily installed with the requirement mechanism. We could keep things similar to your prototype, but i would just keep litex instead of `litex-core. We could also include cores like ValentyUSB directly also as subtrees. As you suggest, with a bot for the updates and Travis-CI it would minimize extra-work and provides a way to know which versions have been validated together.

If we go this way, it means changes to LiteX repositories will be minimal (mostly switching to submodules for the CPU to python data packages) and we could start creating the litex-data-xxyy packages for the CPUs, switch to it in LiteX and create the litex-ecosystem repository in LiteX-Hub? What do you think?

mithro commented 4 years ago

@enjoy-digital

The more I think about it, the more I think tagcfg and compiler_rt should be litex-data modules too (see the thread where people were struggling with the submodules).

As a step forward I think we do the following.

[ ] Remove all git submodules from litex repo and instead use litex-data Python modules under LiteX-Hub.
[ ] Make litex use these Python modules. They can be installed via a script, requirements.txt or even manually.
- Only the compiler_rt module will be required to be installed for litex to work.

After that is done, we can then continue to explore a bunch of different options for a combined litex-ecosystem repository but everyone agrees we have not decided on the preferred solution yet.

I'll put together options which use git submodules, git subtrees and just frozen pip requirements.txt files. In all cases these repositories will be automatically kept in sync with the module during development.

I would also like to explore reducing the litex-setup.py script into mostly just generation of a requirements.txt files and then calls to pip.

mithro commented 4 years ago

@enjoy-digital Do you want me to send a pull request to litex which makes this happens?

mithro commented 4 years ago

I wrote a tool to create the litex-data-XXX repositories from the various submodules. It's still a work in progress but you can find it at https://github.com/litex-hub/litex-data-auto and some example output at https://github.com/litex-hub/litex-data-cpu-lm32

In the process of creating the litex-data-cpu-lm32 repository, it creates the setup.py, README.md and other files -- plus an __init__.py file with some tracking information in it, see https://github.com/litex-hub/litex-data-cpu-lm32/blob/759a2110097fd955060c90d09b226c7b4abc54e9/litex/data/cpu/lm32/__init__.py#L1-L8 It then uses git subtree to merge the data into the correct directory.

The configuration for the auto-generated data modules can be found at https://github.com/litex-hub/litex-data-auto/blob/master/modules.ini

Then we just need a bot running to auto update the repositories (with the simplest version looking like this);

git clone git+ssh://github.com/litex-hub/litex-data-auto.git
cd litex-data-auto
while true; do
    git pull
    ./update.py
done

enjoy-digital commented 4 years ago

Thanks @mithro for the work on litex-data-xxyy, let's switch to data packages for everything yes instead of submodules if it's easier. For litex_setup.py in addition of installing the data-package, i'd like to keep the ability to clone and install LiteX and the cores and be able to do an update of all repositories (that's what i'm using for the dev to easily keep things in sync). You can create a PR for the data-packages in LiteX yes.

mithro commented 4 years ago

Will send you an pull request in the next couple of hours.

gsomlo commented 4 years ago

I did some googling on the difference between git submodules and subtrees, and find what I learned a bit concerning w.r.t. consistency of the code base moving forward.

Subtrees are copies of (rather than pointers to) another repository. As such, the downstream project "owns" all changes to the subtree, and may (or may not, it's their choice) push any such changes back to the "upstream" project that provided the original subtree source.

To the extent that anything tracking an "upstream" project as a subtree is meant as a "packaging for ease of use for downstream users" thing, and not as a critical-path for development thing, I'm ok with it. Not sure though that's how the cpu modules are intended, though.

For example, if we need to update Rocket, the steps would now be:

update litex-hub/litex-rocket-verilog
pull from litex-hub/litex-data-cpu-rocket
whatever needs to be done to get the main litex project to see the updated ltex-data-cpu-* module

Maybe step 3 above is "nothing", and that's one of the intended positive outcomes of this change :) But I find I typically need to go through this in order to change something about how LiteX interacts with Rocket, so there's going to likely be a matching LiteX commit anyway, so the only outcome IMHO is that thightly coupled changes are now tracked more loosely than before...

Then there's the question of "what if someone modifies litex-hub/litex-data-cpu-rocket and never bothers to do a git subtree push (or would that now have to be a PR against litex-hub/litex-rocket-verilog)?

I'm hopefully missing something obvious, but I'm worried we're over-complicating things in upstream LiteX to make life easier for its downstream consumers. That may be a valid tradeoff, so please ignore me if that's been thoroughly considered already :)

Cheers, --G

EDIT: this is also re. https://github.com/enjoy-digital/litex/pull/399

enjoy-digital commented 4 years ago

In fact for developers we are only replacing submodules with python packages. The current submodules are tapcfg, compiler_rt and the ones for the verilog/vhdl code of the CPUs.

The subtrees could be used in a litex-ecosystem repository, but this repository will only be used to ease packaging and handled/updated by a bot. Developers will still use LiteX + cores repositories.

For the Rocket case, that's true that LiteX will be linked to a commit in litex-hub/litex-data-cpu-rocket and not directly to litex-hub/litex-rocket-verilog. But we only have two repositories we maintain for the verilog/vhdl code (litex-hub/litex-rocket-verilog and litex-hub/litex-rocket-vexriscv), so this should be fine. If needed, we could even get rid of these repositories and just maintain litex-hub/litex-data-cpu-rocket and litex-hub/litex-data-cpu-vexriscv, the others litex-hub/litex-data-cpu-xxyy could still be automatically created from external git repositories).

gsomlo commented 4 years ago

OK, I can live with s/submodules/packages for most things, in fact that's already the way things are with LiteX vs. LiteSDCard, LiteDRAM, etc.

Re. CPUs: we introduced litex-rocket-verilog as a layer of indirection between LiteX and the upstream https://github.com/chipsalliance/rocket-chip project, which makes perfect sense. But now we're considering the insertion of litex-data-cpu-rocket as an additional layer of indirection, this time between LiteX and litex-rocket-verilog. With bundling (subtrees) as opposed to a pointer (submodules) on top of that...

Wouldn't it make more sense to just have litex-data-cpu-rocket and get rid of litex-rocket-verilog altogether after migration? Fewer layers of indirection, and no multiple places containing clones of the same stuff (the "compiled" verilog for rocket would just be part of litex-data-cpu-rocket directly)?

I'd recommend the same for litex-data-cpu-vexriscv, for consistency (but can't speak authoritatively for that part of the project :)

What do you all think?

enjoy-digital commented 4 years ago

@gsomlo: regarding litex-rocket-verilog vs litex-data-cpu-rocket i'm thinking the same and that's what i was trying to say in my last comment (but it was maybe not clear :)).

gsomlo commented 4 years ago

On Wed, Feb 26, 2020 at 06:27:15AM -0800, enjoy-digital wrote:

@gsomlo: regarding litex-rocket-verilog vs litex-data-cpu-rocket i'm thinking the same and that's what i was trying to say in my last comment (but it was maybe not clear :)).

Heh, we were in violent agreement, then :)

mithro commented 4 years ago

@gsomlo & @enjoy-digital - I played a little dance.

I removed litex-data-cpu-rocket and renamed the litex-verilog-rocket into litex-data-cpu-rocket (so hopefully github does the redirects) and then pushed the new litex-data-cpu-rocket Python module contents into the repo.

I also did the same with litex-data-cpu-vexriscv.

I finally updated litex-data-auto repository to understand that litex-data-cpu-rocket and litex-data-cpu-vexriscv are "generated sources" repositories rather than "subtreed in" repositories.

mithro commented 4 years ago

FYI - The other litex-data-XXXX repos will automatically update daily or anytime the litex-data-auto repository is pushed.

enjoy-digital / litex

RFC: Split LiteX CPU cores into their own Python modules #394