Tracking for pmm development

xacrimon commented 5 years ago

Initial list of high priority package managers to support:

apt/dpkg
dnf/yum
portage
pacman
eopkg

We will also need to decide which package manager's CLI we want to replicate. This could be in the style of pacman or maybe apt-get. We should probably either hold a community vote or have @paradigm decide.

If you want a package manager added to the list, reply and it will be added.

pmm will be written in C99 to comply with Bedrock standards.

Mouvedia commented 5 years ago

We will also need to decide which package manager's CLI we want to replicate.

install
remove (AKA rm and uninstall)
upgrade
search (AKA find)
info (AKA show)
update: update local database/index of packages
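
The aliases in this list could be folded into canonical operation names with a small lookup. The following is only a sketch in POSIX shell; the function name `pmm_canonical_op` is hypothetical and not part of any existing pmm code.

```shell
# Hypothetical helper: map a user-facing subcommand (or one of the
# aliases proposed above) to a canonical operation name.
pmm_canonical_op() {
    case "$1" in
    install) echo install ;;
    remove | rm | uninstall) echo remove ;;
    upgrade) echo upgrade ;;
    search | find) echo search ;;
    info | show) echo info ;;
    update) echo update ;;
    *)
        echo "unknown operation: $1" >&2
        return 1
        ;;
    esac
}
```

For example, `pmm_canonical_op rm` prints `remove`, and an unrecognized name returns a non-zero status.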

paradigm commented 5 years ago

Here are some open research items we'll need to work through for pmm:

A list of package managers we should target for the initial release.
- These should include the main package managers for every brl fetch --list distro on x86_64, which at the time of writing is: alpine arch centos debian devuan fedora gentoo ubuntu void void-musl. These are what most users will expect.
- These should include at least two package managers that are completely independent of the main one, such as language specific ones like pip, to make sure our architecture handles that situation.
- These should include at least two package managers which partially overlap with the main one, such as an AUR helper, to make sure our architecture handles that situation.
- These should include at least two package managers which completely or almost completely overlap with the main one, such as apt/aptitude or yum/dnf, to make sure our architecture handles that situation.
- If we cannot brl fetch a distro that works with a given package manager, that will make testing difficult, and thus it's best to leave the package manager off for now.
We should note which package managers are native to the given distro (e.g. apt) and which brl-fetch distros they're native to (e.g. debian, devuan, and ubuntu), which work on the native package list but aren't native (e.g. yay) and their brl-fetch distros (e.g. arch and arch-arm), and which are not native and use their own repos (e.g. pip).
A list of all package manager operations that we want to support with the initial release.
- If something is common to at least half the package managers we're initially targeting, we should try to support it in the initial release.
  - There's value in getting a wide swath here to ensure our initial architecture is adequately flexible. Better to release later with something good than release early with something inadequate that requires a complete rewrite and takes more total time.
- Each should have a possibly wordy explanation that describes it to ensure everyone is on the same page about what is being discussed.
- Each should have a short phrase that describes it. The fewest words possible while still being as clear as possible.
  - We'll use these internally for things like function names.
  - We might use these as a sort of generic UI for any automation that may call it, such as shell completion or a GUI wrapper.
A list of all package manager operations in the order that they should be done if it's possible to indicate multiple should be done in a single command.
- For example, if a package manager supports updating the list of available packages and upgrading all installed packages in the same command, updating the list of remote packages should happen first.
- I'm not actually sure there are other situations; that example might be the entirety of it. Still, better to document it than forget the need.
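
The ordering rule in this item can be sketched as a small sort. Only the "update before upgrade" rule comes from the discussion; the function name `pmm_order_ops` and the ranking of every other operation are assumptions made for illustration.

```shell
# Hypothetical helper: print the requested operations, one per line, in
# the order they should run. Operations with the same rank keep the
# order the user gave them.
pmm_order_ops() {
    i=0
    for op in "$@"; do
        i=$((i + 1))
        case "$op" in
        update) rank=1 ;;  # refresh the list of available packages first
        upgrade) rank=2 ;; # then upgrade installed packages
        *) rank=3 ;;       # assumption: everything else runs afterwards
        esac
        printf '%s %s %s\n' "$rank" "$i" "$op"
    done | sort -k1,1n -k2,2n | cut -d' ' -f3
}
```

The position counter is included as a second sort key so the result does not depend on `sort` being stable.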
A table mapping every command/workflow we want to initially support with every package manager we want to realistically support.
- The example package manager commands should include an indicator for arguments to help make it clear which have expected arguments and which don't. For example, apt install <packages> and apt update.
For every item in the aforementioned table, include an example of what pmm would look like if it tried to mimic the given package manager's command format for the given operation. For example:
- next to apt install <packages> would be pmm install <packages>
- next to pacman -S <packages> would be pmm -S <packages>
- next to emerge <packages> would be pmm <packages>
- next to xbps-install <packages> would be pmm-install <packages>
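
The four mimicked forms above can be reduced to one canonical operation. This is a sketch only: the function name `pmm_parse_install` and the idea of passing the mimicked style as an explicit first argument are assumptions. The explicit style is needed because `pmm <packages>` (emerge style) cannot be told apart from `pmm install` (apt style) by the arguments alone.

```shell
# Hypothetical helper: given the CLI style being mimicked and the
# arguments that followed the pmm command name, print the canonical
# operation and its packages. Only the four install examples from the
# discussion are covered.
pmm_parse_install() {
    style="$1"
    shift
    case "$style:$1" in
    apt:install | pacman:-S)
        shift # drop the subcommand or flag; the rest are packages
        ;;
    emerge:* | xbps-install:*)
        ;; # these styles take the package list directly
    *)
        echo "unsupported invocation for style $style" >&2
        return 1
        ;;
    esac
    printf '%s\n' "install $*"
}
```

With this sketch, `pmm_parse_install pacman -S vim git` and `pmm_parse_install emerge vim git` both print `install vim git`.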
For every item in the aforementioned table, do something to note that it requires something the corresponding distro may not have installed by default. For example, apt uses apt-file for some features, and apt-file is not installed in Debian by default.
For every item in the aforementioned table, note any relevant flags that aren't generic across the entire package manager.
For every operation we want to initially target, note if any could benefit from a pmm-specific cross-package-manager feature, and what that feature would be. For example, a pmm command to install a package might take a --newest flag to indicate the newest version from all of the available ones is desired. Feel free to get creative here.
A list of operation-independent flags, such as --noconfirm or --quiet, and their equivalents for each initially targeted package manager.
- In addition to surfacing these in pmm's CLI, we might also want to add pmm configuration to have these set by default.
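
A table of operation-independent flags could start out as a plain lookup. The sketch below is partial and hypothetical: the function name `pmm_translate_flag` is invented, the generic names are the two from this item, and the per-package-manager equivalents shown should be checked against each manual during the research (pacman's `-q`, for instance, only affects certain query and sync operations).

```shell
# Hypothetical helper: translate a generic pmm flag into the equivalent
# flag for one backing package manager.
pmm_translate_flag() {
    pm="$1"
    flag="$2"
    case "$pm:$flag" in
    pacman:--noconfirm) echo --noconfirm ;;
    apt:--noconfirm | dnf:--noconfirm | xbps-install:--noconfirm) echo -y ;;
    pacman:--quiet | apt:--quiet | dnf:--quiet) echo -q ;;
    *)
        echo "no known equivalent of $flag for $pm" >&2
        return 1
        ;;
    esac
}
```

Returning a non-zero status for unknown pairs lets the caller decide whether to drop the flag or abort.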
A list of pmm-specific features/flags we may want. For example, the ability to restrict the backing package manager list to only native ones, or to only from certain distros, or to use a consistent generic UI format for easy automated parsing.
A list of both local and remote package manager database locations for things like available packages and installed packages, as well as descriptions of what's there (e.g. xml files, custom flat file format, etc).
- Where does the package manager look to find its list of manually installed packages?
- Where does the package manager look to find its list of automatically installed packages?
- Where does the package manager look locally to find its list of available packages?
- Where does the package manager look remotely to update its local list of available packages?
- Where does the package manager look to find which installed package owns a given file?
- Where does the package manager look locally to find which package offers a given file?
- Where does the package manager look remotely to update its mapping of packages and files?
- Where does the package manager look to find which mirrors it should use?
- We'll probably use this information to build our own databases in our own format, which should improve performance for things like cross-package-manager searches over individually querying each package manager.
- We'll also compare timestamps to see when our own databases are out of date and need to be refreshed against the backing package managers.
- We'll also use this information to de-dup package managers that overlap such as pacman and an AUR helper using the same list of installed packages.
- We might use this information to create "virtual" strata so users can search not-yet-installed package managers for information. If they find a package they want, it could help brl fetch it and install it. I'm not yet certain whether this feature will make the cut for the initial pmm release, if at all.
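
The timestamp comparison mentioned above could look roughly like the following. This is a minimal sketch: the function name `pmm_cache_is_stale` is hypothetical, and the actual database paths are exactly what this research item is meant to find out, so they are passed in as arguments.

```shell
# Hypothetical helper: succeed (exit 0) if pmm's cached database is
# missing or older than the backing package manager's database.
pmm_cache_is_stale() {
    backing_db="$1" # file or directory owned by the backing package manager
    pmm_cache="$2"  # pmm's own database built from it
    [ -e "$pmm_cache" ] || return 0
    # find prints a path only if it was modified after pmm_cache
    [ -n "$(find "$backing_db" -newer "$pmm_cache" 2>/dev/null)" ]
}
```

Because `find` descends into directories, the same check works when the backing database is a directory tree rather than a single file.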
A list of descriptions and examples of output for operations that pmm will use its own output format for.
- Installing a package might just forward the underlying package manager's output and won't need an entry here.
- Listing installed packages will likely use pmm's UI to also indicate corresponding strata and package managers. It may indicate, for example, that arm_now comes from arch's pip3.
A list of the kind of information pmm will need to store in its own database.
- list of available packages, their versions, their descriptions (to be searched against), and their list of files. What else?

There could also be value in gathering:

A list of all package managers there could be realistic interest in that we can use to feed work later.
A list of all package manager operations someone could be realistically interested in that we could use to feed work later.
A list of places to reference for this kind of information, such as the pacman rosetta

paradigm commented 5 years ago

Initial list of high priority package managers to support:

I'd like to expand the scope well past that.

We will also need to decide which package manager's CLI we want to replicate. This could be in the style of pacman or maybe apt-get. We should probably either hold a community vote or have @paradigm decide.

We should support mimicking the CLI of every package manager we plan to support in any other fashion. Broadly, either we support the package manager, or we don't.

pmm will be written in C99 to comply with Bedrock standards.

Current plan is to do it in busybox shell. If we run into performance problems we can revisit the possibility of using C99.

Mouvedia commented 5 years ago

If we cannot brl fetch a distro that works with a given package manager, that will make testing difficult, and thus it's best to leave the package manager off for now.

Which means doing a fetching script for that distro becomes a prerequisite […].

paradigm commented 5 years ago

If we cannot brl fetch a distro that works with a given package manager, that will make testing difficult, and thus it's best to leave the package manager off for now.

Which means doing a fetching script for that distro becomes a prerequisite […].

For development and testing, yes.

For use, no. Users can get the package manager by acquiring the associated distro some other way, such as hijacking it or installing it in a VM and copying the files over.

xacrimon commented 5 years ago

Fantastic! I'll start responding, collecting, and putting together information tomorrow. This is going to be a huge project. Probably >2.5kloc. I strongly advise writing this in C, because shell will have problems with database work and will heavily bottleneck performance. C will also win in project organization and resource consumption.

I'm going to be putting together a design document from everyone's input here and the gathered info. Discussion regarding design will take place. Once the design document is complete and agreed upon, the project can start development. Does this sound good @paradigm?

xacrimon commented 5 years ago

And, yes. Supporting all package managers you detailed is on the horizon

xacrimon commented 5 years ago

I don't know if you have worked with package managers before, but writing this in busybox shell is going to be unnecessarily hard in a couple of ways and the performance is going to be agonizingly slow. Package managers need to do a LOT of work. This is no exception.

xacrimon commented 5 years ago

Better do it once and properly than invest lots of time in building something that won't scale. I've been there before.

paradigm commented 5 years ago

Fantastic! I'll start responding and collecting and putting together information tomorrow.

Excellent!

And, yes. Supporting all package managers you detailed is on the horizon

Nice :)

This is going to be a huge project. Probably >2.5kloc. I strongly advise writing this in C, because shell will have problems with database work and will heavily bottleneck performance. C will also win in project organization and resource consumption. [...] I don't know if you have worked with package managers before, but writing this in busybox shell is going to be unnecessarily hard in a couple of ways and the performance is going to be agonizingly slow. Package managers need to do a LOT of work. This is no exception. [...] Better do it once and properly than invest lots of time in building something that won't scale. I've been there before. [...] I'm going to be putting together a design document from everyone's input here and the gathered info. Discussion regarding design will take place. Once the design document is complete and agreed upon, the project can start development. Does this sound good @paradigm?

We seem to be on completely different pages here. Either I'm missing the difficulties you're seeing, or you're missing the solutions I'm seeing.

How about this: in the immediate future, you work on the research items I laid out while I work on other pressing Bedrock priorities. Whatever architectural and language approach we take, those things need to get done anyways. When the research is done, I'll take a quick run at writing a representative chunk of it over my next free weekend. If it ends up being too hard, or too slow, we'll toss it and revisit design ideas. If it works out fine, we continue with it. That seem agreeable?

xacrimon commented 5 years ago

Seems like we can agree! 👍

Mouvedia commented 5 years ago

What about the output of the commands? Do we reformat and normalize the output? Do we separate the results of let's say search per pm? Or do we go agnostic? If we go agnostic, what about missing fields of info? Should it be empty or skipped? Would it be better to rely on a subset that is guaranteed to be provided?

e.g. not all nix packages have descriptions. Should we try to retrieve the missing information from other pms? If so, which one will be the preferred one?

I'd advise against weights. Parallel requests would be much better: race them.

paradigm commented 5 years ago

What about the output of the commands? Do we reformat and normalize the output?

Operations where we're just forwarding things along to the backing package managers, such as installing a package, will use the backing package manager's output. I don't think cost/benefit favors trying to parse it.

Operations against pmm's database, such as searching, will probably use our own format, at least at first. Maybe later we'll add something to mimic other package managers' output formats.

Do we separate the results of let's say search per pm? Or do we go agnostic? If we go agnostic, what about missing fields of info? Should it be empty or skipped? Would it be better to rely on a subset that is guaranteed to be provided? e.g. not all nix packages have descriptions

We'll take it on a case-by-case basis. We might restrict our output to things everything supports, or if only a handful are missing a given field we can indicate it's invalid in this context with something like - or N/A.
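
The placeholder approach could be sketched like this. The function name `pmm_print_result` and the choice of five tab-separated fields are assumptions made for illustration; the thread has not settled on an output format.

```shell
# Hypothetical helper: print one search result as tab-separated fields,
# substituting "-" for any field the backing package manager lacks.
pmm_print_result() {
    # arguments: stratum, package manager, package, version, description
    printf '%s\t%s\t%s\t%s\t%s\n' \
        "${1:--}" "${2:--}" "${3:--}" "${4:--}" "${5:--}"
}
```

A package without a description is then passed with an empty fifth argument, which prints as `-` so every row keeps the same number of columns.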

Should we try to retrieve the missing information from other pms? If so which one will be the preferred one?

If you're proposing using descriptions from one package manager to describe packages from another, I don't think that's worthwhile. I prefer to treat each package manager as responsible for its own subset of the system.

I'd advise against weights. Parallel requests would be much better: race them.

I don't understand what you're saying here.

Mouvedia commented 5 years ago

I don't understand what you're saying here.

I'll rephrase then. Weighting is a way to sort/order/rank the pms using a pre-populated config file. The implementation of such a system is often cryptic and results in subjective defaults. Depending on the elapsed time between the request and the response is fairer and simpler.

we can indicate it's invalid in this context with something like - or N/A.

So you are picking "empty".

paradigm commented 5 years ago

I'll rephrase then. Weighting is a way to sort/order/rank the pms using a config file. The implementation of such a system is often cryptic and results in subjective defaults. Depending on the elapsed time between the request and the response is fairer and simpler.

I still don't follow. I don't know what you mean by request and response here, and I don't see any possible values that would make sense to contrast against weighing/sorting/ordering/ranking package managers.

Mouvedia commented 5 years ago

nix: 100
pacman: 80
eopkg: 79 # what happens when the user sets it to 80?

Instead I recommend launching all requests concurrently and showing their results as they arrive.
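
In shell, racing the requests could be sketched with background jobs. This is an illustration under stated assumptions, not a proposed implementation: the function name `pmm_query_concurrently` is hypothetical, and each argument is assumed to be a complete command line for one package manager.

```shell
# Hypothetical helper: run one query per backing package manager in the
# background and print each result as soon as that query finishes.
pmm_query_concurrently() {
    for cmd in "$@"; do
        # Capture the whole result before printing so that outputs from
        # different package managers are less likely to interleave.
        (
            out=$(sh -c "$cmd" 2>&1)
            printf '%s\n' "$out"
        ) &
    done
    wait # return only after every background query has finished
}
```

Called as, say, `pmm_query_concurrently 'pacman -Ss vim' 'apt search vim'`, whichever package manager answers first is shown first, with no weights involved.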

paradigm commented 5 years ago

Ah, I think I see what you're saying. For operations that communicate with multiple package managers, you see two options:

Run them sequentially, using weighting to order them.
Run them in parallel and order the output based on which finishes first.

And you're recommending the latter one.

First release of pmm will probably run such operations sequentially with an undefined order. It'll be simpler to implement and will make it easier to handle scenarios such as errors by the backing package manager. We can look into optimizing those by parallelizing them down the road.
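
The sequential approach is short to write, which is part of its appeal. The sketch below assumes a hypothetical function name, `pmm_run_sequentially`, and a stop-at-first-failure policy; the thread does not specify how errors from a backing package manager should be handled.

```shell
# Hypothetical helper: run one command per backing package manager, one
# at a time, in the order given, and stop at the first failure.
pmm_run_sequentially() {
    for cmd in "$@"; do
        if ! sh -c "$cmd"; then
            echo "pmm: command failed, stopping: $cmd" >&2
            return 1
        fi
    done
}
```

Because only one backing package manager runs at a time, its output and exit status can be attributed without any extra bookkeeping.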

Mouvedia commented 5 years ago

with an undefined order.

Exactly what I wanted to avoid… Some pms are way slower than others.

paradigm commented 5 years ago

With most operations we're either:

Using our own database and not making queries to package managers that could be parallelized, such as searching for packages that match a given string.
Following the order the user specified, such as installing packages.

The exceptions are things like instructions to remove orphaned packages. I don't follow why the ordering of the output would be important. While it may certainly be faster if we parallelize, I'd prefer to take a performance hit early on for the sake of simplicity and getting something out the door. We can always go back to parallelize those later.

Mouvedia commented 5 years ago

We can always go back to parallelize those later.

That was just a recommendation. Everything will rely on the sequentiality: I reckon you will have a hard time refactoring.

Ah, I think I see what you're saying.

Third time's the charm.

xacrimon commented 5 years ago

Okay, there hasn't been much progress from me lately (barely any). This is because I'm currently on vacation with my girlfriend. Once I return, the gears should start turning again.

xacrimon commented 5 years ago

Okay, due to some recent acute financial issues I currently do not have the free time to work on this project. I'm fully busy keeping myself afloat. I may return. I don't know. I'm incredibly sorry for the inconvenience.

paradigm commented 5 years ago

No worries. That's how life goes sometimes. No pressure here; it's all volunteer work when people have the energy, interest, and time. The most important part of a software development environment is the keyboard actuator - take care of yourself first. If your situation improves and interest remains such that you contribute later, that's great, but if not, that's honestly fine as well.

bedrocklinux / bedrocklinux-userland

Tracking for pmm development #103