buckyos / buckyos

Buckyos is a Cloud OS (Network OS) for everyone. Its primary design goal is to allow consumers to have their own cluster/cloud (we call this cluster a Zone). Consumers can install Services in their own Zone just like installing an App. Based on buckyos, users can have AI Agents that can access all their data, devices, and services.

Considerations for Package Management System Design #4

Open glen0125 opened 3 months ago

glen0125 commented 3 months ago

When designing a package management system, it's important to consider the purpose of installations, how to handle version conflicts, dependency resolution, the process of building and installing packages, as well as how to integrate with CI/CD workflows.

I believe a key issue is understanding the goals of our package management system. Different systems like Cargo, NPM, and Pip have their own unique approaches, and clarifying our objectives can help shape our architectural design. For instance:

  1. Cargo has a cargo install command that installs binaries into a user-level directory, but Cargo is more commonly used to manage dependencies per project.
  2. NPM allows differentiation in installation levels using the -g flag for global installations.
  3. This means both Cargo and NPM have considered project isolation, which is convenient for project builds and CI/CD.
  4. Pip usually installs packages at the user level, which can lead to version conflicts across different projects. Python attempts to address this with the use of virtual environments (venv).
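
For reference, the per-project isolation that venv provides can be set up from Python's standard library. The sketch below uses the real `venv.create` function; the helper `make_project_env` and the directory name `project_env` are only illustrative.

```python
import tempfile
import venv
from pathlib import Path


def make_project_env(project_dir: Path) -> Path:
    """Create an isolated environment inside the project directory."""
    env_dir = project_dir / "project_env"  # illustrative name
    # with_pip=False keeps the example fast and offline; a real project
    # would normally pass with_pip=True so packages can be installed into it.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = make_project_env(Path(tmp))
        # pyvenv.cfg marks the directory as a virtual environment.
        print((env_dir / "pyvenv.cfg").exists())
```

Each project gets its own directory of installed packages, which is the same kind of isolation Cargo and NPM provide through the project directory.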

The questions that we need to address are as follows:

  1. Which package management philosophy does our system align with more closely?
  2. Is our package management system intended to manage agents or apps, i.e., non-system component packages?
  3. Should the installation be at the user level (system level) or the project level?

@waterflier

waterflier commented 3 months ago

You can get my thoughts by reading this article: https://github.com/fiatrete/OpenDAN-Personal-AI-OS/blob/main/doc/package_manager.md

I have also implemented the load part of the pkg system in OpenDAN 0.5.1, which you can refer to as well.

Simply put, I hope that our design is based on understanding the advantages and disadvantages of existing package management systems (like pip/cargo/npm):

  1. Isolate pkg load / pkg install. For components that use pkg, you only need to use the load part.
  2. pkg load relies on env, making it easy to implement isolation.
  3. Provide the most basic pkg dependency management.
  4. Introduce a cid mechanism into version dependencies.
  5. Implement verifiable install based on the cid mechanism, without any verification on the load side (convenient for development and testing).
  6. It is foreseeable that our current usage scenario will be closer to Docker's image management.
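
To make points 4 and 5 concrete, here is a minimal sketch. It assumes a cid is simply the hex SHA-256 digest of the package bytes; the actual cid format is not specified in this thread. The names `compute_cid`, `verified_install`, and `load` are hypothetical and are not OpenDAN or buckyos APIs.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_cid(data: bytes) -> str:
    """Assumption: the cid is the hex SHA-256 digest of the package bytes."""
    return hashlib.sha256(data).hexdigest()


def verified_install(env_dir: Path, name: str, data: bytes, expected_cid: str) -> Path:
    """Install side: refuse to write a package whose content does not match its cid."""
    actual = compute_cid(data)
    if actual != expected_cid:
        raise ValueError(f"cid mismatch for {name}: expected {expected_cid}, got {actual}")
    target = env_dir / name
    target.write_bytes(data)
    return target


def load(env_dir: Path, name: str) -> bytes:
    """Load side: only locate and read the package, with no verification."""
    return (env_dir / name).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = Path(tmp)
        payload = b"example package contents"
        verified_install(env, "demo_pkg", payload, compute_cid(payload))
        print(load(env, "demo_pkg") == payload)
```

Because `load` never checks the cid, a developer can drop an unpublished build into the env directory and load it directly, while anything that arrives through `verified_install` has been checked against its declared content.
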

glen0125 commented 3 months ago

  1. What problem does this solve? Is it primarily about trustworthy downloads? How does this approach differ from other package management systems such as npm/cargo/pip, and what are its advantages over them?

  2. Regarding the env setup, is env simply a path? Can it be any local path, or must it be initialized first, the way a project directory is by the npm init command?

  3. Package searching is conducted within the env and all parent_env directories, using recursive search through all subdirectories. parent_env directories cannot be nested within each other. For instance, if there is a parent a with the path /a that contains a subpath b: /a/b, there cannot be another parent_env with the path b, and the current env cannot be located under /a. Is this correct?

  4. If the target package is not found, are new packages installed in the current env by default?

  5. parent_env can be dynamically added but not removed. The package search order gives priority to the current env over parent_env. Is this correct?

  6. What exactly is media info?

  7. Does each env have its own index_db, or is there a system-level file? Is there a system-level env that serves as the parent for all other envs? What is the priority for index_db usage: does the local env take precedence over parent_env?

  8. Are package dependencies discovered exclusively through the load code? Since the process appears to start with load, rather than with a file like package.json, is this the role of pkg.cfg.toml? If dependencies are declared through load, does this mean all code must be parsed?

  9. I am having difficulty understanding the following: Inside pkg.cfg.toml, there are two external files included: an external pkg.lock (for local version locking) and .pkgs/index-db.toml (an independently distributed package index by the distributor). Could you provide further clarification on this?

waterflier commented 3 months ago

  1. The system design draws heavily on Git and NDN (Named Data Networking). The distinction between client and server is not that important. Through cryptography, it achieves decentralized, trustworthy verification. Any client can become an effective repo server through simple configuration.

I want to gain a deeper understanding of this issue, which may require starting from the drawbacks of npm/cargo/pip. If you could point out some flaws in existing package management systems, I think we could have a more in-depth discussion about this.

  1. An "env" is a separate environment. Each env contains a complete pkg-index-db along with its search logic, and is used for isolation (an area where pip and npm are extremely weak).

  2. The processes of Load/Download/Install are entirely independent. If a use case requires automatic installation after a load failure, there should be upper-layer logic based on the package system to assemble this.

  3. How the parent env is managed is a matter of detail. Strictly speaking, we have only defined a parent-child relationship, and how this relationship is managed depends on the scenario. However, I believe that most scenarios will not need dynamic adjustments.

  4. "Media info" refers to "storage media" information. Pkg load only concerns finding the supported media, not the actual loading. This means that the real structure of the pkg is read by the upper-layer application control.

  5. Each env has its own index-db. An index-db is more like a .git folder, used to locally save more trustworthy package information to speed up the installation of dependencies. How large the index-db is in each env also depends on the scenario.

  6. The system design separates package loading and dependency checking. It provides infrastructure for the upper layers to check for the presence of dependencies and make decisions. Dependencies are more for package installation.

  7. Pkg.cfg.toml is the configuration file for pkg_env (perhaps calling it pkg_env.cfg.toml would be better). Pkg.lock is for locking the default version. If this file is absent, the default version of pkg_name is locked by index-db.toml (which can be updated).
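
The env, index-db, and pkg.lock answers above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: an env is modelled as an in-memory object with its own index-db, an optional lock table, and an optional parent. `PkgEnv`, `resolve_version`, and `load` are hypothetical names, and the "current env before parent env" search order is the one proposed in the question, which this thread does not explicitly confirm.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class PkgEnv:
    """Hypothetical env: its own index-db, lock table, installed packages, and parent."""
    name: str
    index_db: Dict[str, str] = field(default_factory=dict)   # pkg name -> default version
    lock: Dict[str, str] = field(default_factory=dict)       # pkg name -> locked version
    installed: Dict[Tuple[str, str], str] = field(default_factory=dict)  # (pkg, version) -> media info
    parent: Optional["PkgEnv"] = None

    def chain(self) -> Iterator["PkgEnv"]:
        """Yield this env first, then each parent in turn (assumed search order)."""
        env: Optional["PkgEnv"] = self
        while env is not None:
            yield env
            env = env.parent

    def resolve_version(self, pkg: str) -> Optional[str]:
        """The lock table wins; if it has no entry, the index-db default is used."""
        for env in self.chain():
            version = env.lock.get(pkg) or env.index_db.get(pkg)
            if version is not None:
                return version
        return None

    def load(self, pkg: str) -> Optional[str]:
        """Return media info for the package, or None. A miss never triggers an install."""
        version = self.resolve_version(pkg)
        if version is None:
            return None
        for env in self.chain():
            media = env.installed.get((pkg, version))
            if media is not None:
                return media
        return None


if __name__ == "__main__":
    system = PkgEnv(
        "system",
        index_db={"tool": "1.0"},
        installed={("tool", "1.0"): "/system/tool-1.0", ("tool", "2.0"): "/system/tool-2.0"},
    )
    app = PkgEnv("app", lock={"tool": "2.0"}, parent=system)
    print(system.load("tool"))  # default version taken from the index-db
    print(app.load("tool"))     # version pinned by the app env's lock table
    print(app.load("missing"))  # None: the caller decides whether to install
```

Returning `None` instead of installing keeps load and install independent, so any "install on miss" behaviour is assembled by the upper layer.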