Risto-Stevcev / cl-micropm

A very minimalist, decentralized "package manager" for Common Lisp (<200 LOC)
BSD 3-Clause "New" or "Revised" License

Dead code? #1

Open jaredkrinke opened 1 year ago

jaredkrinke commented 1 year ago

First off, I want to say that I like this concept, especially pulling dependencies straight from upstream.

Despite cl-micropm already being tiny, I think there is even room to shorten it further! Specifically, I think there is some dead code here, if I'm not mistaken (which I might very well be):

https://github.com/Risto-Stevcev/cl-micropm/blob/3835e76dbdfc9fb1c39a1416bbbc64412e432f77/cl-micropm.lisp#L141

Aside: I don't yet understand why a container is being used for cl-micropm -- can the system index not just be downloaded directly (using HTTPS)? I suppose the current approach validates the signature, but the public key seems to just be served over HTTPS anyway.

Risto-Stevcev commented 1 year ago

Thanks! Yeah, I agree, it can be even smaller and simpler.

> I think there is some dead code here, if I'm not mistaken (which I might very well be)

Yeah, I think it makes sense to get rid of those quicklisp sources, the kmr and ediware stuff, since those are very old projects, and instead host mirrors of them on GitHub/GitLab/Sourcehut/Codeberg. What do you think?

> Aside: I don't yet understand why a container is being used for cl-micropm -- can the system index not just be downloaded directly (using HTTPS)?

As far as I can tell, you can't. I asked a while back in an issue in one of the quicklisp repos, but nobody responded so far. I did some digging in the code and found where it's generated though. From what I can tell, quicklisp only generates the index programmatically when quicklisp gets bootstrapped: https://github.com/quicklisp/quicklisp-controller/blob/master/indexes.lisp#L162 . And that's what the Dockerfile is for -- to bootstrap quicklisp so that the systems.txt file is generated.

I think there are a few things that I want to do to substantially improve this little library. Instead of having these scripts and the Dockerfile as part of the project, have it instead pre-generate the quicklisp dependencies (systems.txt) and sources (<project>/source.txt) and append them as alists to this single-file project.

The quicklisp sources come from the quicklisp-projects repo, but I think maybe it makes way more sense to have this as an alist that people can modify, so that, for example, you can create your own mirror of a quicklisp project and just update the alist to point to your mirror. That would also solve the issue of legacy sources like kmr and ediware, so that this project doesn't need to support them. It's not good to rely on those legacy sources anyway since http is insecure.

And you also might just want to create your own mirrors of the git and https dependencies that you need as well, in order to add an extra layer of security. That would avoid the kinds of situations that have happened in the past few years, where, for example, someone pushes malware into an npm package that a lot of companies rely on. You could avoid that problem entirely by creating your own mirrors, auditing the code, and then it's out of sight, out of mind. One of the goals of this project is to have that sort of control over things, to avoid using a curated centralized repository and instead have decentralized sources like git that you can control.

The quicklisp dependencies come from the bootstrapped quicklisp, which is done via the Dockerfile. By pre-generating that as well, and having it also be an alist, it also allows people to extend the variable to include things that were never published in quicklisp, either because nobody tried or because they were rejected, which I think is actually a substantial number of projects.

And pre-generating these doesn't have to happen often. For example, the quicklisp site says when the index was last updated, which as of now, is June 19, 2023. So I would've only had to pre-generate it on June 19th, and then it would still be all up-to-date up to today.

Pre-generating the source.txt files and the systems.txt file into alists can be all programmatic. Just have a Makefile that I run whenever I want to update this repo. The Makefile will run the Dockerfile to generate systems.txt, which I then convert to an alist. The Makefile also turns those quicklisp-projects/<project>/source.txt sources into a single alist. And then concatenate the two alists to the cl-micropm.lisp file. Then just commit the updated sources and publish. And then it's all in one single file.

And by doing it that way, the cl-micropm.lisp file becomes really easy to understand, because then it's just two alists, and then only two side-effects need to happen: setup to clone the dependencies, and then setup-asdf-registry to configure ASDF.
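To make the "two alists plus user overrides" idea concrete, here is a minimal sketch. It is written in Python purely for illustration (cl-micropm itself is Lisp), and everything in it is an assumption: the names `SOURCES`, `DEPENDENCIES`, `with_overrides` and `transitive_dependencies` are hypothetical, and the entries are placeholders, not real index data.

```python
# Hypothetical stand-ins for the two pre-generated alists.
# All entries are illustrative placeholders, not real Quicklisp index data.
SOURCES = {
    "project-a": ("git", "https://example.com/upstream/project-a.git"),
    "project-b": ("git", "https://example.com/upstream/project-b.git"),
}
DEPENDENCIES = {
    "project-a": ["project-b"],
    "project-b": [],
}


def with_overrides(sources, overrides):
    """Return a copy of `sources` where user entries replace or extend the defaults."""
    merged = dict(sources)
    merged.update(overrides)
    return merged


def transitive_dependencies(name, dependencies):
    """Collect `name` and everything it depends on, dependencies first."""
    ordered, seen = [], set()

    def visit(current):
        if current in seen:
            return
        seen.add(current)
        for dep in dependencies.get(current, []):
            visit(dep)
        ordered.append(current)

    visit(name)
    return ordered


# A user points project-b at their own audited mirror and adds a project
# that was never published in the index.
user_overrides = {
    "project-b": ("git", "https://example.com/my-mirror/project-b.git"),
    "project-c": ("git", "https://example.com/unpublished/project-c.git"),
}
sources = with_overrides(SOURCES, user_overrides)

for name in transitive_dependencies("project-a", DEPENDENCIES):
    kind, url = sources[name]
    print(f"{kind} clone {url}")
```

The point of keeping the defaults and the overrides separate is that regenerating the snapshot never clobbers a user's mirror choices.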

What do you think?

jaredkrinke commented 1 year ago

Very detailed response, thanks!

> > I think there is some dead code here, if I'm not mistaken (which I might very well be)
>
> Yeah, I think it makes sense to get rid of those quicklisp sources, the kmr and ediware stuff, since those are very old projects, and instead host mirrors of them on GitHub/GitLab/Sourcehut/Codeberg. What do you think?

I originally opened this issue simply because I didn't see those functions being used anywhere. I assumed it was because they were left over from some previous iteration, but it sounds more like they represent useful functionality that just isn't used by default. I'm definitely not qualified to say if including them is useful or not--I've only been using Common Lisp as a hobby for a few months.

> > Aside: I don't yet understand why a container is being used for cl-micropm -- can the system index not just be downloaded directly (using HTTPS)?
>
> As far as I can tell, you can't. I asked a while back in an issue in one of the quicklisp repos, but nobody responded so far. I did some digging in the code and found where it's generated though. From what I can tell, quicklisp only generates the index programmatically when quicklisp gets bootstrapped: https://github.com/quicklisp/quicklisp-controller/blob/master/indexes.lisp#L162 . And that's what the Dockerfile is for -- to bootstrap quicklisp so that the systems.txt file is generated.

While trying to find a more secure CL package manager, I was reading about Eric Timmons's CL Project Index, and he included some links to a file that looks a lot like what I think cl-micropm is consuming: http://beta.quicklisp.org/dist/quicklisp/2021-08-07/systems.txt

The latest version of that file is referenced from: https://beta.quicklisp.org/dist/quicklisp.txt

Disclaimer: I haven't inspected the file closely to determine if it's the same as what you're after (it sure does look similar), and I don't know if consuming them is "supported"--I simply assumed that this was what the Quicklisp installer or client did. It's very likely I'm misunderstanding how cl-micropm and/or Quicklisp work :)
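A rough sketch of how the latest index URL could be read out of that file, in Python for illustration only. It assumes the file consists of `key: value` lines and that one of the keys is `system-index-url`; the sample text below is an illustrative stand-in, not a copy of the real file. The helper names `parse_distinfo` and `to_https` are hypothetical, and `to_https` further assumes the host serves the same path over HTTPS.

```python
def parse_distinfo(text):
    """Parse 'key: value' lines into a dict, splitting on the first ': ' only."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        info[key] = value
    return info


def to_https(url):
    """Rewrite an http:// URL to https://, leaving other URLs untouched."""
    prefix = "http://"
    if url.startswith(prefix):
        return "https://" + url[len(prefix):]
    return url


# Illustrative sample (assumed layout); the real file may contain other keys.
SAMPLE = """\
name: quicklisp
version: 2021-08-07
system-index-url: http://beta.quicklisp.org/dist/quicklisp/2021-08-07/systems.txt
"""

info = parse_distinfo(SAMPLE)
print(info["version"])
print(to_https(info["system-index-url"]))
```

No network access happens here; fetching the file (and deciding whether to trust it) is left out on purpose.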

> I think there are a few things that I want to do to substantially improve this little library. Instead of having these scripts and the Dockerfile as part of the project, have it instead pre-generate the quicklisp dependencies (systems.txt) and sources (<project>/source.txt) and append them as alists to this single-file project.

That would remove the direct dependency on the Quicklisp web site entirely, with the caveat that Quicklisp would be used to update the list. Basically a snapshot of Quicklisp's index, right? That's pretty much what I think I want, but of course I don't have enough experience to know if it's actually a Good Idea.

> And you also might just want to create your own mirrors of the git and https dependencies that you need as well, in order to add an extra layer of security. That would avoid the kinds of situations that have happened in the past few years, where, for example, someone pushes malware into an npm package that a lot of companies rely on. You could avoid that problem entirely by creating your own mirrors, auditing the code, and then it's out of sight, out of mind. One of the goals of this project is to have that sort of control over things, to avoid using a curated centralized repository and instead have decentralized sources like git that you can control.

My priorities were basically:

  1. Avoid unencrypted protocols
  2. Pull directly from upstream (esp. since I read that, at least in the past, Quicklisp itself was pulling from some upstream sources unencrypted)

> The quicklisp dependencies come from the bootstrapped quicklisp, which is done via the Dockerfile. By pre-generating that as well, and having it also be an alist, it also allows people to extend the variable to include things that were never published in quicklisp, either because nobody tried or because they were rejected, which I think is actually a substantial number of projects.

Thanks! Yes, I was missing this context on modifying the variable after the fact.

> And pre-generating these doesn't have to happen often. For example, the quicklisp site says when the index was last updated, which as of now, is June 19, 2023. So I would've only had to pre-generate it on June 19th, and then it would still be all up-to-date up to today.
>
> Pre-generating the source.txt files and the systems.txt file into alists can be all programmatic. Just have a Makefile that I run whenever I want to update this repo. The Makefile will run the Dockerfile to generate systems.txt, which I then convert to an alist. The Makefile also turns those quicklisp-projects/<project>/source.txt sources into a single alist. And then concatenate the two alists to the cl-micropm.lisp file. Then just commit the updated sources and publish. And then it's all in one single file.
>
> And by doing it that way, the cl-micropm.lisp file becomes really easy to understand, because then it's just two alists, and then only two side-effects need to happen: setup to clone the dependencies, and then setup-asdf-registry to configure ASDF.
>
> What do you think?

I think I mostly expressed my opinions above, but let me know if I missed something specific. And, of course, don't put too much stock in my opinions, given that I'm still learning about the ecosystem.

Risto-Stevcev commented 1 year ago

> While trying to find a more secure CL package manager, I was reading about Eric Timmons's CL Project Index, and he included some links to a file that looks a lot like what I think cl-micropm is consuming: http://beta.quicklisp.org/dist/quicklisp/2021-08-07/systems.txt
>
> The latest version of that file is referenced from: https://beta.quicklisp.org/dist/quicklisp.txt

Nice, thanks for finding that! Yeah, that's the file that it generates. I could just pull the index from there instead of using the Dockerfile.

> That would remove the direct dependency on the Quicklisp web site entirely, with the caveat that Quicklisp would be used to update the list. Basically a snapshot of Quicklisp's index, right? That's pretty much what I think I want, but of course I don't have enough experience to know if it's actually a Good Idea.

Yeah, exactly. I think it's a good idea because the snapshot doesn't get updated that often; from what I've noticed, the release cycle is once every few months. And it would make the actual file very simple, readable, and completely self-contained -- all of the dependency and source info would be in there and would be customizable in the case of non-quicklisp projects and/or alternative sources for dependencies.

> My priorities were basically:
>
>   1. Avoid unencrypted protocols
>   2. Pull directly from upstream (esp. since I read that, at least in the past, Quicklisp itself was pulling from some upstream sources unencrypted)

Yeah, that's the goal of the library, with upstream ideally being git, because it's decentralized and it's easy to create mirrors of dependencies. That's why I left the ediware/kmr stuff sort of as a stub: I didn't know what I should do with those quicklisp sources. But I think it's better to just support git and https only, and have people mirror these older repos if they need them.

jaredkrinke commented 1 year ago

One potential problem to look out for: Quicklisp systems where the "project name" (the directory in "quicklisp-projects") differs from the ASDF system name. Specifically, there's a system named "marshal" in the "cl-marshal" directory, but there's also a system named "fmarshal" in a directory named "marshal".

I don't know if this impacts your code. I'm using a heavily modified fork, but in my fork, this caused it to download the wrong project for the "checkl" system (it downloaded the source to the "fmarshal" system instead of "marshal").

Anyway, I just wanted to put this on your radar in case you hit this problem yourself someday.

Risto-Stevcev commented 1 year ago

I noticed some version of this issue where a source can have multiple systems, but what you described sounds significantly worse.

I could try to study the quicklisp-controller code and maybe use that or a modified version to get the mapping file, but it does a whole lot of stuff like send emails and generate dist tarballs, and I'm not sure if it's actually involved in creating any sort of mapping file, though I could be wrong. The problem with that is that it'll go back to being code that's hard to understand, since quicklisp is monolithic, and I wanted something small so that it's easy to understand and people can modify/extend it for their own use-case, like you've been doing, sort of like the suckless programs.

I'm thinking I could just have a program clone all of the repos into the ~/common-lisp folder, and then asdf:locate-system all of the projects in systems.txt to get their definitive sources rather than guessing by name. One issue with that is that it would be running all this arbitrary lisp code, since asdf would be caching all these system definitions, which can run any code in their asd files, so it would need to be sandboxed somehow. I could just use packer instead of docker/podman though, to mitigate the risk of kernel exploits, container breakouts, etc. And then that could spit out the single file with some sort of (... (<project-name> ... (:sources ... :dependencies))) mapping, like a combined quicklisp-projects/**/source.txt and systems.txt. A lot of people seem to be wanting that, because then you wouldn't even need to use this project per se, you could just consult that single file and git pull everything manually, etc.
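One possible shape for that combined file, sketched in Python for illustration only. It assumes each source.txt holds a single `<type> <location>` line and that systems.txt rows have already been split into `(project, system-file, system-name, dependencies)`; the function names and all sample data are hypothetical placeholders. It also covers the "one source, multiple systems" case by nesting systems under their project.

```python
from pprint import pprint


def parse_source_line(line):
    """Split a source.txt-style line into (kind, location). Assumed format."""
    kind, location = line.split(None, 1)
    return kind, location.strip()


def combine(source_lines, system_rows):
    """Build {project: {"source": (kind, location), "systems": {system: [deps]}}}."""
    combined = {}
    for project, line in source_lines.items():
        combined[project] = {"source": parse_source_line(line), "systems": {}}
    for project, _system_file, system, deps in system_rows:
        # A project that has systems but no known source still gets an entry.
        entry = combined.setdefault(project, {"source": None, "systems": {}})
        entry["systems"][system] = list(deps)
    return combined


# Placeholder data, not taken from the real quicklisp-projects repo.
source_lines = {
    "project-a": "git https://example.com/project-a.git",
}
system_rows = [
    ("project-a", "project-a", "project-a", ["project-b"]),
    ("project-a", "project-a", "project-a/tests", ["project-a"]),
]

combined = combine(source_lines, system_rows)
pprint(combined)
```

This only merges data that is already on disk, so unlike the asdf:locate-system approach it never loads an .asd file and needs no sandbox.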

jaredkrinke commented 1 year ago

I probably should have included my solution as well as a description of the problem. Fortunately, systems.txt includes the "project" (directory) name as the first column, so all I had to do was start using that column when looking up the source location in the quicklisp-projects submodule. Obvious disclaimer: I didn't look at the Quicklisp source, so this might be flawed, but it's working for me so far.
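The difference between the two lookups can be shown with a small sketch (Python, for illustration only; this is not the code from either fork). The project and system names are the ones mentioned above, but the system-file values and the function names are assumptions.

```python
# Rows in (project, system-file, system-name) order. Project and system names
# come from the discussion; the system-file values are guesses.
ROWS = [
    ("cl-marshal", "marshal", "marshal"),
    ("marshal", "fmarshal", "fmarshal"),
]


def project_by_guessing(system_name, rows):
    """Flawed approach: assume the project directory is named after the system."""
    projects = {project for project, _file, _system in rows}
    return system_name if system_name in projects else None


def project_by_index(system_name, rows):
    """Use the project column recorded on the same row as the system name."""
    for project, _file, system in rows:
        if system == system_name:
            return project
    return None


print(project_by_guessing("marshal", ROWS))  # marshal (the directory holding fmarshal)
print(project_by_index("marshal", ROWS))     # cl-marshal
```

Guessing by name silently succeeds with the wrong directory, which is why the mistake only shows up later as a missing or wrong system.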

Risto-Stevcev commented 1 year ago

Ah ok, I was wondering why the first two columns were seemingly being duplicated. Well, that simplifies things a lot then: the first column looks like it's the mapping for quicklisp-projects/<project-name>, and the second column is the actual system name.

jaredkrinke commented 1 year ago

One small correction: the third column is the system name, per the first line comment:

# project system-file system-name [dependency1..dependencyN]
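A minimal sketch of a parser for that layout, in Python for illustration only. It assumes fields are separated by whitespace and that lines starting with `#` are comments; the `SystemEntry` and `parse_systems_txt` names are hypothetical, and the sample rows are placeholders rather than real index entries.

```python
from typing import NamedTuple


class SystemEntry(NamedTuple):
    project: str        # column 1: directory name in quicklisp-projects
    system_file: str    # column 2: system file name
    system_name: str    # column 3: ASDF system name
    dependencies: tuple  # columns 4..N, possibly empty


def parse_systems_txt(text):
    """Parse systems.txt-style text into a list of SystemEntry records."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the header comment
        fields = line.split()
        if len(fields) < 3:
            continue  # not enough columns to be a valid row
        project, system_file, system_name, *deps = fields
        entries.append(SystemEntry(project, system_file, system_name, tuple(deps)))
    return entries


# Placeholder rows following the header's column order.
SAMPLE = """\
# project system-file system-name [dependency1..dependencyN]
project-a project-a project-a project-b project-c
project-a project-a project-a/tests project-a
"""

entries = parse_systems_txt(SAMPLE)
for entry in entries:
    print(entry.system_name, "->", entry.project, list(entry.dependencies))
```

Keeping the three leading columns as named fields makes it harder to mix up the second and third columns again.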