Existing package managers

wking commented 10 years ago

I'm not sure how much of a from-scratch package manager folks can write in two days (I expect it would be a trade-off between ticking off features and writing maintainable code), so I'm happy to see you folks considering an existing package manager as a base rather than building your own from scratch. You mention Bower's lack of virtuals (and of “transitive dependencies”, but I'm not sure what those are). Another general-purpose package manager that might work is HashDist, for which you'd just be writing a lesson stack. The HashDist spec docs aren't very polished yet, but there are lots of good examples of specs in the HashStack (which you probably wouldn't use, except as a reference for the spec syntax).

Personally, I still don't see what's wrong with using Git branches (or repositories), and pull (or optionally, the submodule commands). See swcarpentry/bc#102, and my example workshop repo (the pull-based version, and the submodule-based version). You can investigate the pull-based version with:

$ git clone --branch assembled git://tremily.us/swc-workshop.git

and the submodule-based version with:

$ git clone --recursive git://tremily.us/swc-workshop.git

I'm happy to talk you through your user story with either approach (or both approaches).

twitwi commented 10 years ago

Thanks a lot for your feedback and for the links; I didn't know about this bc issue. You had very interesting discussions there. First thing: indeed, we'll definitely not implement a package manager from scratch.

(a slightly messy answer, sorry)

I think I understood your two approaches. I am tempted to set aside the branch/merge solution, as I can foresee big namespacing problems, especially with things that need to be built, etc. I also agree with some of the comments: in my experience, submodules can be tricky for some (most?) users. Still, submodules seem to be a very good basis for the task we have, and a light wrapper on top of git might be a direction to consider.

Compared to raw git submodules, I still like the principle of having a file that describes the dependencies. With submodules, this information is hidden in the git files and not easy to visualize and modify (you mention 'submodule deinit'; the equivalent in a dependency file is to remove the line with the dependency).
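To make the "dependency file" idea concrete, here is a minimal bash sketch of what a thin wrapper might read. The `lessons.deps`-style format (one `<path> <git-url> [<tag>]` per line) and the `deps_to_commands` function are hypothetical, the example.org URL is a placeholder, and the sketch only prints the git commands it would run rather than running them.

```bash
#!/usr/bin/env bash
# Hypothetical dependency file format: one lesson per line,
#   <path> <git-url> [<tag>]
# Blank lines and lines starting with '#' are ignored.
# deps_to_commands only PRINTS the git commands a thin wrapper might run.
deps_to_commands() {
  local path url tag
  # The "|| [[ -n $path ]]" keeps a final line that lacks a trailing newline.
  while read -r path url tag || [[ -n "$path" ]]; do
    [[ -z "$path" || "$path" == '#'* ]] && continue
    printf 'git submodule add %s %s\n' "$url" "$path"
    if [[ -n "$tag" ]]; then
      # Pin the checkout to a released (tagged) version of the lesson.
      printf 'git -C %s checkout %s\n' "$path" "$tag"
    fi
  done < "$1"
}

deps_file="$(mktemp)"
cat > "$deps_file" <<'EOF'
# path          url                                            tag
lessons/shell   https://example.org/swc-shell.git              v1.0.0
lessons/git     git://tremily.us/swc-version-control-git.git
EOF
deps_to_commands "$deps_file"
rm -f "$deps_file"
```

With a format like this, dropping a dependency really is just deleting its line; the wrapper would then be responsible for turning that into the matching `git submodule deinit` and `git rm` steps.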

I agree (with you, I guess) that we should stay close to version control, and that's why I really like the Bower approach, where git tags are used to version lessons. I just discovered HashDist through your post, but it seems too installation-oriented (specifying bash commands to execute on install, etc.), and I think we should stay closer to the repository content.

What I feel would be beneficial for our community (as it has been in most software communities) would be to introduce semantic versioning. This helps a lot in maintaining versions while moving forward.

twitwi commented 10 years ago

About Bower, it seems to support "transitive dependencies": A depends on B, which depends on C, so when you get B, you also get C. I must have misread it somewhere.
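A minimal bash (4+) sketch of what transitive (or recursive) resolution means in practice: asking for A pulls in B and, through B, C. The `DEPS` table, the lesson names, and the `resolve`/`visit` helpers are hypothetical; a depth-first walk lists each lesson once, after the lessons it needs.

```bash
#!/usr/bin/env bash
# Hypothetical dependency table: lesson -> space-separated direct dependencies.
declare -A DEPS=(
  [A]="B"
  [B]="C"
  [C]=""
)

declare -A SEEN=()
ORDER=()

# Depth-first, post-order: a lesson is appended only after its dependencies.
visit() {
  local lesson="$1" dep
  [[ -n "${SEEN[$lesson]:-}" ]] && return
  SEEN[$lesson]=1
  for dep in ${DEPS[$lesson]:-}; do
    visit "$dep"
  done
  ORDER+=("$lesson")
}

resolve() {
  SEEN=()
  ORDER=()
  visit "$1"
  echo "${ORDER[*]}"
}

resolve A   # prints: C B A
```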

The "virtual packages" are more about handling an "A depends on (B or C)" situation. That matters more for the part of the project about the "graph of lessons", but that is almost a separate project, less technical and more oriented toward structuring the pedagogical content.
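One simple way to resolve an "A depends on (B or C)" requirement is sketched below in bash. The `B|C` syntax (loosely modeled on Debian's `Depends: b | c` alternatives) and the `pick_alternative` helper are assumptions for illustration: prefer an alternative that is already selected, otherwise fall back to the first one listed.

```bash
#!/usr/bin/env bash
# Hypothetical syntax: an alternative group "B|C" means "B or C".
# Arguments after the group are the lessons already selected.
pick_alternative() {
  local group="$1" alt chosen
  shift
  local -a alts
  IFS='|' read -r -a alts <<< "$group"
  for alt in "${alts[@]}"; do
    for chosen in "$@"; do
      if [[ "$alt" == "$chosen" ]]; then
        echo "$alt"   # reuse something the bootcamp already has
        return 0
      fi
    done
  done
  echo "${alts[0]}"   # nothing selected yet: take the first alternative
}

pick_alternative 'B|C'        # prints: B
pick_alternative 'B|C' A C    # prints: C
```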

wking commented 10 years ago

On Thu, Jul 17, 2014 at 01:36:59PM -0700, Rémi Emonet wrote:

I am tempted to set aside the branch/merge solution, as I can foresee big namespacing problems, especially with things that need to be built, etc.

I avoid namespace collisions by developing content in a root directory (e.g. 1) and using a separate branch to add the appropriate target namespace (e.g. 2). Nothing interesting happens in the namespaced branch, it just merges content changes from the lesson's master and adds any moves required to keep new files in the appropriate namespace 3. You can add as many namespacing branches as you like. For example, if you want my Git material under lessons/funky/git, just:

$ git clone git://tremily.us/swc-version-control-git.git
$ cd swc-version-control-git
$ git checkout -b funky-namespace
$ mkdir -p lessons/funky/git
$ git mv *.md lessons/funky/git
$ git commit -m 'lessons/funky/git: Add my funky namespace'

and then publish that branch and merge it from your aggregate repository. Of course, if you're using submodules you don't have to bother with any of that, just:

$ git submodule add git://tremily.us/swc-version-control-git.git lessons/funky/git
$ git commit -m 'lessons/funky/git: Add a Git lesson submodule'

I also agree with some of the comments: in my experience, submodules can be tricky for some (most?) users.

Unless you're doing some complicated branching, I think submodule difficulties mostly stem from a lack of familiarity with Git's object structures. I think plugging gaps in the submodule docs, and pushing people to actually read those docs, is the right approach here.

Still, submodules seem to be a very good basis for the task we have, and a light wrapper on top of git might be a direction to consider.

If you can keep the useful power of submodules with a cleaner UI, I'd rather push to land that upstream in Git itself (where submodule dev time is scarce) than have a separate wrapper outside of Git.

Compared to raw git submodules, I still like the principle of having a file that describes the dependencies. With submodules, this information is hidden in the git files and not easy to visualize and modify (you mention 'submodule deinit'; the equivalent in a dependency file is to remove the line with the dependency).

.gitmodules and .git/config are just files themselves. Their syntax is more generic, and not explicitly in terms of ‘dependencies’. I don't have a particular problem with a wrapper that maps back and forth between these formats and a ‘dependency’-phrased format, but getting bi-directional syncing between substantially different formats is complicated.
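As a sketch of one direction of that mapping, the bash function below reads a `.gitmodules` file with `git config --file` and prints a dependency-style `<path> <url>` line per submodule. The `gitmodules_to_deps` name is hypothetical; going the other way, and keeping the two formats in sync, is the harder part described above.

```bash
#!/usr/bin/env bash
# List the submodules recorded in a .gitmodules file as "<path> <url>" lines.
# Requires git; `git config --file` reads the file without needing a repository.
gitmodules_to_deps() {
  local file="$1" key path name
  # Each match looks like: submodule.<name>.path <path>
  git config --file "$file" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
      name="${key#submodule.}"   # drop the leading "submodule."
      name="${name%.path}"       # drop the trailing ".path"
      printf '%s %s\n' "$path" "$(git config --file "$file" "submodule.$name.url")"
    done
}

modules="$(mktemp)"
cat > "$modules" <<'EOF'
[submodule "lessons/funky/git"]
	path = lessons/funky/git
	url = git://tremily.us/swc-version-control-git.git
EOF
gitmodules_to_deps "$modules"
# prints: lessons/funky/git git://tremily.us/swc-version-control-git.git
rm -f "$modules"
```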

I agree (with you, I guess) that we should stay close to version control, and that's why I really like the Bower approach, where git tags are used to version lessons.

I agree that the less code we have to layer on top of Git, the better. However, another problem I have with Bower is the centralized registry. There are a few packages for hosting private registries, but it looks like you have to tweak your ~/.bowerrc to change registries 4, and you can't easily pull one package from one registry and another package from a different registry (which is really easy to do with Git).

I just discovered HashDist through your post, but it seems too installation-oriented (specifying bash commands to execute on install, etc.), and I think we should stay closer to the repository content.

It is installation-oriented, but the stage system is flexible enough to do whatever you want 5. I don't have enough experience with it to know how easy it would be to pull together compiled lesson material into a single, easily-found directory, but @ahmadia may be able to shed some light on that.

What I feel would be beneficial for our community (as it has been in most software communities) would be to introduce semantic versioning. This helps a lot in maintaining versions while moving forward.

I love versioning, but I'm not entirely clear on how well lesson material maps onto APIs. Say your lesson depends on some-git-intro v0.2.0. When its maintainers fix a typo and patch-bump to v0.2.1, you can upgrade your collection without reviewing their change, so that's fine. However, I'm not sure you can draw the same hard line between incompatible changes, which require a major version bump, and extension changes, which require a minor version bump.
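The mechanical half of that question is easy to sketch. The bash helper below (`bump_kind`, hypothetical) classifies an upgrade between two `MAJOR.MINOR.PATCH` versions; it deliberately ignores pre-release and build suffixes such as `-rc1`. Deciding which kind of bump a given lesson change deserves remains the judgment call discussed above.

```bash
#!/usr/bin/env bash
# Classify an upgrade between two MAJOR.MINOR.PATCH versions.
# Simplified sketch: an optional leading "v" is stripped; no pre-release tags.
bump_kind() {
  local old="${1#v}" new="${2#v}"
  local o1 o2 o3 n1 n2 n3
  IFS=. read -r o1 o2 o3 <<< "$old"
  IFS=. read -r n1 n2 n3 <<< "$new"
  if   (( n1 != o1 )); then echo major
  elif (( n2 != o2 )); then echo minor
  elif (( n3 != o3 )); then echo patch
  else echo none
  fi
}

bump_kind v0.2.0 v0.2.1   # prints: patch (e.g. the typo fix above)
```

Strictly, SemVer treats 0.y.z as initial development where anything may change, so treating v0.2.0 to v0.2.1 as a safe upgrade is a project convention rather than a guarantee of the spec.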

wking commented 10 years ago

On Thu, Jul 17, 2014 at 01:43:37PM -0700, Rémi Emonet wrote:

About Bower, it seems to support "transitive dependencies": A depends on B, which depends on C, so when you get B, you also get C. I must have misread it somewhere.

Ahh, so “recursive dependencies”. It's hard to have a meaningful package manager without those, since you can't get very far with a one-layer stack ;).

twitwi commented 10 years ago

Thanks a lot for your input; I really appreciate you stepping into the discussion. We could frame the objective of the sprint as testing and evaluating different approaches in order to select the best option. By any chance, will you be available during the sprint?

On the recursive/transitive aspect, I think "transitive" is not an inappropriate term (https://en.wikipedia.org/wiki/Transitive_dependency). This made me think about something else: when you use nested submodules, you end up with a hierarchy of directories. For instance, if the bootcamp depends on A, which depends on B, which depends on C, you'll get a ./A/B/C directory. It does not seem to pose any problem in your example (git://tremily.us/swc-workshop.git master), but I'm wondering if that is because the submodules don't actually capture dependencies in your case. In my view, "learning git (from the CLI)" depends on "learning the shell". One thing that I like about package managers, including Bower (even though I actually don't like the central repository at all), is that they flatten the transitive dependencies, putting all checked-out dependencies side by side.

On the namespaced branch, I really think it involves too many operations (git mv'ing individual files, ...) and is too error-prone.

On the submodules, I'm not sure there is much to improve from a generic point of view; that's why the thin wrapper would be my preferred solution for now.

I know about .gitmodules and .git/config (aren't they a bit un-DRY, by the way?), but I know a few people (myself included) who have struggled when trying to start contributing to one of the submodules (replacing the existing one with our own fork). I guess "submodule deinit" might be a big help here if it removes the git caches, etc., but I see a lot of machines still running Git 1.7 (Debian...), which predates "submodule deinit". This might be a new user story to add to the first one I wrote.

I wouldn't call myself a git expert, but I know a lot of people who are way less experienced than I am (and in a way, I can understand that they don't want to spend hours or days filling this gap), and I would not be confident asking them to use submodules from day 1.

On semantic versioning, I think we can make it very clear if we adopt the concept of "concept" (see https://github.com/twitwi/lesson-manager/blob/master/03-split-lessons-with-dependencies.md), where only a major version change can remove a provided concept.
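The "concept" rule could be checked mechanically at release time. In this bash sketch, `check_release` (a hypothetical helper) takes the old and new versions plus the space-separated concepts each version provides; the concept names are made up, and the linked document's actual format may differ. A release that drops a concept without bumping the major version is rejected.

```bash
#!/usr/bin/env bash
# Reject a release that removes a provided concept without a major bump.
# Usage: check_release OLD_VERSION NEW_VERSION "OLD CONCEPTS" "NEW CONCEPTS"
check_release() {
  local old_ver="${1#v}" new_ver="${2#v}" old_concepts="$3" new_concepts="$4"
  local old_major="${old_ver%%.*}" new_major="${new_ver%%.*}" c
  local -a removed=()
  for c in $old_concepts; do
    # Pad with spaces so "loop" does not match inside "for-loop".
    [[ " $new_concepts " == *" $c "* ]] || removed+=("$c")
  done
  if (( ${#removed[@]} > 0 && new_major == old_major )); then
    echo "invalid: removes ${removed[*]} without a major version bump"
    return 1
  fi
  echo ok
}

check_release v1.2.0 v1.3.0 "pipes redirection" "pipes redirection globbing"
check_release v1.3.0 v1.4.0 "pipes redirection globbing" "pipes globbing" ||
  echo "release rejected"
```

Adding concepts in a minor release passes; removing `redirection` in v1.4.0 is flagged, while the same removal in a v2.0.0 release would be accepted.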

thanks again

rgaiacs commented 10 years ago

This made me think about something else: when you use nested submodules, you end up with a hierarchy of directories. For instance, if the bootcamp depends on A, which depends on B, which depends on C, you'll get a ./A/B/C directory. It does not seem to pose any problem in your example (git://tremily.us/swc-workshop.git master), but I'm wondering if that is because the submodules don't actually capture dependencies in your case.

And if your bootcamp depends on A (that depends on B that depends on C) and D (that depends on C)? You'll get

.
+- A
|  +- B
|     +- C
+- D
   +- C
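A small bash (4+) sketch of the difference between the nested layout above and the flattened, side-by-side layout that package managers like Bower produce. The `DEPS` table mirrors the example tree; `nested_paths` and `flat_paths` are hypothetical helpers. The nested walk checks each dependency out inside its parent, so C shows up twice; the flattened walk emits each lesson once.

```bash
#!/usr/bin/env bash
# Dependency table mirroring the tree: bootcamp -> A, D; A -> B; B -> C; D -> C.
declare -A DEPS=(
  [bootcamp]="A D"
  [A]="B"
  [B]="C"
  [C]=""
  [D]="C"
)

# Nested (submodule-style): every dependency lives inside its parent's directory.
nested_paths() {
  local prefix="$1" lesson="$2" dep
  for dep in ${DEPS[$lesson]:-}; do
    echo "$prefix$dep"
    nested_paths "$prefix$dep/" "$dep"
  done
}

# Flattened: breadth-first walk of the same graph, each lesson emitted once.
flat_paths() {
  local -A seen
  local -a queue
  local dep
  queue=(${DEPS[$1]:-})
  while (( ${#queue[@]} )); do
    dep="${queue[0]}"
    queue=("${queue[@]:1}")
    [[ -n "${seen[$dep]:-}" ]] && continue
    seen[$dep]=1
    echo "./$dep"
    queue+=(${DEPS[$dep]:-})
  done
}

echo "nested:"
nested_paths ./ bootcamp     # ./A ./A/B ./A/B/C ./D ./D/C
echo "flattened:"
flat_paths bootcamp          # ./A ./D ./B ./C
```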

That's one of the reasons that I want to use something like git repo [1].

[1] https://code.google.com/p/git-repo/

wking commented 10 years ago

On Sat, Jul 19, 2014 at 07:35:33AM -0700, Rémi Emonet wrote:

One thing that I like about package managers, including Bower (even though I actually don't like the central repository at all)…

Actually, it looks like Bower can install directly from a Git repo or URL 1. Sorry for spreading misinformation earlier :p. Perhaps Bower already does everything we need then?

twitwi commented 10 years ago

No worries (at least on my side); I was not misled and knew Bower could do it. Still, I guess that "bower search" uses a central repository where package names are "decided" (attributed on a first-come, first-served basis). I'd rather have a local repository (like Debian's apt) that could be fed from multiple package lists and/or the GitHub search API (for GitHub-hosted packages).

wking commented 10 years ago

On Sat, Jul 19, 2014 at 11:37:09AM -0700, Rémi Emonet wrote:

Still, I guess that "bower search" uses a central repository where package names are "decided" (attributed on a first-come, first-served basis). I'd rather have a local repository (like Debian's apt) that could be fed from multiple package lists and/or the GitHub search API (for GitHub-hosted packages).

I was more concerned about a central registry for installation (which I now realize Bower does not require). It turns out that Bower also supports an ordered list of read-only registries for search 1. I can't think of anything that Bower can't do that we need ;).

SoftwareCarpentryLessonManager/lesson-manager, issue #2 (Existing package managers)