dynamist / subgit

Sub-git repo handler
Apache License 2.0
0 stars 0 forks source link

Allow for defining clone point #33

Open Grokzen opened 2 years ago

Grokzen commented 2 years ago

This is a rewrite of #2 with some updates

Right now we only have the ability to clone into the same folder. But we want to be able to clone into a sub-folder if possible.

Extend the configuration for a repo to allow for the option clone-path: <folder-path> for any repo we define

If this is set then we should attempt to clone the git repo into this folder. But there is two different cases here.

The first is that we might want to clone one git repo into the already cloned repo of another repo. So lets say we have the repo pykwalify/ that we already have cloned to disk. We want to clone another repo redis-py-cluster into the folder path pykwalify/redis-cluster/ instead of the root folder we stand in. The problem here is that we must have some kind of dependency resolution where we must say what order repos should be processed in to allow for this as we can't create a random folder, clone redis-py-cluster into it first, then clone and overwrite pykwalify/ on the same folder. Add ontop of this that our configuration file is a nested dict that do not guarantee or preserv insert order always so we can't guarantee that we will always process pykwalify before redis-py-cluster for instance.

If we go down dependency order lane, keep it as simple as possible, depnedencies must be defined where one repo "depends on" another repo, the tool will just re-order the execution order to fullfill the dependencies, then execute the commands in that order.

The second alternative we have is that we just ignore any dependencies and we hope that subgit just executes things in the correct order. This might cause issues and problems if we only let this to chanse. The upside is that this is way simpler to implement and we might just impelment the feature and add in dependency handling later on to move the tool forward.

Non-allowed characters or paths should be /../ anywhere in the path, working outside initialized folder shouldn't be allowed.

We also might need to sort out how we deal with moving an already exported folder if we move things around. This might mean that we need to implement state tracking of already done commands. This will need to be investigated further.

holmboe commented 1 year ago

As it currently stands #63 expects a format like this:

repos:
  - name: pyk
    url: git@github.com:Grokzen/pykwalify.git
    sparse:
      paths:
        - "docs"
        - "tests"
    revision:
      branch: master
  - name: pykwalify
    url: git@github.com:Grokzen/pykwalify.git
    sparse:
      paths:
        - "docs"
        - "tests"
    clone_point: "pyk/docs"
    revision:
      tag:
        filter: "[0-9].[0-9].[0-9]"
        order: semver  # This is also the default value if not specified
        select: last  # Defaults to semver style

However my personal preference would be to use the name to specify the clone point path instead of introducing a new clone_point key.

Like this:

repos:
  - name: pyk
    url: git@github.com:Grokzen/pykwalify.git
    sparse:
      paths:
        - "docs"
        - "tests"
    revision:
      branch: master
  - name: "pyk/docs"
    url: git@github.com:Grokzen/pykwalify.git
    sparse:
      paths:
        - "docs"
        - "tests"
    revision:
      tag:
        filter: "[0-9].[0-9].[0-9]"
        order: semver  # This is also the default value if not specified
        select: last  # Defaults to semver style
Grokzen commented 1 year ago

Good suggestion and clone_point was renamed into path instead

But the merged name+clone_point and adding alias idea will not be implemented due to that name is used as the input argument from the cli for all actions done by this tool. If we force the user to write something like subgit pull pyk/docs/ instead of just subgit pull pyk that really makes things complex. Second argument against this is that you mash up the easy to understand names of name & path with a more difficult to understand pair of name + alias where name don't mean name and alias is just a worse name for name.

These changes was added into PR #63

rholmboe commented 1 year ago

To keep on topic, I think that this implementation of path and subgit pull <subcommand> shouldn't be either this or that discussion, but both.

In best of worlds, subgit pull [name|path] should be implemented as a valid commands, both for simplify logic, bash completions or user friendliness.


But for the sake of argument, I'd like to drop the requirement of name key to have a minimalistic approach to only make key url into the mandatory field.

name in that sense should be derived from repository name (ie, .../Grokzen/pykwalify.git -> pykwalify), same as path would.

Example configuration:

repos:
  - url: https://github.com/username/repository.git
  - url: https://git.local/username/repo.git

This would result in cloning repository.git into ./repository and repo.git into ./repo

And with that change, I would love to see an alias key to be introduced as a replacement/deprecation for name, which should populate path variable if used, but path should take precedence of alias when both defined.

For instance:

$ cat .subgit.yml
repos:
  - url: git@github.com:Grokzen/pykwalify.git
    alias: pyk
  - url: git@github.com:Grokzen/pykwalify.git
    path: pyk/docs
  - url: git@github.com/Grokzen/pykwalify.git
    alias: pyk2
    path: pyk/tests

$ subgit pull pyk       # 'pyk' here is the alias and result in cloning to `./pyk`
$ subgit pull pyk/docs  # 'pyk/docs' here is the path and result in cloning to `./pyk/docs`
$ subgit pull pyk2      # 'pyk2' here is the alias, but path take precedence and result in cloning to `./pyk/tests`

In all cases, both alias and path entry is valid commands which results in action.

Grokzen commented 1 year ago

@rholmboe So i think i am a bit onboard on a few of the ideas, however i see some edge cases and corner cases that might be tricky.

So expanding the selector base to cover more cases i can get onboard with. However the UX for this will be both explicit & implicit where some things will be obvious like name/alias and some will be more hidden like the shortform repo name and the username+reponame combo that will just "happen". This is why i am still more in favor of only having name/alias explciitly and no other super magic targeting because it will just confuse and give way to many options. I rather have that you MUST set one name for each repo (i could agree to make it into a list with multiple entries) but to keep it only as explicit.

One cornercase tho is a config like this

$ cat .subgit.yml
repos:
  - url: git@gitlab.com:Grokzen/pykwalify.git
  - url: git@github.com:Grokzen/pykwalify.git

That would break down in your example with more implicit repo names because you would have two exact entries but one goes to another URL but using the same name scheme. This would mean that we would have to implement a bunch of logic to detect and prevent these things from happening, but also to really try to understand the URL part in all cases and we all know how fun regex is to match against URL:s (thinking of e-mail validation :)

This for me just reinforces the way simpler idea of "you set your own name for each repo" then you must call it gitlab-pykwalify and github-pykwalify but we as tools developers only have to check one set of names for duplicates super simple and not care about a hierarchy and order of possible names that should and should not be able to collide and override each other, even if that logic is way cooler :)

The change from name to alias is 50/50 for me. It is called a git repo name not a git repo alias so i will say that name should remain. But i do see that it might make more sense with alias given that it is a free text field

The feature that i will probably never believe would ever be used is to use the folder path as a repo selector. I would need to see a practical use-case from anyone before i would implement that really describing to me why they need it over any other solution simply using cd <folder> before you run your command kinda.

This would result in cloning repository.git into ./repository and repo.git into ./repo

This part is already a standard within git tool itself. If i simply run git clone git@github.com/Grokzen/pykwalify.git it will automatically make a folder named pykwalify itself and clone it in there, we don't have todo or sort that out, it just happens automatically already.

holmboe commented 1 year ago

The case where:

repos:
  - url: git@gitlab.com:Grokzen/pykwalify.git
  - url: git@github.com:Grokzen/pykwalify.git

...would result in a collision is easily handled. In the config file parsing logic, first derive path and alias from the url as @rholmboe proposed. Then loop over all the parsed entries and check if there are duplicate path or alias between them, if there are any let the user know that they need to change either in their config file. In this way it gets really explicit and no magic is needed.

If, and only if, we want to get fancy we could implement a suggestion based on smartness. The suggestion could identify that either the domain or the username varies between the two and use that to propose an alternative. I.e. alias: github-pykwalify vs alias: gitlab-pykwalify. This suggestion should not be smart behind the scenes, but only be used to suggest a path forward for the user which would of course add the change to their .subgit.yml themselves.

holmboe commented 1 year ago

I am in favor of using both path and alias as optional keywords and if not explicitly specified it would be derived from the url.

This will allow the simplest use cases to have a really short config file. Which is a big bonus.

Only setting alias (in addition to url obviously) would get you the ability to set a name that makes sense to you.

Only setting path (in addition to url obviously) would allow you to specify a clone point.

Setting them both would combine the two benefits. Some people would probably want to use aliases because it fits their mental model (I'm looking at you @Grokzen) and referring to a path would math other peoples preference (myself included).

Grokzen commented 1 year ago

But that is one of my points, i don't want to implement super complex hiearchy double, tripple, quad checking everything against all other possibilities in different orders and corner cases. I know it is possible, but it is not worth the time investment over just following KISS, have one selector and do one unique check and nothing else, that is good enough.

These kinds of "smartness" with url parsing and understanding of intent, is way to complicated for almost no value with what we have to invest in time to achive something that would not be usefull for almost no one.

Another fun cornercase just to throw it into the mix even further that we then have to "suppor" with understanding is that git supports cloning from other local folders so in the url field it will understand for instance file:///path/to/repo.git/ that we then would have to also magically "support" with understanding, parse out and magically sort out to make it compatible.

So the simplest and most easy thing to implement is to have one name/alias field, one value there and only that one. No other fancy magic as it will make things to complicated for a feature i doubt many or anyone will really use in the end

holmboe commented 1 year ago

Thinking of the bash completion nicety. I propose this:

Implement two new subcommands to list them. Something like this:

$ subgit path
pyk > git@github.com:Grokzen/pykwalify.git
pyk/docs > git@github.com:Grokzen/pykwalify.git /docs
pyk/tests > git@github.com:Grokzen/pykwalify.git /tests
$ subgit alias
pyk > git@github.com:Grokzen/pykwalify.git
pyk/docs > git@github.com:Grokzen/pykwalify.git /docs
pyk2 > git@github.com:Grokzen/pykwalify.git /tests
holmboe commented 1 year ago

We will of course focus on SSH and possibly HTTP URL:s. If we detect a file:/// url (using urllib2 to parse the url) we should let the user know. I don't see the problem with this.

Yes it needs code but it isn't unfeasible with rather OK amount of dev time.