Open Grokzen opened 2 years ago
As it currently stands #63 expects a format like this:
repos:
- name: pyk
url: git@github.com:Grokzen/pykwalify.git
sparse:
paths:
- "docs"
- "tests"
revision:
branch: master
- name: pykwalify
url: git@github.com:Grokzen/pykwalify.git
sparse:
paths:
- "docs"
- "tests"
clone_point: "pyk/docs"
revision:
tag:
filter: "[0-9].[0-9].[0-9]"
order: semver # This is also the default value if not specified
select: last # Defaults to semver style
However my personal preference would be to use the name
to specify the clone point path instead of introducing a new clone_point
key.
Like this:
repos:
- name: pyk
url: git@github.com:Grokzen/pykwalify.git
sparse:
paths:
- "docs"
- "tests"
revision:
branch: master
- name: "pyk/docs"
url: git@github.com:Grokzen/pykwalify.git
sparse:
paths:
- "docs"
- "tests"
revision:
tag:
filter: "[0-9].[0-9].[0-9]"
order: semver # This is also the default value if not specified
select: last # Defaults to semver style
Good suggestion and clone_point
was renamed into path
instead
But the merged name+clone_point
and adding alias
idea will not be implemented due to that name is used as the input argument from the cli for all actions done by this tool. If we force the user to write something like subgit pull pyk/docs/
instead of just subgit pull pyk
that really makes things complex. Second argument against this is that you mash up the easy to understand names of name & path
with a more difficult to understand pair of name + alias
where name don't mean name and alias is just a worse name for name.
These changes was added into PR #63
To keep on topic, I think that this implementation of path
and subgit pull <subcommand>
shouldn't be either this or that discussion, but both.
In best of worlds, subgit pull [name|path]
should be implemented as a valid commands, both for simplify logic, bash completions or user friendliness.
But for the sake of argument, I'd like to drop the requirement of name
key to have a minimalistic approach to only make key url
into the mandatory field.
name
in that sense should be derived from repository name (ie, .../Grokzen/pykwalify.git
-> pykwalify
), same as path
would.
Example configuration:
repos:
- url: https://github.com/username/repository.git
- url: https://git.local/username/repo.git
This would result in cloning repository.git
into ./repository
and repo.git
into ./repo
And with that change, I would love to see an alias
key to be introduced as a replacement/deprecation for name
, which should populate path
variable if used, but path
should take precedence of alias
when both defined.
For instance:
$ cat .subgit.yml
repos:
- url: git@github.com:Grokzen/pykwalify.git
alias: pyk
- url: git@github.com:Grokzen/pykwalify.git
path: pyk/docs
- url: git@github.com/Grokzen/pykwalify.git
alias: pyk2
path: pyk/tests
$ subgit pull pyk # 'pyk' here is the alias and result in cloning to `./pyk`
$ subgit pull pyk/docs # 'pyk/docs' here is the path and result in cloning to `./pyk/docs`
$ subgit pull pyk2 # 'pyk2' here is the alias, but path take precedence and result in cloning to `./pyk/tests`
In all cases, both alias
and path
entry is valid commands which results in action.
@rholmboe So i think i am a bit onboard on a few of the ideas, however i see some edge cases and corner cases that might be tricky.
So expanding the selector base to cover more cases i can get onboard with. However the UX for this will be both explicit & implicit where some things will be obvious like name/alias and some will be more hidden like the shortform repo name and the username+reponame combo that will just "happen". This is why i am still more in favor of only having name/alias explciitly and no other super magic targeting because it will just confuse and give way to many options. I rather have that you MUST set one name for each repo (i could agree to make it into a list with multiple entries) but to keep it only as explicit.
One cornercase tho is a config like this
$ cat .subgit.yml
repos:
- url: git@gitlab.com:Grokzen/pykwalify.git
- url: git@github.com:Grokzen/pykwalify.git
That would break down in your example with more implicit repo names because you would have two exact entries but one goes to another URL but using the same name scheme. This would mean that we would have to implement a bunch of logic to detect and prevent these things from happening, but also to really try to understand the URL part in all cases and we all know how fun regex is to match against URL:s (thinking of e-mail validation :)
This for me just reinforces the way simpler idea of "you set your own name for each repo" then you must call it gitlab-pykwalify
and github-pykwalify
but we as tools developers only have to check one set of names for duplicates super simple and not care about a hierarchy and order of possible names that should and should not be able to collide and override each other, even if that logic is way cooler :)
The change from name
to alias
is 50/50 for me. It is called a git repo name
not a git repo alias
so i will say that name
should remain. But i do see that it might make more sense with alias
given that it is a free text field
The feature that i will probably never believe would ever be used is to use the folder path as a repo selector. I would need to see a practical use-case from anyone before i would implement that really describing to me why they need it over any other solution simply using cd <folder>
before you run your command kinda.
This would result in cloning repository.git into ./repository and repo.git into ./repo
This part is already a standard within git tool itself. If i simply run git clone git@github.com/Grokzen/pykwalify.git
it will automatically make a folder named pykwalify
itself and clone it in there, we don't have todo or sort that out, it just happens automatically already.
The case where:
repos:
- url: git@gitlab.com:Grokzen/pykwalify.git
- url: git@github.com:Grokzen/pykwalify.git
...would result in a collision is easily handled. In the config file parsing logic, first derive path
and alias
from the url
as @rholmboe proposed. Then loop over all the parsed entries and check if there are duplicate path
or alias
between them, if there are any let the user know that they need to change either in their config file. In this way it gets really explicit and no magic is needed.
If, and only if, we want to get fancy we could implement a suggestion based on smartness. The suggestion could identify that either the domain or the username varies between the two and use that to propose an alternative. I.e. alias: github-pykwalify
vs alias: gitlab-pykwalify
. This suggestion should not be smart behind the scenes, but only be used to suggest a path forward for the user which would of course add the change to their .subgit.yml
themselves.
I am in favor of using both path
and alias
as optional keywords and if not explicitly specified it would be derived from the url
.
This will allow the simplest use cases to have a really short config file. Which is a big bonus.
Only setting alias
(in addition to url
obviously) would get you the ability to set a name that makes sense to you.
Only setting path
(in addition to url
obviously) would allow you to specify a clone point.
Setting them both would combine the two benefits. Some people would probably want to use aliases because it fits their mental model (I'm looking at you @Grokzen) and referring to a path would math other peoples preference (myself included).
But that is one of my points, i don't want to implement super complex hiearchy double, tripple, quad checking everything against all other possibilities in different orders and corner cases. I know it is possible, but it is not worth the time investment over just following KISS, have one selector and do one unique check and nothing else, that is good enough.
These kinds of "smartness" with url parsing and understanding of intent, is way to complicated for almost no value with what we have to invest in time to achive something that would not be usefull for almost no one.
Another fun cornercase just to throw it into the mix even further that we then have to "suppor" with understanding is that git supports cloning from other local folders so in the url
field it will understand for instance file:///path/to/repo.git/
that we then would have to also magically "support" with understanding, parse out and magically sort out to make it compatible.
So the simplest and most easy thing to implement is to have one name/alias field, one value there and only that one. No other fancy magic as it will make things to complicated for a feature i doubt many or anyone will really use in the end
Thinking of the bash completion nicety. I propose this:
Implement two new subcommands to list them. Something like this:
$ subgit path
pyk > git@github.com:Grokzen/pykwalify.git
pyk/docs > git@github.com:Grokzen/pykwalify.git /docs
pyk/tests > git@github.com:Grokzen/pykwalify.git /tests
$ subgit alias
pyk > git@github.com:Grokzen/pykwalify.git
pyk/docs > git@github.com:Grokzen/pykwalify.git /docs
pyk2 > git@github.com:Grokzen/pykwalify.git /tests
We will of course focus on SSH and possibly HTTP URL:s. If we detect a file:///
url
(using urllib2 to parse the url) we should let the user know. I don't see the problem with this.
Yes it needs code but it isn't unfeasible with rather OK amount of dev time.
This is a rewrite of #2 with some updates
Right now we only have the ability to clone into the same folder. But we want to be able to clone into a sub-folder if possible.
Extend the configuration for a repo to allow for the option
clone-path: <folder-path>
for any repo we defineIf this is set then we should attempt to clone the git repo into this folder. But there is two different cases here.
The first is that we might want to clone one git repo into the already cloned repo of another repo. So lets say we have the repo
pykwalify/
that we already have cloned to disk. We want to clone another reporedis-py-cluster
into the folder pathpykwalify/redis-cluster/
instead of the root folder we stand in. The problem here is that we must have some kind of dependency resolution where we must say what order repos should be processed in to allow for this as we can't create a random folder, cloneredis-py-cluster
into it first, then clone and overwritepykwalify/
on the same folder. Add ontop of this that our configuration file is a nested dict that do not guarantee or preserv insert order always so we can't guarantee that we will always process pykwalify before redis-py-cluster for instance.If we go down dependency order lane, keep it as simple as possible, depnedencies must be defined where one repo "depends on" another repo, the tool will just re-order the execution order to fullfill the dependencies, then execute the commands in that order.
The second alternative we have is that we just ignore any dependencies and we hope that subgit just executes things in the correct order. This might cause issues and problems if we only let this to chanse. The upside is that this is way simpler to implement and we might just impelment the feature and add in dependency handling later on to move the tool forward.
Non-allowed characters or paths should be /../ anywhere in the path, working outside initialized folder shouldn't be allowed.
We also might need to sort out how we deal with moving an already exported folder if we move things around. This might mean that we need to implement state tracking of already done commands. This will need to be investigated further.