Feature Request: Support for multiple remotes

dsager commented 9 years ago

It would be great if GWS supported multiple remotes instead of the two hard-coded defaults origin and upstream.

I often happen to work with additional remotes (e.g. heroku or octopress) and also have a different naming scheme in some repositories (e.g. mine and origin instead of origin and upstream). These are ignored by gws init and I don't see any way to configure it manually.

If this is considered a sensible feature request, I'd be glad to help with the implementation! Any thoughts on this? For example I'm not sure how this could be implemented in a proper way without messing up the current file format...

StreakyCobra commented 9 years ago

I was afraid this request would arrive. I started the script without any thought of sharing it, so at the beginning it was just the folder | url syntax. Then I extended it because I needed upstream URLs for some projects: it became folder | url | upstream. I knew this was a bad solution but, who cares, it was just a script for myself. So when I started sharing it, I knew this request would arrive one day.

As far as I can tell, there are four problems:

The file format and its parsing
Working with remotes internally
Displaying all the information
Speed concerns

My thoughts

The file format and its parsing: The most complicated part. I chose a really simple file format to simplify the parsing, because bash is not a pleasant language to work with. A one-line-per-repository configuration is not a good fit if we want to allow more than two remotes that can be named. So a new file format would be needed, which is not really a problem in itself, but the parser that comes with it is.

Working with remotes internally: This should not be too difficult. There is currently one function that deals with this, git_check_branch_origin, and it can easily be adapted to take the remote as an argument.

Displaying all the information: Will the information be displayed, or is it just to set up the remotes correctly? If it is just to set up the remotes, we can imagine a format like folder | url | name | name_url | name2 | name2_url [...], which permits a really simple configuration file for standard users and allows advanced users to set more remotes. If the goal is to display information, like which branches are synced and which are not, we need to rethink the display. I already have to scroll to see all the information for my repositories, so too much information could become tedious.

Also, I must say that the display part is not the best-written part of this script.

Speed concerns

I have nearly 40 repositories in my config, and it takes nearly 600 milliseconds on a decent computer to show the status. On my home computer it takes even longer. Complicating the config format means more parsing, which will increase the overall time needed for all operations.

Conclusion

This enhancement could be a nice feature, but it implies big changes that also come with some drawbacks. My humble opinion is that bash is too cumbersome for complex parsing and too slow for complicated analysis. For me, gws will stay a really interesting portable solution which "just works ©", but will never get really "big" because of bash, sadly.

It is also why I'm currently starting to write an evolution of this software in a more maintainable programming language: I want to have more powerful config files, more speed, more options, etc.

For all those reasons I'm not going to implement this myself in gws, but if someone comes up with an acceptable solution, I'd be happy to include it.

dsager commented 9 years ago

Thanks for your answer and reasoning, @StreakyCobra. I didn't get (or missed) the notification about the comment, so sorry for the late reply :) I agree that bash makes things a bit complicated and can understand that you want to keep the whole thing simple. As it is, you can run it out of the box without installing any dependencies...

Soonish I will get a new computer and will have to migrate; that might be a good time to look into this :)

What would you think of the following format:

FOLDER | URL_1 [NAME_1] [ | URL_2 [NAME_2] ] [ | URL_n [NAME_n] ]

The separator between URL and NAME is a whitespace (or any other char). NAME_1 and NAME_2 would default to origin and upstream respectively. This way the old format would still work, but the user can change the names if he wants. And at the end of the line he can add as many additional URLs as he wants. That the run time increases as you keep adding remotes should be obvious :)
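To make the proposal concrete, here is a minimal bash sketch of how one such line could be split, with the origin and upstream defaults applied to the first two remotes. It is not the code from gws or from the gist: trim and parse_project_line are hypothetical helpers, and the choice to reject an unnamed third remote is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: parse one line of the proposed format
#   FOLDER | URL_1 [NAME_1] [ | URL_2 [NAME_2] ] [ | URL_n [NAME_n] ]
# trim and parse_project_line are hypothetical helpers, not part of gws.

# Strip leading and trailing whitespace from a string.
trim() {
    local s=$1
    s="${s#"${s%%[![:space:]]*}"}"
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s' "$s"
}

# Print the folder on the first line, then one "name url" pair per remote.
parse_project_line() {
    local line=$1 folder rest field url name index=0
    local defaults=(origin upstream)

    # A line without any "|" has no remote at all.
    if [[ $line != *'|'* ]]; then
        echo "no remote in: $line" >&2
        return 1
    fi

    folder=$(trim "${line%%|*}")
    printf '%s\n' "$folder"

    rest=${line#*|}
    while :; do
        field=$(trim "${rest%%|*}")
        url=${field%%[[:space:]]*}          # first word of the field
        name=$(trim "${field#"$url"}")      # whatever follows the URL
        # Fall back to origin/upstream for the first two remotes only.
        if [[ -z $name ]]; then
            name=${defaults[index]:-}
        fi
        if [[ -z $url || -z $name ]]; then
            echo "invalid remote in: $line" >&2
            return 1
        fi
        printf '%s %s\n' "$name" "$url"
        index=$((index + 1))
        [[ $rest == *'|'* ]] || break       # no further remote on the line
        rest=${rest#*|}
    done
}
```

Called as parse_project_line "tools/gws | URL_A | URL_B | URL_C heroku", it prints the folder followed by the pairs "origin URL_A", "upstream URL_B" and "heroku URL_C", so a line in the old two-remote format gives the same names as before.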

StreakyCobra commented 9 years ago

No problem, I wasn't waiting on the answer anyway! And thank you for your interest and time!

The syntax you propose looks nice and keeps backward compatibility in an elegant way :-) . So about my 4 concerns:

File format and parsing: This syntax is simple and should be easy to parse
Working with remotes internally: Already implemented
Speed concerns: With this format there should not be much difference in parsing time

The last point to think about is displaying information. Currently there are two levels: one for the repositories and a second one for the branches. Do you already have an idea for the multiple remotes? Doing as we do now with upstream, and ignoring all remotes except origin? Or adding a third level showing the status of each branch of each remote of each repository?

For me it makes more sense to only display and check origin, because otherwise we will have to deal with other parts, like the return code of status that says if everything is synchronized.

dsager commented 9 years ago

As I'm mostly interested in backup and restore I'd be totally fine with the status command only checking origin. Two possible enhancements:

Allow running status for a single repository only; this would check all remotes and branches
Add a flag for full status info (all remotes and branches), e.g. gws status --all-remotes

But like I said, I'm fine with status only looking at origin. For more detailed info I would use git directly. The way I see it, gws should help you to do simple bulk operations on multiple repos (create .gws file, init repos, simple status) and not create detailed reports :)

StreakyCobra commented 9 years ago

The development version 0.1.8 already allows running a subcommand on a subset of repositories. I don't like the idea of behaving differently depending on whether the command is run for one repository or several. But sure, adding a flag for showing the full status is a good solution.

I'm also fine with just checking against origin, because that is how I use it. So let's wait for other people explicitly asking for it before starting to implement an unused feature ;-)

dsager commented 9 years ago

I just played around a little with the line parsing and came up with what you can see in the following gist:

https://gist.github.com/dsager/00cad170e0e752a3ca27

Obviously it's missing the default values origin and upstream and is not 100% compatible with your current code (using the cut command), but it might serve as a starting point...

What do you think?

StreakyCobra commented 9 years ago

I'm not especially attached to using cut. I used it because it was the obvious tool for this purpose :-)

Why not use something like:

# We get the directory (the separator is quoted so it is always passed to cut as one argument)
DIR=$(cut -d"${FIELD_SEP}" -f1 <<< "$ROW" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
# We get the rest of the configuration line containing remotes
REMOTES=$(cut -d"${FIELD_SEP}" -f1 --complement <<< "$ROW")

to get the directory and row instead of lines [6-10]?

Same remark for lines [15-18]:

# To be defined at the top of `gws`
URL_NAME_SEP=' '
# We get the first defined remote of the line
REMOTE=$(cut -d"${FIELD_SEP}" -f1 <<< "$REMOTES" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
# We remove the current remote from the line for next iteration
REMOTES=$(cut -d"${FIELD_SEP}" -f1 --complement <<< "$REMOTES")
# We get its url (the separator must be quoted: an unquoted space is lost to word splitting and cut fails)
REMOTE_URL=$(cut -d"${URL_NAME_SEP}" -f1 <<< "$REMOTE" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
# We get its name if any (-s prints nothing when the remote contains no separator)
REMOTE_NAME=$(cut -d"${URL_NAME_SEP}" -f2 -s <<< "$REMOTE")
# We can check if $REMOTE_NAME is empty, and if it is the case associate "origin" or "upstream", or throw an error
# ...

I like using good variable names: REMOTES is probably better than ROW, as I wrote it. But that's cosmetic.

dsager commented 9 years ago

hey, sorry for the late reply again :)

Yes, your approach works fine as well, I guess. I just liked the idea of using as few external tools as possible (${string%%substring} syntax vs cut). But on the other hand, cut should be available on any *IX system, I guess :)

StreakyCobra commented 9 years ago

Hi,

> hey, sorry for the late reply again :)

No problem!

> But on the other hand, cut should be available on any *IX system, I guess :)

Yes, cut is part of coreutils, and they are supposed to be installed everywhere: «These are the core utilities which are expected to exist on every operating system.»

> I just liked the idea of using as few external tools as possible (${string%%substring} syntax vs cut)

You are right about having the smallest possible set of dependencies, but there is also a readability counterpart, which is my main concern here. If someone needs to understand or maintain this part in 6 months, it would be far easier to understand what the cut command is doing (that is its main purpose) than some not-so-common bash syntax. There is a trade-off between readability and dependencies, and here I would prefer readability :-)
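For readers who want to see the two styles side by side, here is a small sketch that extracts the first "|"-separated field of a line both ways. The function names first_field_cut and first_field_expansion are made up for this comparison, and the "|" separator is hard-coded as an assumption.

```shell
#!/usr/bin/env bash
# Sketch: two ways to take the first "|"-separated field of a line.
# first_field_cut and first_field_expansion are hypothetical names.

first_field_cut() {
    # External tools: cut selects field 1, sed trims surrounding whitespace.
    cut -d'|' -f1 <<< "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

first_field_expansion() {
    # Pure bash: drop everything from the first "|" onwards, then trim.
    local field=${1%%|*}
    field="${field#"${field%%[![:space:]]*}"}"   # remove leading whitespace
    field="${field%"${field##*[![:space:]]}"}"   # remove trailing whitespace
    printf '%s\n' "$field"
}
```

Both print the same field for the same input. The expansion version starts no extra process, while the cut version states its intent in the command name, which is the trade-off discussed above.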

StreakyCobra commented 9 years ago

The develop branch now contains a proposed solution for this issue. The file format is:

FOLDER | URL_1 [NAME_1] [ | URL_2 [NAME_2] [ | URL_n NAME_n ... ] ]

If NAME_1 is not specified, it is assumed to be origin
If NAME_2 is not specified, it is assumed to be upstream
At least one URL must be associated with the name origin

Here are some points regarding my previous concerns:

This file format keeps backward compatibility
Parsing speed is no longer a problem because I've implemented a cache system (parsing is done only if .project.gws or .ignore.gws has been modified)
The only information displayed is the status relative to origin
The update command now creates missing remotes (but doesn't modify existing ones)
The init command has been modified accordingly to create .project.gws with extra remotes
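The caching idea can be illustrated with a modification-time check. This is only a sketch of the general technique, not the code on the develop branch: cache_is_fresh is a hypothetical helper, and the cache file name used in the usage note is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of an mtime-based cache check.
# cache_is_fresh is a hypothetical helper, not taken from gws.

# Succeed if the cache file exists and no existing source file is newer than it.
# Usage: cache_is_fresh CACHE_FILE SOURCE_FILE...
cache_is_fresh() {
    local cache=$1 src
    shift
    [[ -f $cache ]] || return 1
    for src in "$@"; do
        # A missing source file (e.g. no .ignore.gws) cannot invalidate the cache.
        if [[ -e $src && $src -nt $cache ]]; then
            return 1
        fi
    done
    return 0
}
```

A caller would reparse only when cache_is_fresh .cache.gws .project.gws .ignore.gws fails (here .cache.gws is an invented name), and rewrite the cache afterwards so the next run can skip the parsing step.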

I tried a few cases and it seems to work. Can you try it too? If it works for you, I'm planning to release version 0.8.0, as there are already a few new features.

dsager commented 9 years ago

That's awesome, I'll give it a try later on and let you know!

dsager commented 9 years ago

It seems to work just fine, at least the gws update. Thanks a lot! I'm closing this issue!

StreakyCobra / gws

Feature Request: Support for multiple remotes #12
