gitgitgadget / git

GitGitGadget's Git fork. Open Pull Requests here to submit them to the Git mailing list
https://gitgitgadget.github.io/
Other
217 stars 134 forks source link

Finish the conversion of `git submodule` to a built-in #541

Open dscho opened 4 years ago

dscho commented 4 years ago

The git-submodule.sh script weighs in with just over 1,000 lines. That is a rather tall order, so most likely it will take the equivalent of multiple Google Summer of Code projects to finish.

Taking a bit of a deeper look into the git-submodule.sh script, we see a peculiar pattern in some of the subcommands, e.g. in cmd_foreach: https://github.com/git/git/blob/v2.21.0/git-submodule.sh#L320-L349

Essentially, it spends two handfuls of lines on option parsing, and then the real business logic is performed by the submodule--helper, which is already a built-in.

Most of that business logic is implemented in submodule.c.

The important part is to identify those commands in git-submodule.sh that do not yet follow this pattern "parse options then hand off to submodule--helper". See more on that below.

The commit history of the commands that do use the submodule--helper will help see how they were converted, what conventions were used, whether there were recurring patterns, etc. This style should be imitated for the rest of the conversion.

An important thing to realize is that it is not so much a conversion as a re-implementation, and the best way to go about it is incremental.

Take for example the git submodule init subcommand. It has been re-implemented in C a long time ago, and the shell script version merely parses the command-line arguments and then passes them on to the built-in git submodule--helper command:

https://github.com/git/git/blob/v2.24.1/git-submodule.sh#L361-L390

It probably would make sense to study the initial re-implementation (or "port"): 3604242f080a813d6f20a7394def422d1e55b30e

It appears that cmd_add, cmd_init, cmd_deinit, (partially) cmd_update, (partially) cmd_set_branch and cmd_summary still need to be ported while cmd_foreach, cmd_status, cmd_sync and cmd_absorbgitdirs already use the submodule--helper-provided built-in code.

The easiest of the sub-commands that still need to be ported might be cmd_set_branch. In the git.git fork of the contributor who started the porting effort, there are some branches that might or might not have useful commits in them, but given that set_branch already calls the submodule--helper a couple of times and does little on its own, it might make sense to just do that port "from scratch".

The idea here will be to pass the options using the construct ${GIT_QUIET:+--quiet} (which expands to --quiet if $GIT_QUIET would expand to a non-empty string) to the submodule--helper, and to add a new sub-command to said helper, by implementing static int set_branch(int argc, const char **argv, const char *prefix) in builtin/submodule--helper.c and appending the corresponding entry to the commands array.

Likewise, it should be easy to figure out which functions are used by the submodule--helper to perform the actions asked for by the current shell script function cmd_set_branch, and call them directly from the newly-implemented set_branch() function in builtin/submodule--helper.c.

It is highly recommended to contribute this work in small chunks, e.g. as soon as cmd_set_branch is done, even if it is not working yet (in which case it should be marked as request for help). The hardest part, as with every project, is to get started, from there on it will be easier.

periperidip commented 4 years ago

Most of that business logic is implemented in submodule.c.`

How does one distinguish between which function to add in the aforementioned file and in submodule--helper.c?

It appears that cmd_add, cmd_init, cmd_deinit, (partially) cmd_update, (partially) cmd_set_branch and cmd_summary still need to be ported while cmd_foreach, cmd_status, cmd_sync and cmd_absorbgitdirs already use the submodule--helper-provided built-in code.

I think that cmd_init and cmd_deinit have been converted(unless some partial conversion is left). Those left for conversion are: cmd_summary, cmd_set_branch, cmd_set_url and cmd_add. Also, if cmd_update is partially converted then what exactly is left to convert in the subcommand? On observation of code in submodule--helper.c, it seems that set-branch and set-url have many helpers created already as part of other commands, hence they may be prove to be useful at the end.

It is highly recommended to contribute this work in small chunks, e.g. as soon as cmd_set_branch is done, even if it is not working yet (in which case it should be marked as request for help). The hardest part, as with every project, is to get started, from there on it will be easier.

I have started work on cmd_summary() and have implemented the frontend function module_summary() as of now. I aim to implement big chunks of the subcommand in the coming days.

dscho commented 4 years ago

Most of that business logic is implemented in submodule.c.`

How does one distinguish between which function to add in the aforementioned file and in submodule--helper.c?

My personal preference would be to put reusable bits and pieces into submodule.c and transient parts that are only needed to support the shell script version into submodule--helper.c.

I think that cmd_init and cmd_deinit have been converted(unless some partial conversion is left).

I guess as of current master you're right.

Those left for conversion are: cmd_summary, cmd_set_branch, cmd_set_url

What about cmd_add?

if cmd_update is partially converted then what exactly is left to convert in the subcommand?

Well, look at this feast: https://github.com/git/git/blob/v2.26.0/git-submodule.sh#L451-L712

What's with that outer loop, and with all of that business logic inside that loop? That needs to be implemented in C eventually, and therefore I would not consider cmd_update to be converted.

I have started work on cmd_summary() and have implemented the frontend function module_summary() as of now. I aim to implement big chunks of the subcommand in the coming days.

:+1:

periperidip commented 4 years ago

What about cmd_add?

Totally forgot about that! Editing my comment to include it

Well, look at this feast: https://github.com/git/git/blob/v2.26.0/git-submodule.sh#L451-L712

Woah, that was totally unexpected! I will have to check into it to see what is going on here. I don't think there is any possibility that the command was converted and someone forgot to "erase" this part by mistake.

pratikghule commented 3 years ago

hey I wanted to contribute to this issue and help in conversion of git submodule to a built-in in C lanuage. How should i begin?

phil-blain commented 3 years ago

Hi @pratikghule !

The last remaining git submodule subcommands that are implemented in shell are git submodule add and git submodule update.

Work has already started to convert git submodule add, but it seems to have stalled; see the following patches and messages on the Git mailing list:

So if you want to help with that you can get in contact with @periperidip (I suggest doing so directly on the list) to see if you can help pushing that patch series to the finish line.


Regarding git submodule update: this is a fundamental command for the submodule functionality so I suggest that before undertaking that conversion, you should get a very thorough understanding of how submodules work.

Submodules are a more advanced feature of Git that even experienced Git users might not have encountered. Here are some reading materials, in increasing level of complexity, about submodules:

phil-blain commented 2 years ago

Small update:

So the remaining git-submodule.sh script is really just option parsing now (and some preparatory setup that may or may not be needed anymore).