FreshPorts / git_proc_commit

Tools for processing git commits one at a time.
BSD 2-Clause "Simplified" License

Helper script for redoing one commit #7

Closed dlangille closed 4 years ago

dlangille commented 4 years ago

Given 24e0896aa09051022ef1aacc6776bb4f34312a65, I want to regenerate the XML file and resubmit it for processing.

dlangille commented 4 years ago

I created a new branch: single commit

I want to be able to invoke git-to-freshports-xml.py on either a range of commits or a single commit.

@skozlov404 You're better at Python than I am. Is my objective clear?

skozlov404 commented 4 years ago

> @skozlov404 You're better at Python than I am. Is my objective clear?

Yes, I think so. I'll take a look at what I can do about it in the next few days.

dlangille commented 4 years ago

I created this helper script:

$ cat ~/scripts/helper_scripts/git-commit-single.sh
#!/bin/sh

COMMIT=$1

cd ~freshports/ports-jail/var/db/repos/PORTS-head-git/
echo git checkout master      | sudo su -fm freshports
echo git reset --hard $COMMIT | sudo su -fm freshports

cd /usr/local/libexec/freshports/

echo /usr/local/libexec/freshports/git-to-freshports-xml.py --path /var/db/freshports/ports-jail/var/db/repos/PORTS-head-git --single-commit $COMMIT --output /var/db/freshports/message-queues/incoming | sudo su -fm freshports

I had to go to master, then check out the one commit. I was trying to solve this issue:

$ ~/scripts/helper_scripts/git-commit-single.sh 24e0896aa09051022ef1aacc6776bb4f34312a65
Updating files: 100% (48550/48550), done.
HEAD is now at 24e0896aa090 math/maxima: Update to 5.44.0
Traceback (most recent call last):
  File "/usr/local/libexec/freshports/git-to-freshports-xml.py", line 163, in <module>
    main()
  File "/usr/local/libexec/freshports/git-to-freshports-xml.py", line 123, in main
    ET.SubElement(update, 'OS', Repo=config['repo'], Id=config['os'], Branch=str(repo.active_branch))
  File "/usr/local/lib/python3.7/site-packages/git/repo/base.py", line 696, in active_branch
    return self.head.reference
  File "/usr/local/lib/python3.7/site-packages/git/refs/symbolic.py", line 275, in _get_reference
    raise TypeError("%s is a detached symbolic reference as it points to %r" % (self, sha))
TypeError: HEAD is a detached symbolic reference as it points to '24e0896aa09051022ef1aacc6776bb4f34312a65'

Seems to work as is now.

dlangille commented 4 years ago

It happened again tonight. There is something else I have to do to reset the repo between each run.

[dan@devgit-ingress01:~/scripts] $ echo /usr/local/libexec/freshports/git-delta.sh | sudo su -fm freshports
2020.07.08 22:45:22 git-delta.sh started
2020.07.08 22:45:22 git-delta.sh repo is /var/db/freshports/ports-jail/var/db/repos/PORTS-head-git
2020.07.08 22:45:22 git-delta.sh XML dir is /var/db/freshports/message-queues/incoming
2020.07.08 22:45:22 git-delta.sh running: /usr/local/bin/git fetch origin
remote: Enumerating objects: 1744, done.
remote: Counting objects: 100% (1744/1744), done.
remote: Compressing objects: 100% (763/763), done.
remote: Total 1882 (delta 992), reused 1723 (delta 971), pack-reused 138
Receiving objects: 100% (1882/1882), 720.94 KiB | 6.93 MiB/s, done.
Resolving deltas: 100% (1001/1001), completed with 312 local objects.
From https://github.com/freebsd/freebsd-ports
   3ea9051165d7..5811beec8423  master          -> origin/master
   41218bd95a62..4df9b768da2b  branches/2020Q3 -> origin/branches/2020Q3
   4a02e95ca6d8..618ecb46a899  svn_head        -> origin/svn_head
2020.07.08 22:45:26 git-delta.sh running: /usr/local/bin/git reset --hard HEAD
HEAD is now at f2bfe60090b8 net-mgmt/unifi5: Update to 5.11.46
2020.07.08 22:45:34 git-delta.sh running: /usr/local/bin/git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 28157 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
2020.07.08 22:45:35 git-delta.sh STARTPOINT = ab24c4bd5dff
2020.07.08 22:45:35 git-delta.sh running; /usr/local/bin/git rebase origin/master
Successfully rebased and updated refs/heads/master.
2020.07.08 22:45:48 git-delta.sh running: /usr/local/bin/git rev-list ab24c4bd5dff..HEAD
... whole bunch of hashes not shown ...
36db40c11698a9 0d5f1c8c72d95bc46329694b2490099765002331
2020.07.08 22:45:49 git-delta.sh /usr/local/libexec/freshports/git-to-freshports-xml.py --path /var/db/freshports/ports-jail/var/db/repos/PORTS-head-git --commit ab24c4bd5dff --output /var/db/freshports/message-queues/incoming
Traceback (most recent call last):
  File "/usr/local/libexec/freshports/git-to-freshports-xml.py", line 163, in <module>
    main()
  File "/usr/local/libexec/freshports/git-to-freshports-xml.py", line 123, in main
    ET.SubElement(update, 'OS', Repo=config['repo'], Id=config['os'], Branch=str(repo.active_branch))
  File "/usr/local/lib/python3.7/site-packages/git/repo/base.py", line 696, in active_branch
    return self.head.reference
  File "/usr/local/lib/python3.7/site-packages/git/refs/symbolic.py", line 275, in _get_reference
    raise TypeError("%s is a detached symbolic reference as it points to %r" % (self, sha))
TypeError: HEAD is a detached symbolic reference as it points to 'f2bfe60090b840b6d99a3288c0b745843cefcfe1'
2020.07.08 22:46:02 git-delta.sh ending
[dan@devgit-ingress01:~/scripts] $

skozlov404 commented 4 years ago

> echo git reset --hard $COMMIT

Why do you do this in your script? It seems to me that can cause the problem you're seeing.

dlangille commented 4 years ago

It was an idea from #3, and I thought it would help. I'll try without.

dlangille commented 4 years ago

Oh, that's in helper_scripts/git-commit-single.sh. I was running git-delta.sh, which does a git reset --hard HEAD, based on the suggestions in #3.

skozlov404 commented 4 years ago

I'd say it doesn't even matter if the tree is dirty or not - the script processes the commits that already happened, so git reset shouldn't be needed at all. The only thing that matters is that you do git fetch origin beforehand so your tree is up to date

skozlov404 commented 4 years ago

Oh, I get what's been suggested in #3 - since git-to-freshports-xml.py used to only process all the commits from the specified one to the HEAD - if we move the HEAD to right above the commit we're specifying - this would give us the effect of processing the single commit.

Thing is, with --single-commit and --commit-range flags now implemented - we don't need to jump around the git tree anymore like that - by just using git fetch origin and the proper flags to git-to-freshports-xml.py we're now able to achieve everything required.

dlangille commented 4 years ago

This is what I have now:

[dan@devgit-ingress01:~/scripts] $ grep GIT git-delta.sh
${GIT} fetch $REMOTE
${GIT} checkout master
STARTPOINT=$(${GIT} log master..$REMOTE/master --oneline --reverse | head -n 1 | cut -d' ' -f1)
${GIT} rebase $REMOTE/master

I removed logging etc from the above.
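
The STARTPOINT line above picks the oldest commit that is on $REMOTE/master but not yet on the local master. A minimal Python sketch of the same parsing step follows, assuming the text printed by git log master..origin/master --oneline --reverse is already in hand. The function name first_new_commit and the commit subjects in the sample are hypothetical.

```python
from typing import Optional


def first_new_commit(oneline_log: str) -> Optional[str]:
    """Return the abbreviated hash on the first non-empty line of
    `git log --oneline --reverse` output, or None if there is no such line."""
    for line in oneline_log.splitlines():
        line = line.strip()
        if line:
            # --oneline prints "<abbreviated-hash> <subject>"
            return line.split(" ", 1)[0]
    return None


# Sample input: the hashes appear in the log above, the subjects are made up.
sample = (
    "ab24c4bd5dff first commit not yet on local master\n"
    "5811beec8423 newest commit on origin/master\n"
)
print(first_new_commit(sample))
print(first_new_commit(""))
```

Returning None for empty input lets a caller skip the XML step when the local master is already up to date, a case where the shell pipeline would yield an empty STARTPOINT.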

dlangille commented 4 years ago

Hey @sarcasticadmin I thought you might have ideas here since this area of work was originally your suggestion.

One issue I keep thinking about: getting out of sync. We want to make sure the latest commit in our repo is the latest commit in the database. If it is not, we need to process what's in the repo before doing a fetch.

I don't think that'll be difficult. Just a thing to be done.

sarcasticadmin commented 4 years ago

@dlangille thanks, I'll take a look at this tonight and give you some feedback. At the moment I'm not near my machine.

sarcasticadmin commented 4 years ago

@skozlov404

> I'd say it doesn't even matter if the tree is dirty or not - the script processes the commits that already happened, so git reset shouldn't be needed at all. The only thing that matters is that you do git fetch origin beforehand so your tree is up to date
>
> Oh, I get what's been suggested in #3 - since git-to-freshports-xml.py used to only process all the commits from the specified one to the HEAD - if we move the HEAD to right above the commit we're specifying - this would give us the effect of processing the single commit.
>
> Thing is, with --single-commit and --commit-range flags now implemented - we don't need to jump around the git tree anymore like that - by just using git fetch origin and the proper flags to git-to-freshports-xml.py we're now able to achieve everything required.

The idea of reset --hard HEAD was just to make sure the local master was clean, whatever state it had been left in, so that we could git rebase; the rebase needs a clean tree before it can proceed.

git-delta.sh was assuming that git-to-freshports-xml.py just needed a starting point. So I figured we could compare the remote-tracking master branch (origin/master) with the local master as of our last sync to get that starting point, then pass it along to git-to-freshports-xml.py to process the commits from the starting point to HEAD.

We aren't jumping around the tree. We are just leveraging the lag since the last sync: fetch the latest copy of master, compare it to local master, and bring local master up to date.

@dlangille it looks like your error is just a TypeError due to how that git library processes the commits. Pulling from your output above https://github.com/FreshPorts/git_proc_commit/issues/7#issuecomment-655797104:

> TypeError: HEAD is a detached symbolic reference as it points to 'f2bfe60090b840b6d99a3288c0b745843cefcfe1'

It would seem that we need to handle HEAD in a special way (python type is different) or just get the commit hash that HEAD currently points to so we can make the ranging in git-to-freshports-xml.py work correctly.
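
A minimal sketch of the second option, falling back to the commit hash when HEAD is detached. It relies on the GitPython attributes repo.head.is_detached, repo.head.commit.hexsha and repo.active_branch; the function name branch_label is hypothetical, and whether a hash is an acceptable value for the Branch attribute in the XML is an assumption, not something this thread confirms. Stand-in objects replace a real git.Repo so the sketch runs on its own.

```python
from types import SimpleNamespace


def branch_label(repo) -> str:
    """Name of the checked-out branch, or the commit hash when HEAD is detached."""
    if repo.head.is_detached:
        # repo.active_branch would raise TypeError in this state
        return repo.head.commit.hexsha
    return str(repo.active_branch)


# Stand-ins mimicking the attributes of a git.Repo (hypothetical test doubles)
detached = SimpleNamespace(
    head=SimpleNamespace(
        is_detached=True,
        commit=SimpleNamespace(hexsha="f2bfe60090b840b6d99a3288c0b745843cefcfe1"),
    )
)
on_master = SimpleNamespace(
    head=SimpleNamespace(is_detached=False),
    active_branch="master",
)
print(branch_label(detached))
print(branch_label(on_master))
```

With a real repository, branch_label(git.Repo(path)) would take the place of str(repo.active_branch) in the failing line of the traceback.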

> One issue I keep thinking about: getting out of sync. We want to make sure the latest commit in our repo is the latest commit in the database. If it is not, we need to process what's in the repo before doing a fetch.

Regarding this, I think it is a separate issue from what's being discussed here. I would just check the database for the last commit stored and compare that to the starting point in git-delta.sh. The starting point should be one commit ahead of what's in the database; if it is not, move the starting point to the commit right after the database entry and let git-to-freshports-xml.py process it all. We are guaranteed that the history will be in the right order, as no one in the ports tree is rewriting history on master (this is not the case).
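
The check described above can be sketched as follows, assuming the commit history is available as a list of hashes ordered oldest first and the last commit stored in the database is known. The function name resolve_startpoint and the short hashes are hypothetical; how the database is actually queried is outside this sketch.

```python
from typing import List, Optional, Tuple


def resolve_startpoint(
    history: List[str], last_in_db: str, startpoint: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (commit to start from, whether the proposed startpoint had to be moved).

    history lists commit hashes oldest first. A ValueError is raised when
    last_in_db is not part of it.
    """
    position = history.index(last_in_db)
    following = history[position + 1:]
    # The commit right after the database entry, or None at the tip
    expected = following[0] if following else None
    return expected, expected != startpoint


history = ["a1", "b2", "c3", "d4"]  # made-up hashes, oldest first
print(resolve_startpoint(history, "b2", "c3"))  # database and startpoint agree
print(resolve_startpoint(history, "a1", "c3"))  # database is behind the startpoint
```

The second call shows the out-of-sync case: the database stopped at a1, so processing has to begin at b2 rather than at the proposed c3.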

dlangille commented 4 years ago

An aside: the current approach is processing only commits on master.

We have the 2020Q3 branch to consider as well... the code isn't catering to that yet.

dlangille commented 4 years ago

> TypeError: HEAD is a detached symbolic reference as it points to 'f2bfe60090b840b6d99a3288c0b745843cefcfe1'
>
> It would seem that we need to handle HEAD in a special way (python type is different) or just get the commit hash that HEAD currently points to so we can make the ranging in git-to-freshports-xml.py work correctly.

This came up today when trying to process a single commit. I think #14 will fix this long term.

[freshports@devgit-ingress01 /usr/local/libexec/freshports]$ ./git-single-commit.sh c5f3b87db914aacea80f6e55223c246aa2e14d43
2020.07.15 14:57:57 git-single-commit.sh started
2020.07.15 14:57:57 git-single-commit.sh repo is
2020.07.15 14:57:57 git-single-commit.sh XML dir is /var/db/freshports/message-queues/incoming
2020.07.15 14:57:57 git-single-commit.sh /usr/local/libexec/freshports/git-to-freshports-xml.py --path /var/db/freshports/ports-jail/var/db/repos/PORTS-head-git --single-commit c5f3b87db914aacea80f6e55223c246aa2e14d43 --output /var/db/freshports/message-queues/incoming
Traceback (most recent call last):
  File "/usr/local/libexec/freshports/git-to-freshports-xml.py", line 172, in <module>
    main()
  File "/usr/local/libexec/freshports/git-to-freshports-xml.py", line 132, in main
    ET.SubElement(update, 'OS', Repo=config['repo'], Id=config['os'], Branch=str(repo.active_branch))
  File "/usr/local/lib/python3.7/site-packages/git/repo/base.py", line 696, in active_branch
    return self.head.reference
  File "/usr/local/lib/python3.7/site-packages/git/refs/symbolic.py", line 275, in _get_reference
    raise TypeError("%s is a detached symbolic reference as it points to %r" % (self, sha))
TypeError: HEAD is a detached symbolic reference as it points to '44d4d38cf77e4718e2666128077516c05403e214'
2020.07.15 14:57:58 git-single-commit.sh ending

Looking at the ports tree:

[freshports@devgit-ingress01 ~/ports-jail/var/db/repos/PORTS-head-git]$ git status
HEAD detached at 44d4d38cf77e
nothing to commit, working tree clean

This is the most recent commit processed by FreshPorts.

When FreshPorts processes a commit, it must do a git checkout. It needs the working copy of the repo to be as it was after that commit occurred. This script achieves that goal:

$ cat git-checkout.sh
#!/bin/sh
#
# $Id: svn-up-file.sh,v 1.1 2012-08-15 11:49:10 dan Exp $
#
# Copyright (c) 1999-2019 Dan Langille
#
# This script used to checkout a given commit via a git working copy

echo "num of params = $#"
if  [ $# -ne 2 ];
then echo error invoking script $0 : usage $0 GITDIR REVISION \(e.g. $0 /usr/ports 1234\)
  exit 1
else
    GITDIR=$1
    REVISION=$2

    # we may not need this cd...
    cd ${GITDIR}
    echo "git checkout ${REVISION}"
    git checkout ${REVISION}
    exit $?
fi
[freshports@devgit-ingress01 /usr/local/libexec/freshports]$ 

In the short term, a git checkout master solved the issue:

[freshports@devgit-ingress01 ~/ports-jail/var/db/repos/PORTS-head-git]$ git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.

[freshports@devgit-ingress01 ~/ports-jail/var/db/repos/PORTS-head-git]$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

[freshports@devgit-ingress01 /usr/local/libexec/freshports]$ ./git-single-commit.sh c5f3b87db914aacea80f6e55223c246aa2e14d43
2020.07.15 15:20:51 git-single-commit.sh started
2020.07.15 15:20:51 git-single-commit.sh repo is
2020.07.15 15:20:51 git-single-commit.sh XML dir is /var/db/freshports/message-queues/incoming
2020.07.15 15:20:51 git-single-commit.sh /usr/local/libexec/freshports/git-to-freshports-xml.py --path /var/db/freshports/ports-jail/var/db/repos/PORTS-head-git --single-commit c5f3b87db914aacea80f6e55223c246aa2e14d43 --output /var/db/freshports/message-queues/incoming
2020.07.15 15:20:51 git-single-commit.sh ending
[freshports@devgit-ingress01 /usr/local/libexec/freshports]$

sarcasticadmin commented 4 years ago

@dlangille saw your tweet: https://twitter.com/DLangille/status/1285750793711292416 sounds like you've made some good progress!

> [freshports@devgit-ingress01 ~/ports-jail/var/db/repos/PORTS-head-git]$ git status
> HEAD detached at 44d4d38cf77e
> nothing to commit, working tree clean

This seems to happen because git-to-freshports-xml.py, when it exits with an error, leaves the repo in a less than ideal state, in this case a detached HEAD. I'm not sure why git-to-freshports-xml.py actually needs to check out each of these commits rather than just leverage something like git show <SHA>. Either way, it still looks like it has trouble with the type of HEAD; again, this seems to be a mismatch in the actual type being used in Python.

Anyway, it sounds like you've got a path forward, and that's good 😄

dlangille commented 4 years ago

> I'm not sure why git-to-freshports-xml.py actually needs to try to checkout each of these commits vs just leverage something like git show <SHA>

FreshPorts needs both.

git show <SHA> is used to create the XML which is then used to populate the database. This data only contains the facts of the commit.

After putting the XML into the database, the database is then refreshed with data from the repo. There is more information in FreshPorts than that obtained from the commit. A number of make -V commands are run to obtain various data, such as:

This information cannot be obtained from the commit log. It must be extracted from the files. Thus, we do a git checkout <HASH>.

With Subversion, there was no need to extract the data from the commit log. It was extracted from an email.

With git, there is no git-specific email list yet. Therefore, we took the commit log approach.

This change in the XML generation step eventually led to the creation of a second working copy of the repo, one for XML creation (git log) and one for running make -V (git checkout).

dlangille commented 4 years ago

Running a single commit can be done via:

[dan@devgit-ingress01:~/message-queues/retry] $ ~/scripts/helper_scripts/git-commit-single.sh 4e0c57e6b4740e970be8e2ff640bc7cd560d1b24
2020.07.24 12:20:20 git-single-commit.sh started
2020.07.24 12:20:20 git-single-commit.sh repo is /var/db/ingress/repos/freebsd-ports
2020.07.24 12:20:20 git-single-commit.sh XML dir is /var/db/ingress/message-queues/incoming
2020.07.24 12:20:20 git-single-commit.sh /usr/local/libexec/freshports/git-to-freshports-xml.py --path /var/db/ingress/repos/freebsd-ports --single-commit 4e0c57e6b4740e970be8e2ff640bc7cd560d1b24 --output /var/db/ingress/message-queues/incoming
2020.07.24 12:20:20 git-single-commit.sh ending
[dan@devgit-ingress01:~/message-queues/retry] $
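
The command line logged above can also be assembled programmatically. The sketch below builds the argument list from the flags seen in this thread (--path, --single-commit, --output). The function name single_commit_argv is hypothetical, and the check for a full 40-character hash is an assumption made for safety rather than a documented requirement of git-to-freshports-xml.py.

```python
import shlex
from typing import List

SCRIPT = "/usr/local/libexec/freshports/git-to-freshports-xml.py"


def single_commit_argv(repo_path: str, commit: str, output_dir: str) -> List[str]:
    """Build the argument list for processing exactly one commit."""
    # Assumption: callers pass a full lowercase SHA-1, as in the logs above
    if len(commit) != 40 or any(c not in "0123456789abcdef" for c in commit):
        raise ValueError(f"not a full commit hash: {commit!r}")
    return [SCRIPT, "--path", repo_path, "--single-commit", commit, "--output", output_dir]


argv = single_commit_argv(
    "/var/db/ingress/repos/freebsd-ports",
    "4e0c57e6b4740e970be8e2ff640bc7cd560d1b24",
    "/var/db/ingress/message-queues/incoming",
)
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) avoids shell quoting issues that can arise when a command is piped through echo.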