acaudwell / Gource

software version control visualization
https://gource.io
GNU General Public License v3.0

Deleted file never vanishes #66

Open Liz4v opened 8 years ago

Liz4v commented 8 years ago

I generated a Gource video of one of my repositories: https://github.com/leandigo/django-oneall

It seems to assume that the file oneall/django_app/models.py is still around at the end. Alas, it was removed at revision d34833f.

I used Gource v0.43 (installed via Homebrew) on Mac OS X 10.11.4.

stale

mathieu-aubin commented 8 years ago

Hi! Looking at the log file generated from your repo and selecting all entries related to that file:

1431577825|Ekevoo|A|/oneall/django_app/models.py
1433849539|Ekevoo|M|/oneall/django_app/models.py
1433904622|Ekevoo|M|/oneall/django_app/models.py
1433987454|Ekevoo|M|/oneall/django_app/models.py
1433989141|Ekevoo|M|/oneall/django_app/models.py
1433991898|Ekevoo|M|/oneall/django_app/models.py

That file never gets deleted.
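The check above amounts to replaying the log and seeing whether a D entry ever follows the A. A minimal Python sketch of that replay, assuming the pipe-separated custom format shown above (timestamp|user|action|path, with an optional trailing colour field); the helper name `alive_files` is hypothetical. It treats both A and M as making the file present, mirroring the behaviour reported in this issue, where a stray M brought a deleted file back.

```python
def alive_files(log_lines):
    """Replay Gource custom-log entries; return the paths still present at the end."""
    alive = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # timestamp|username|action|path[|colour]
        parts = line.split("|")
        action, path = parts[2], parts[3]
        if action == "D":
            alive.discard(path)
        else:
            # A and M both leave the file on screen (an M can bring it back)
            alive.add(path)
    return alive


log = [
    "1431577825|Ekevoo|A|/oneall/django_app/models.py",
    "1433849539|Ekevoo|M|/oneall/django_app/models.py",
]
print(alive_files(log))  # {'/oneall/django_app/models.py'}
```

Fed all six lines above, it still reports the file as present, because no D entry ever appears for it.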

Liz4v commented 8 years ago

Hi Mathieu, would that be an upstream git bug then?

I'm not sure what command generates the logs. git log displays only authors and messages; if I add --dirstat, the output looks markedly different from what you posted.

mathieu-aubin commented 8 years ago

I generated this log with Gource, using (from the README):

gource --output-custom-log my-project-custom.log

Maybe use gitk to browse.

Liz4v commented 8 years ago

Well, that's still on gource then.

mathieu-aubin commented 8 years ago

That's very true. Hence my suggestion to use gitk to browse, meaning to look at your git logs: Gource did not invent the file. Could it be coming from a merged branch? I wish I could be of more help.

acaudwell commented 8 years ago

You can see the command used by gource to generate the input log file from git:

gource --git-log-command

Currently:

git log --pretty=format:user:%aN%n%ct --reverse --raw --encoding=UTF-8 --no-renames

You can run your own command and save the output to a file. Provided the file is in the same format, Gource will read it. If there is a more accurate command it could use, it would be good to know.
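For reference, a hypothetical Python helper (`raw_to_custom`, not part of Gource) that maps the raw output of that command onto the pipe-separated custom format quoted earlier in this thread. It assumes each commit starts with a `user:` line followed by the `%ct` timestamp, that status and path in each raw line are separated by a tab, and it keeps only A, M and D entries; paths get a leading `/` to match the custom log quoted above.

```python
def raw_to_custom(raw_lines):
    """Hypothetical helper: raw `git log --raw` output -> Gource custom-log lines."""
    user = timestamp = None
    out = []
    for line in raw_lines:
        line = line.rstrip("\n")
        if line.startswith("user:"):
            # a new commit begins; its timestamp is on the next non-empty line
            user, timestamp = line[len("user:"):], None
        elif timestamp is None and line.strip():
            # first non-empty line after "user:" holds the %ct timestamp
            timestamp = line.split()[0]
        elif line.startswith(":"):
            # ":<old mode> <new mode> <old hash> <new hash> <status>\t<path>"
            meta, path = line.split("\t", 1)
            status = meta.split()[-1]
            if status in ("A", "M", "D"):
                out.append(f"{timestamp}|{user}|{status}|/{path}")
    return out
```

A commit whose raw section is empty (such as a merge, by default) simply produces no custom-log lines, which is exactly how a deletion can go missing.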

Liz4v commented 8 years ago

Okay, I've investigated a lot, and here are the missing pieces.

There were two branches during June last year: a lot of activity happened in the green develop branch, while there was a bugfix for the models.py file in the black/purple master branch. One of the first things done in the develop branch was exactly a directory move, which Gource handled properly, as expected. The intertwined activity in the master branch then made the models.py file reappear, which is a bit weird, but completely understandable.

[Image: Develop]

However, further down the road there is a merge commit, 0d32878, which includes a deletion of that file (oneall/django_app/models.py) along with several other modifications.

[Image: Merge]

Still, that command (which I modified to display the commit hash) does not list a single modified file for this particular commit! It lists changes only for the previous commit (4ec4c6e) and the next one (da69603).

$ git log --pretty=format:user:%aN%n%ct\ %H --reverse --raw --encoding=UTF-8 --no-renames
(…snip…)
user:Ekevoo
1438223791 4ec4c6eb8a88e11a46790d7f3d5492f7d31c6c84
:100644 100644 1a8328f... 09c6d8e... M  oneall/django_oneall/management/commands/legacyimport.py

user:Ekevoo
1438224809 0d328789f4ea0f802de4dbbafea3605184d5c72c
user:Ekevoo
1441078837 da6960312cfa8a601ceff1a4a7378384f1a372ef
:100644 100644 eb27d58... 8ab8ca4... M  oneall/django_oneall/auth.py
:100644 100644 cd83385... 3e4cc68... M  oneall/django_oneall/templates/oneall/login.html
:100644 100644 6bc1701... 3f7adf9... M  oneall/django_oneall/views.py
(…snip…)

I'm not sure what to suggest now.

tienne-B commented 4 years ago

I have a small, bodged-together script that filters the log generated by Gource's default git log command, so that deleted files are never modified (and thus re-added):

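files = set()  # paths that currently exist in the replayed history

def test_file(file, action):
    """Return True if this raw-log change line should be kept."""
    if action == 'A':
        files.add(file)
        return True
    elif action == 'D':
        if file not in files:
            return False  # already deleted: drop the duplicate delete
        files.remove(file)
        return True
    elif action == 'M':
        # drop modifications that would re-add a deleted file
        return file in files
    return False  # any other change type (e.g. T) is dropped

# "w" truncates any previous output, so re-running does not duplicate entries
with open("log.txt", "r") as src, open("new_log.txt", "w") as dst:
    for line in src:
        l = line.split("\t")
        # raw change lines look like ":<modes> <hashes> <status>\t<path>"
        if len(l) == 2:
            if test_file(l[1], l[0][-1]):
                dst.write(line)
        else:
            dst.write(line)  # user:, timestamp and blank lines pass through
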
PF94 commented 4 years ago

This also happens on my repository.

[Image]

This file has not existed in my repository for over a year now. It's some weird Microsoft FrontPage junk file.

The same goes for this file, which was unfortunately named so poorly that I had to blur it. The file's parent folder was renamed, and the file was later deleted.

[Image]

FlorianWilhelm commented 3 years ago

The best method I have found to solve this problem is to linearize your git history first with

git filter-branch --parent-filter 'cut -f 2,3 -d " "'

before you run Gource. This keeps only the first parent of every commit, which avoids any problems with files not disappearing due to merge commits. ATTENTION: Do this on a fresh clone, not on a repository you are working in!
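Why `cut -f 2,3 -d " "` linearizes the history: assuming git filter-branch hands the parent filter the parents on stdin as ` -p <parent1> -p <parent2>` (note the leading space, so field 1 is empty), fields 2 and 3 are `-p <parent1>`. A small Python imitation of that `cut` call shows the result; `cut_fields` is a hypothetical helper, and the hashes are placeholders.

```python
def cut_fields(line, fields, delim=" "):
    """Mimic `cut -f <fields> -d <delim>` on one line (fields are 1-based)."""
    if delim not in line:
        return line  # cut prints lines without the delimiter unchanged
    parts = line.split(delim)
    return delim.join(parts[i - 1] for i in fields if i <= len(parts))


# Assumed shape of a merge commit's parent list as seen by --parent-filter
merge_parents = " -p aaaa111 -p bbbb222"
print(cut_fields(merge_parents, (2, 3)))  # -p aaaa111
```

Ordinary commits keep their single parent, and a root commit (empty parent list) stays parentless.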

andyquinterom commented 2 years ago

There must be a better solution than rewriting history.

Liz4v commented 2 years ago

From my perspective, the git log output is insufficient because it omits the changes made in merge commits. This problem would not exist if merge commits were considered.

andyquinterom commented 2 years ago

If we add the option --first-parent to git log, the problem seems to solve itself. I will open a PR with the changes.
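A sketch of trying this without waiting for a release: run the command Gource reports via `gource --git-log-command` with `--first-parent` appended, and save the output to a file for Gource to read, as acaudwell suggested. `build_command` and `write_log` are hypothetical helpers. On recent Git versions `--first-parent` also makes `git log` show each merge's changes relative to its first parent; older versions may additionally need `-m`. Only the first-parent line of history is followed, so commits that exist solely on merged branches are not listed individually.

```python
import subprocess

# Command reported by `gource --git-log-command` (as quoted in this thread)
GOURCE_GIT_LOG_ARGS = [
    "git", "log",
    "--pretty=format:user:%aN%n%ct",
    "--reverse", "--raw", "--encoding=UTF-8", "--no-renames",
]


def build_command(first_parent=True):
    """Return the git log argv, optionally restricted to first parents."""
    cmd = list(GOURCE_GIT_LOG_ARGS)
    if first_parent:
        cmd.append("--first-parent")
    return cmd


def write_log(repo_dir, out_path):
    """Run the command inside repo_dir and save its raw output to out_path."""
    # No shell is involved, so the --pretty format needs no extra quoting
    result = subprocess.run(build_command(), cwd=repo_dir,
                            capture_output=True, check=True)
    with open(out_path, "wb") as out:
        out.write(result.stdout)
```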