erikbern / git-of-theseus

Analyze how a Git repo grows over time
https://erikbern.com/2016/12/05/the-half-life-of-code.html
Apache License 2.0
2.56k stars, 85 forks

Author stats over multiple repos #70

Open PriitParmakson opened 4 years ago

PriitParmakson commented 4 years ago

I wrote a script that merges the authors.json files produced by git-of-theseus-analyze, so that an authors chart can be produced over multiple repos.

erikbern commented 4 years ago

Thanks for the addition. I think this would be cleaner if the stack plotting script took multiple files on the command line. What do you think?

PriitParmakson commented 4 years ago

It felt safer to make it a separate script, as it's my first program in Python. I wrote the first version in Go to satisfy my immediate need. Of course, the integrated way is cleaner. I'll look into it, perhaps in a week.

PriitParmakson commented 4 years ago

Now there seems to be some side effect. I don't know Travis.

Idea of the merging algorithm:

Each authors.json file defines a function LOC(r, a, t), where r is the repo name, a ∈ A(r) is an author (from the set of authors present in the repo), and t ∈ T(r) is a time.

For the plot we need a fully defined function LOC(r, a, t), where r ∈ R (set of all repos), a ∈ A (union of authors of repos), and t ∈ T (union of times of repos).

However, the authors.json files provide data only for a partially defined LOC(r, a, t). A fully defined function can be obtained by extrapolation: if there is no data point for (r, a, t) in the authors.json file of r, then define LOC(r, a, t) = LOC(r, a, t1), where t1 is the latest timepoint present in that file with t1 < t; if there is no such timepoint, then LOC(r, a, t) = 0.
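The idea above can be sketched in a few lines of Python. This is a minimal sketch, not the code from the PR. It assumes each authors.json file has already been loaded into a dict with the keys "labels" (author names), "ts" (timestamps in sorted order, comparable with each other) and "y" (one series per author, aligned with "ts"). It also assumes that the value plotted for an author is the sum of LOC(r, a, t) over all repos, and that an author absent from a repo contributes 0 there. The function name merge_author_stats is hypothetical.

```python
from bisect import bisect_right


def merge_author_stats(repos):
    """Merge per-repo author stats into a single {"labels", "ts", "y"} dict.

    Assumed input layout (an assumption, see the text above): each repo is a
    dict with "labels", "ts" (sorted) and "y" (one series per label).
    """
    # T: union of all timepoints; A: union of all authors.
    all_ts = sorted({t for repo in repos for t in repo["ts"]})
    all_authors = sorted({a for repo in repos for a in repo["labels"]})
    totals = {a: [0] * len(all_ts) for a in all_authors}

    for repo in repos:
        for author, series in zip(repo["labels"], repo["y"]):
            for i, t in enumerate(all_ts):
                # Index of the latest timepoint of this repo that is <= t,
                # or -1 if the repo has no data yet at time t.
                j = bisect_right(repo["ts"], t) - 1
                if j >= 0:
                    # Carry the last known value forward and add it up.
                    totals[author][i] += series[j]
                # Otherwise LOC(r, a, t) = 0, so nothing is added.

    return {
        "labels": all_authors,
        "ts": all_ts,
        "y": [totals[a] for a in all_authors],
    }


# Usage with two small hand-made repos (made-up numbers for illustration).
repo_a = {"labels": ["alice", "bob"],
          "ts": ["2020-01-01", "2020-03-01"],
          "y": [[10, 20], [5, 0]]}
repo_b = {"labels": ["alice"],
          "ts": ["2020-02-01"],
          "y": [[7]]}
merged = merge_author_stats([repo_a, repo_b])
```

Timestamps are compared as plain strings here, which only works if every file uses the same ISO-like format; parsing them into datetime objects first would be the safer choice.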

erikbern commented 4 years ago

This is a better approach. The code seems pretty convoluted though; it feels like it shouldn't take more than 5-10 lines to accomplish what you want. You just need to add up the stats and account for the fact that the timestamps are irregular, right?

leonid-shevtsov commented 2 years ago

I don't know if it's too convoluted or not (also not a pythonist), but this PR sure helped me produce a chart for all of our repos together 👍

drew2a commented 1 year ago

> I don't know if it's too convoluted or not (also not a pythonist), but this PR sure helped me produce a chart for all of our repos together 👍

It was helpful for me as well. Thank you! @PriitParmakson