coq-community / manifesto

Documentation on goals of the coq-community organization, the shared contributing guide and code of conduct.

Back-up repository data #76

Open Zimmi48 opened 5 years ago

Zimmi48 commented 5 years ago

Meta-issue

This issue is extracted from an off-topic discussion in #2.

@palmskog on 2019-05-03

Let's say Coq-community grows to dozens of projects with as many or more maintainers. It may happen that someone, adversarially or not, does something unwanted to a repository, such as removing it, moving it, corrupting it, etc.

Is any periodic snapshotting of our repositories being done, so that we could restore from it? I know various efforts try to archive open source code, but is there an easily accessible one with frequent updates that we could use to restore repos? Arguably we should document this somewhere.

@Zimmi48

If it is code that you are talking about, I could easily set up mirrors of the coq-community repositories on gitlab.com. That wouldn't be sufficient to preserve metadata such as issues, though.

@palmskog

I'm primarily concerned with the code and commit metadata, but obviously issues and wikis matter as well, even though GitHub seems to keep a lot of history on those. It should be possible to script periodic dumping and copying of metadata using GitHub's API, right? Maybe something to work on at an upcoming workshop. Is this being done for Coq repos, by the way?

I'm all for mirroring at GitLab, but does that cover the "snapshotting" part of the problem? For example, if a repo gets corrupted in some way, the mirror could soon contain only the corrupt version, depending on how it's set up.

@Zimmi48 on 2019-05-04

GitLab's mirroring feature has options to mirror everything, including force-pushes and deletions, or to mirror only normal pushes and never delete anything. In the latter case, all the information needed to recover from an accident is there. However, that could produce false alerts if people push topic branches and then force-push to them. GitHub's wikis are also git repositories, so it is easy to set up a similar mirror for them.

GitHub does indeed keep a lot of history, in particular in its timeline, but it also allows repository administrators to delete previous edits, comments, issues, and repositories themselves. That's why I was asking whether we should restrict coq-community members' default privileges from admin to write.

Copying issue data using GitHub's API is possible and there are actually already a few services that do it for a fee (e.g. https://github.com/marketplace/backhub). I could also extend @coqbot to do it, but we would need to discuss the design (what to save, how to react to edits, deletions...).
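For illustration, here is a minimal sketch of such a dump script, assuming Python 3 with only the standard library. It pages through the GitHub REST endpoint `GET /repos/{owner}/{repo}/issues` with `state=all` (which returns pull requests as well as issues) and writes the raw JSON to a single file. The function names are hypothetical. Issue comments, timeline events, and edit histories are not covered and would need further endpoints. A real setup would also pass a token, because the unauthenticated rate limit is low.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Callable, Iterator

API_ROOT = "https://api.github.com"

# Hypothetical type: a page fetcher takes (owner, repo, page_number) and returns a list of JSON objects.
PageFetcher = Callable[[str, str, int], list]


def fetch_issue_page(owner: str, repo: str, page: int, token: str | None = None) -> list:
    """Fetch one page (at most 100 items) of open and closed issues/PRs."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100&page={page}"
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        # Authenticated requests get a much higher rate limit.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issues(owner: str, repo: str, fetch_page: PageFetcher = fetch_issue_page) -> Iterator[dict]:
    """Yield every item, requesting pages until an empty one comes back."""
    page = 1
    while True:
        items = fetch_page(owner, repo, page)
        if not items:
            return
        yield from items
        page += 1


def dump_issues(owner: str, repo: str, path: str, fetch_page: PageFetcher = fetch_issue_page) -> int:
    """Write all issues of owner/repo to `path` as JSON and return how many were saved."""
    issues = list(iter_issues(owner, repo, fetch_page))
    with open(path, "w", encoding="utf-8") as out:
        json.dump(issues, out, indent=2)
    return len(issues)
```

A call such as `dump_issues("coq-community", "manifesto", "manifesto-issues.json")` would hit the network. The `fetch_page` parameter lets the paging logic be tested with a fake fetcher instead. How to react to edits and deletions, the design question raised above, is left open by this sketch.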

@palmskog on 2019-05-12

For reference, one kind of situation I had in mind for backing up repos is this.

I'm fine with GitLab mirroring, even if it doesn't capture topic branches. But I think it should be complemented by repo tarballs, e.g., once for every 30 days back.
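A minimal sketch of the tarball idea, assuming Python 3's standard `tarfile` module and a local checkout of each repository. The archive naming scheme, the function names, and the 30-day window (one possible reading of the suggestion above) are hypothetical choices. Each run writes `<repo>-<YYYY-MM-DD>.tar.gz` and deletes that repo's archives older than the window. Restoring then only requires unpacking a plain archive.

```python
import datetime as dt
import re
import tarfile
from pathlib import Path


def make_snapshot(repo_dir: Path, out_dir: Path, day: dt.date) -> Path:
    """Archive repo_dir into out_dir/<repo>-<YYYY-MM-DD>.tar.gz and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{repo_dir.name}-{day.isoformat()}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Directories are added recursively by default.
        tar.add(str(repo_dir), arcname=repo_dir.name)
    return archive


def prune_snapshots(out_dir: Path, repo_name: str, today: dt.date, keep_days: int = 30) -> list:
    """Delete this repo's archives taken more than keep_days before today; return their names."""
    pattern = re.compile(re.escape(repo_name) + r"-(\d{4}-\d{2}-\d{2})\.tar\.gz")
    removed = []
    for archive in sorted(out_dir.iterdir()):
        match = pattern.fullmatch(archive.name)
        if match is None:
            continue  # not one of this repo's snapshots
        taken = dt.date.fromisoformat(match.group(1))
        if (today - taken).days > keep_days:
            archive.unlink()
            removed.append(archive.name)
    return removed
```

Running both functions once a day (e.g. from cron) would keep one archive per day for the last 30 days. A longer or sparser retention policy only changes the pruning rule.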

@Zimmi48 on 2019-05-13

Why tarballs?

A good point I take from your comment is that the more people have write access to coq-community repositories, the greater the risk that the repositories get compromised if one user leaks their credentials, one way or another.

@palmskog

At least with tarballs one would know for sure: this is what the repository looked like at a specific time. With mirrors, I think one would need deep knowledge of git semantics and implementation to say something similar. For example, can't someone just rewrite the reflog?

@Zimmi48

I don't see what risk there would be if the mirror refuses any update that is not a fast-forward. Then you can only add stuff on top, not delete it.
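To make the fast-forward rule concrete, here is a toy model in Python. The names are hypothetical, and this is not how GitLab implements mirroring. Commits are identified by strings and mapped to their parents. An update to a branch is accepted only if the branch's current tip is an ancestor of the new tip, and deletions are refused. In real git the same ancestry test is `git merge-base --is-ancestor <old> <new>`. A bare repository can enforce this policy on incoming pushes with the `receive.denyNonFastForwards` and `receive.denyDeletes` settings.

```python
from typing import Dict, List, Optional


def is_ancestor(old: str, new: str, parents: Dict[str, List[str]]) -> bool:
    """True if commit `old` is reachable from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def accept_update(refs: Dict[str, str], branch: str, new_tip: Optional[str],
                  parents: Dict[str, List[str]]) -> bool:
    """Apply a branch update to the mirror only if no history is lost."""
    if new_tip is None:
        return False  # refuse deletions outright
    old_tip = refs.get(branch)
    if old_tip is not None and not is_ancestor(old_tip, new_tip, parents):
        return False  # non-fast-forward (e.g. a force-push): keep the old tip
    refs[branch] = new_tip  # new branch or fast-forward
    return True
```

A rejected update is exactly the case that would show up as a false alert when someone force-pushes a topic branch.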

@palmskog

I see the point, but one of my reasons for tarballs is that they remove git from the trusted base (and I don't particularly trust git, and definitely not its implementation). In any case, I have nothing against mirrors.

@Zimmi48

OK, now I see your point.

@Zimmi48 on 2019-07-12

FTR, I have created the GitLab coq-community organization and mirrors for all the current repositories, as a temporary solution until we have a better one.

palmskog commented 3 years ago

@Zimmi48 I see that our GitLab organization currently does not mirror many repos (partly due to the considerable growth). Maybe we want to document the process for getting a repo mirrored and add it to some checklist in repo transfer issues?