QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Distrust the infrastructure in workflow management, issue tracking, and doc PR review #3958

Open andrewdavidwong opened 6 years ago

andrewdavidwong commented 6 years ago

We aim to distrust the infrastructure. However, as we've discussed previously and more recently, we actually trust GitHub quite a bit for workflow and issue tracking. We also implicitly trust GitHub when having each other review documentation PRs before merging. For example, whenever I request that @marmarek review a PR because he has expertise that I lack, I have to trust that the interface telling me that he has approved the PR is being truthful when I merge it. I think we should seriously investigate ways of reducing our reliance on (i.e., distrusting) these aspects of GitHub.

fosslinux commented 6 years ago

How would this be possible? GitHub is closed-source; I think the only real way would be to migrate away from GitHub for these tasks.

My only thought would be git cloning the repository/ies and cross-checking that actions were done. However, GitHub and the git command are still being trusted that what is on GitHub's servers is actually what gets cloned. The good thing there, though, is that git is open-source. Does this mean we can trust git because it is open-source?
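For illustration, a minimal sketch of that cross-checking idea: clone over HTTPS as usual, but verify a PGP-signed tag locally with git, so that the integrity check rests on keys already in your own keyring rather than on GitHub's web UI. The repository URL and tag name below are placeholders, not actual Qubes release artifacts.

```python
# Minimal sketch of the cross-checking idea above: clone via GitHub, but verify
# a PGP-signed tag locally so that integrity rests on keys in the local GnuPG
# keyring instead of on GitHub's web UI. URL and tag name are placeholders.
import subprocess

REPO = "https://github.com/QubesOS/qubes-issues.git"  # example repository URL
CLONE_DIR = "qubes-issues"
TAG = "example-signed-tag"  # hypothetical signed tag

subprocess.run(["git", "clone", REPO, CLONE_DIR], check=True)

# 'git verify-tag' checks the tag's signature against keys already imported
# into the local keyring; it fails for missing or untrusted signatures.
result = subprocess.run(
    ["git", "-C", CLONE_DIR, "verify-tag", TAG],
    capture_output=True,
    text=True,
)
print(result.stderr)  # GnuPG writes signature details to stderr
if result.returncode != 0:
    raise SystemExit(f"Signature verification failed for tag {TAG!r}")
```

This only shifts trust from GitHub's interface to the signer's key, of course; the repository content itself still has to be reviewed.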

woju commented 6 years ago

One thing to do would be to actually back up the issues and pull requests, irrespective of any migration plans, which may or may not happen in the future. It should include reviews, comments, labels, milestones, and all the metadata, maybe even those ":+1:" reactions under comments. Also, we sometimes comment on commits outside of reviews. (Did I miss something?)

Both @marmarek and I have extensive e-mail archives of GitHub notifications, which is better than nothing, but much of the metadata is missing.

fosslinux commented 6 years ago

@woju I see the benefit of that. I'm not sure how comments on commits outside of reviews would be handled, but for everything else, maybe have a look at the new User Migration API: https://developer.github.com/changes/2018-05-24-user-migration-api/ (I know we're not migrating, but it might suit the purpose) and this gist script: https://gist.github.com/rodw/3073987.
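For concreteness, here is a rough sketch (separate from the gist linked above) of what such a backup could look like using the ordinary REST API and the Python requests library; the repository name, token environment variable, and output file name are just placeholder choices:

```python
# Rough sketch: back up issues and their comments as JSON via the regular
# GitHub REST API. Assumes the 'requests' library and a personal access token
# in the GITHUB_TOKEN environment variable.
import json
import os

import requests

REPO = "QubesOS/qubes-issues"
API = f"https://api.github.com/repos/{REPO}"
HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}

def fetch_all(url, params=None):
    """Follow GitHub's Link-header pagination and return the combined list."""
    items = []
    while url:
        resp = requests.get(url, headers=HEADERS, params=params)
        resp.raise_for_status()
        items.extend(resp.json())
        url = resp.links.get("next", {}).get("url")  # next page, if any
        params = None  # query params are already baked into the 'next' URL
    return items

# 'state=all' includes closed issues; pull requests also appear in this list.
issues = fetch_all(f"{API}/issues", {"state": "all", "per_page": 100})
for issue in issues:
    issue["comments_data"] = fetch_all(issue["comments_url"])

with open("issues-backup.json", "w") as f:
    json.dump(issues, f, indent=2)
```

Note that this covers only issues and their comments; reactions, review comments, and commit comments would need additional endpoints, which is part of why a bundled migration archive is attractive.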

tokideveloper commented 6 years ago

@andrewdavidwong wrote:

We aim to distrust the infrastructure. However, as we've discussed previously and more recently, we actually trust GitHub quite a bit for workflow and issue tracking. We also implicitly trust GitHub when having each other review documentation PRs before merging. […] I think we should seriously investigate ways of reducing our reliance on (i.e., distrusting) these aspects of GitHub.

I see that distrusting any infrastructure you don't own is necessary. But wouldn't it be cheaper, easier, and less time-consuming to move to a trusted infrastructure, since there would then be no need to spend time, money, and thought on distrusting it?

tokideveloper commented 6 years ago

@woju wrote:

One thing to do would be to actually backup the issues and pull-requests […]. It should include reviews, comments, labels, milestones, […]

As far as I can see, these things can be migrated to GitLab using the GitHub-to-GitLab importer. And since you can host your own GitLab instance, you can rely on that part of the infrastructure, at least more than on the current GitHub instance.

[…] and all the metadata maybe up to those "+1" reactions under comments. Also we sometimes use comments to commits outside reviews. (Did I miss something?)

I don't know whether these are included in GitLab's importer. But GitLab is under heavy development, so chances are high that this will be implemented soon if it's missing. (Especially now, when many projects are moving to GitLab.)

marmarek commented 6 years ago

It isn't only about "owning" the infrastructure. It's also about its complexity. Even if we ran our own servers, in our own data center, there would still be an amazingly complex software stack there (all the HTTP servers, web applications, etc.), where surely a lot of bugs exist. We prefer not to trust them, instead of attempting to secure them.

tokideveloper commented 6 years ago

Okay, I must say that I failed to make my point. In my posts I wanted to say that migrating away from GitHub and taking ownership of the infrastructure would probably be feasible. But I failed to give the reasons why I think it's important to do so:

My reasons are on the level of power, not software safety/security. Surprisingly, GitHub was bought by MS, and the first action after that was, IMHO, bad and also surprising: this time it was censoring "upend" from the Trending page, but tomorrow it could also be

Note that all of this could happen as unexpectedly as the acquisition itself. E.g., if Qubes OS were deleted, you wouldn't have any chance to migrate away. (I know, in that case someone would have the most recent Git repo, but the issues etc. would be lost.)

None of the problems listed above could affect Qubes OS if it were hosted on its own infrastructure, I guess.

Am I wrong? Have I overlooked something?

tokideveloper commented 6 years ago

It isn't only about "owning" the infrastructure. It's also about its complexity. Even if we ran our own servers, in our own data center, there would still be an amazingly complex software stack there (all the HTTP servers, web applications, etc.), where surely a lot of bugs exist. We prefer not to trust them, instead of attempting to secure them.

Sorry, I can't resist: I've heard that there is an operating system where software you don't trust can be kinda "jailed" into so-called "qubes" in order to prevent them from affecting other parts of your computer/software. Maybe we could ask that project to help us? ;-)

tokideveloper commented 6 years ago

What if MS surprisingly changes the terms of service, or modifies GitHub's software in such a way that issues, PR comments, etc. cannot be exported anymore? Or at least not in a free format? Or only in an encrypted form that cannot be used elsewhere? Or other things like that?

fosslinux commented 6 years ago

TL;DR: in any case we probably have at least 30 days before MS could change anything.

@tokideveloper All of these are very important points. While I do think that this is an issue that calls for immediate action, there is one major reason why MS (for now) (probably) could not change things just like that.

In the GitHub Terms of Service Part R: Changes to These Terms it states:

We reserve the right, at our sole discretion, to amend these Terms of Service at any time and will update these Terms of Service in the event of any such amendments. We will notify our Users of material changes to this Agreement, such as price changes, at least 30 days prior to the change taking effect by posting a notice on our Website. For non-material modifications, your continued use of the Website constitutes agreement to our revisions of these Terms of Service. You can view all changes to these Terms in our Site Policy repository.

(emphasis mine)

This clearly states that for material changes we get 30 days' notice, more than enough to migrate to GitLab / trusted infrastructure / Bitbucket / something else. I would personally classify any of what you have said as a material change; however, the ToS does not define material or non-material.

A few possible definitions of material:

While I do not disagree with the importance of these points, it should be considered that none of these are happening any time soon. While there is a probability of them happening tomorrow, it is a low one. But I think caution should still be taken and some backup should be made.

tokideveloper commented 6 years ago

@sstt011 Thank you for your investigation and estimation!

While there is a probability of [possible material changes] happening tomorrow, it is a low one.

Agreed (since I know that 30-days thing now).

But I think caution should still be taken and some backup should be made.

Yes! Maybe we could make a full backup (of repos, issues, PR comments, etc.) and measure how long it takes. If it takes around 30 days or longer, then we should definitely make backups at appropriate intervals.

However, if it's possible to make incremental backups, then I'd prefer to do that on a permanent basis.
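As an illustration of that incremental approach, a small sketch (assuming Python with the requests library and a token in GITHUB_TOKEN; the file names are arbitrary) that fetches only the issues updated since the previous run, via the issues API's since filter:

```python
# Sketch of an incremental backup: fetch only issues updated since the last
# run, using the 'since' filter of the issues API. Assumes the 'requests'
# library and a token in GITHUB_TOKEN; the timestamp file name is arbitrary.
# Pagination is omitted here for brevity (see the fuller sketch earlier).
import json
import os
import pathlib
from datetime import datetime, timezone

import requests

STAMP = pathlib.Path("last-backup-timestamp.txt")
params = {"state": "all", "per_page": 100}
if STAMP.exists():
    params["since"] = STAMP.read_text().strip()  # only issues updated after this

resp = requests.get(
    "https://api.github.com/repos/QubesOS/qubes-issues/issues",
    headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},
    params=params,
)
resp.raise_for_status()

with open("issues-incremental.json", "w") as f:
    json.dump(resp.json(), f, indent=2)

STAMP.write_text(datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
```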

marmarek commented 6 years ago

There is a new Migration API to download all the data associated with a user/repository. I don't see documentation about the archive format there, but I'd assume it is something machine-readable (a set of JSON files?) that could be used to import it into another service if needed.
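A rough, untested sketch of how that Migration API could be driven from Python follows; the endpoint paths, JSON fields, and polling interval are assumptions drawn from GitHub's published API documentation, not from anything this project actually runs:

```python
# Rough sketch of the Migration API flow: request an archive, poll until it is
# exported, then download the tarball. Assumes the 'requests' library and a
# token in GITHUB_TOKEN; endpoint details are taken from GitHub's API docs.
import os
import time

import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

# Request an archive of everything associated with the repository:
# issues, pull requests, comments, milestones, labels, etc.
start = requests.post(
    f"{API}/user/migrations",
    headers=HEADERS,
    json={"repositories": ["QubesOS/qubes-issues"]},
)
start.raise_for_status()
migration_id = start.json()["id"]

# Poll until GitHub reports the archive as exported.
while True:
    status = requests.get(f"{API}/user/migrations/{migration_id}", headers=HEADERS)
    status.raise_for_status()
    if status.json()["state"] == "exported":
        break
    time.sleep(30)

# The archive endpoint redirects to a tarball of JSON files plus git data.
archive = requests.get(
    f"{API}/user/migrations/{migration_id}/archive", headers=HEADERS
)
archive.raise_for_status()
with open("migration-archive.tar.gz", "wb") as f:
    f.write(archive.content)
```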

andrewdavidwong commented 6 years ago

But I think caution should still be taken and some backup should be made.

We should always have backups regardless of whether we plan to migrate away from the service.

There is a new migration API to download all the data associated with user/repository. I don't see documentation about archive format there, but I'd assume it is something machine readable (a set of json files?) that could be used to import it into another service if needed.

We should use it to make backups regardless of whether we plan to import them into another service.

See: https://groups.google.com/d/msg/qubes-devel/HDt1ZdDMfz4/Q8yS32a-EAAJ

Branched to: #3974

tarsa commented 3 months ago

Note: the OpenJDK project has a 'Skara' subproject (https://openjdk.org/projects/skara/) which is already implemented and synchronizes between GitHub and, e.g., Oracle-managed mailing lists. Synchronization goes both ways, AFAIU, i.e.: