dlang / project-ideas

Collection of impactful projects in the D ecosystem

Moving Bugzilla to Github #43

Open burner opened 5 years ago

burner commented 5 years ago

Description

Moving bugzilla issues to github

What are rough milestones of this project?

  1. build automation
  2. run in parallel
  3. validate everything is good
  4. shut bugzilla down

How does this project help the D community?

Bugzilla is not really state of the art anymore. Read more about it here: https://www.python.org/dev/peps/pep-0581 and http://pyfound.blogspot.com/2019/05/mariatta-wijaya-lets-use-github-issues.html

Recommended skills

Point of Contact

References

https://forum.dlang.org/post/gjlwnhqzkdcpvghtjwwe@forum.dlang.org

jacob-carlborg commented 5 years ago

build automation

What needs to be built?

burner commented 5 years ago

the tool that moves everything from bugzilla to github

wilzbach commented 5 years ago

The tool has existed for quite a while: https://github.com/wilzbach/bugzilla-migration and has even been tested: https://github.com/wilzbach/tools-test/issues. I never got formal approval for this migration, which was the only blocker :/

ghost commented 5 years ago

There are links to Bugzilla in the source code, and some places just reference the issue number. A bunch of tests are also named after the issue number. Not sure if your script takes that into consideration?

wilzbach commented 5 years ago

The script was never fully completed, as it was formally a dead end. It was just a test to show that migrating is more than feasible and roughly how it would look.

WalterBright commented 5 years ago

Does github offer a means for backing up the issue database?

lesderid commented 5 years ago

Yes: https://developer.github.com/v3/migrations/orgs/#download-an-organization-migration-archive

WalterBright commented 5 years ago

@braddr has Ok'd this.

WalterBright commented 5 years ago

Is there a 1:1 mapping between bugzilla issue URLs and the eventual github ones? I ask because the n.g. archives should have the URLs remapped.

WalterBright commented 5 years ago

Also the URLs in the dmd source code.

jacob-carlborg commented 5 years ago

I assume we would set up a server that does the redirects.

Geod24 commented 5 years ago

Does GitHub have a way to allow anyone to triage issues? At the moment anyone can get started with cleaning up Bugzilla, but that won't be the case with GitHub. Not a big deal IMO, but I didn't see it mentioned.

In any case, very happy to see some progress on this!

wilzbach commented 5 years ago

GitHub introduced a new permission level "triage" for exactly this problem. I am not sure whether this can be applied to everyone, but considering that only a handful of people actually triage, we could set up a very liberal "triage user group" and invite anyone interested. Obviously we would need to modify the dlang-bot a bit to ensure that the auto-merge label has no effect for triage users, but that shouldn't be hard.

More about this: https://help.github.com/en/articles/repository-permission-levels-for-an-organization

And yes, we would obviously save the 1:1 mapping between Bugzilla and GitHub issues and set up a simple redirect server, so that all old Bugzilla URLs will be redirected to their respective GitHub issues.
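
A minimal sketch of the lookup such a redirect server could perform, assuming the migration saves a table from Bugzilla issue numbers to GitHub issue URLs. The function name `redirect_target` and the entries in `MAPPING` are made up for illustration and do not come from any existing tool. Only the standard Bugzilla `show_bug.cgi?id=N` URL form is handled.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical sample of the saved 1:1 mapping (Bugzilla id -> GitHub issue URL).
# The ids and targets below are invented for illustration only.
MAPPING = {
    12345: "https://github.com/dlang/tools/issues/1",
    12346: "https://github.com/dlang/installer/issues/2",
}


def redirect_target(bugzilla_url: str,
                    mapping: Mapping[int, str] = MAPPING) -> Optional[str]:
    """Return the GitHub URL for an old Bugzilla URL, or None if unmapped."""
    parsed = urlparse(bugzilla_url)
    # Only the classic issue page is redirected; searches etc. are ignored.
    if not parsed.path.endswith("show_bug.cgi"):
        return None
    ids = parse_qs(parsed.query).get("id", [])
    if len(ids) != 1 or not ids[0].isdigit():
        return None
    return mapping.get(int(ids[0]))
```

A real server would answer with an HTTP redirect when the function returns a URL, and would presumably fall back to the read-only Bugzilla page when it returns `None`.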

wilzbach commented 5 years ago

BTW one big argument against the migration was that the GitHub API is heavily rate-limited and we can't export the issues anymore. With the GraphQL API that's no longer a problem and we can easily export everything with a few paginated queries, e.g.

query {
  repository(owner: "wilzbach", name: "tools-test") {
    issues(last: 100) {
      edges {
        node {
          title
          url
          author {
            login
          }
          closed
          bodyText
          createdAt
          closedAt
          number

          comments(first: 100) {
            edges {
              node {
                author {
                  login
                }
                bodyText
                createdAt
              }
            }
          }
        }
      }
    }
  }
}
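
The "few paginated queries" could look roughly like the sketch below. It is an illustration under assumptions rather than the migration tool's code: the query is a trimmed variant of the one above that adds the `pageInfo` cursor fields, and `fetch` is a hypothetical caller-supplied function that sends the payload to the GraphQL endpoint and returns the decoded JSON.

```python
from typing import Callable, Dict, Iterator, Optional

# Trimmed variant of the query above, with cursor-based pagination added.
QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { number title url closed createdAt }
    }
  }
}
"""


def iter_issues(fetch: Callable[[Dict], Dict],
                owner: str, name: str) -> Iterator[Dict]:
    """Yield every issue, following endCursor until hasNextPage is false."""
    cursor: Optional[str] = None
    while True:
        payload = {
            "query": QUERY,
            "variables": {"owner": owner, "name": name, "cursor": cursor},
        }
        issues = fetch(payload)["data"]["repository"]["issues"]
        yield from issues["nodes"]
        if not issues["pageInfo"]["hasNextPage"]:
            return
        cursor = issues["pageInfo"]["endCursor"]
```

Comments on each issue would need their own cursor in the same way; the sketch leaves them out to stay short.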

wilzbach commented 4 years ago

BTW an alternative that might be worthwhile to consider for the transition period would be a two-way bridge: syncing all "transitioned" GitHub issues with their respective Bugzilla issues, but disallowing creation of new Bugzilla issues.

In other words:

WalterBright commented 4 years ago

In other words:

I'd just go the simpler route of making Bugzilla read-only once the data has been moved to github.

thewilsonator commented 4 years ago

FWIW, LLVM is also moving from their Bugzilla to GitHub issues (well, they will once they finish migrating to GitHub, in about 3 weeks). The rationale is that GitHub is too important an avenue of reports for (new) users. I think they, too, are going to make Bugzilla read-only.

thewilsonator commented 4 years ago

Link to LLVM discussion: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136162.html

Geod24 commented 4 years ago

Could we start by moving individual projects ? I'm thinking installer, dlang.org right now.

Geod24 commented 4 years ago

Note that there is a "new" issue template that should also be very useful for us: https://help.github.com/en/github/building-a-strong-community/configuring-issue-templates-for-your-repository

Geod24 commented 4 years ago

@wilzbach : What are we missing to start on installer ?

WalterBright commented 4 years ago

What do I need to do?

Geod24 commented 4 years ago

There are three options:

IMHO the first approach is by far the simplest / most straightforward, as long as you're comfortable with it. The second is probably the most correct but would require a bit of back and forth in the future. And the last one will be quite painful to me, I fear (I was testing out with my organization to see what I'd need).

In any case, once a component has been transferred, we will need someone with Bugzilla admin access to disable the component. Who should we contact for this ?

ibuclaw commented 4 years ago

I'd just go the simpler route of making Bugzilla read-only once the data has been moved to github.

I've done this for gdc bugzilla before, based off http://toolsmiths.blogspot.com/2008/05/making-bugzilla-read-only.html

  1. Remove "Open for bug entry" for all products on editproducts.cgi
  2. Create a "canstilledit" group on editgroups.cgi. Check "Insert new group into all existing products", and only add admins to the group.
  3. Use "Edit Group Access Controls" on editproducts.cgi to check the "Canedit" boolean for only the new group "canstilledit", so it will become read-only for any users who are not members of the "canstilledit" group.
  4. Set announcehtml to point people to the new issue tracker on editparams.cgi?section=general. I have:
    <div id="message">
    Bug creation has been disabled, file new bugs at <a href="https://gcc.gnu.org/bugzilla">gcc.gnu.org/bugzilla</a>
    </div>

CyberShadow commented 4 years ago

It comes as an unpleasant surprise to discover this conversation so long after it began, considering my proximity to the subject. The informed participants have neglected to mention a few things:

Now the migration begins and one easily avoidable mistake has already been made (using a personal account instead of a machine account).

Why was none of the above mentioned, and why was I not involved in this discussion? I can't think of any explanation other than malicious intent. That one grumpy person's opinion is different from ours, so let's just not include them, no matter that they spent weeks researching and working on this same problem. Shame on you, guys.

To be clear. I wholeheartedly agree that the current Bugzilla instance, as it is at issues.dlang.org, is clunky and in dire need of improvement. What I've been suggesting this entire time is to investigate less radical options of improving it first, and see how much the situation improves without massively disruptive undertakings such as moving the entire issues database.

CyberShadow commented 4 years ago

BTW one big argument against the migration was that the GitHub API is heavily rate-limited and we can't export the issues anymore. With the GraphQL API that's no longer a problem and we can easily export everything with a few paginated queries, e.g.

A good test for that theory would be to write a script which downloads all the issues on https://github.com/rust-lang/rust/. GraphQL rate limits are very different from the REST API, so it may not work as well as you expect.

CyberShadow commented 4 years ago

Here are some things which we cannot do with Bugzilla:

Here are some things which we can do with Bugzilla (the new Bugzilla version either already supports this, or can be improved to support this):

Here are some things which we can only do with Bugzilla, and not GitHub issues:

The last point is much more important than it may seem. If we have more than one issue number, the following problems occur:

Unless this can somehow be avoided, it will cause a huge mess and never-ending confusion and frustration.

So, how to proceed? I was having severe difficulty gauging how important this issue is (if I knew about the discussion / interest then I would have dedicated more time to it), but considering the interest (:+1:s) here I would like to suggest the following:

In my opinion we have more to gain from a polished Bugzilla instance than a messy GitHub migration. Thoughts?

WalterBright commented 4 years ago

Wow, I had no idea about all these pros and cons. Thank you! Lots to think about.

Geod24 commented 4 years ago

Now the migration begins and one easily avoidable mistake has already been made (using a personal account instead of a machine account).

That one is on me. Duly noted, and will fix. To be clear, the migration hadn't fully begun. I merely started experimenting with tools (a repository with very low bandwidth), in order to find out pain points.

And it did find some:

I wholeheartedly agree that the current Bugzilla instance, as it is at issues.dlang.org, is clunky and in dire need of improvement.

The need for improvements is not the only reason for this migration. It plays a big role, sure, but there is no denying that first-time contributors will find less pain in using Github than a separate website (even if they can login via their Github account). That was mentioned by @thewilsonator here.

Here are some things which we can only do with Bugzilla, and not GitHub issues:

  • A bug reporting wizard, with custom logic such as automatic test case reduction or bisection

You can do that on Github now, through Github actions. Additionally, anyone is able to work on the integration, instead of having that right limited to a few, overworked people.

  • Non-boolean metadata (e.g. you cannot sort by severity with GitHub labels, only filter)

True, although throwing together a user script shouldn't be hard. And you can sort by priority on Bugzilla, but it's not really efficient. Priority labeling is very inconsistent across people: save for a few boolean ones (e.g. trivial and regression), what counts as normal, major, critical and blocker is easily confused. The labels put in place for tools, so far, expose 5 levels of priority (Regression, Blocker, Normal, Low, Trivial). I originally was hoping to merge Low and Trivial, but both seem to be used enough to warrant the separation (and to avoid the disturbance).

  • Anyone can easily download all data and use it offline

Sounds like an artificial point. If you do that, you either want to work on a bug offline (one can just save the webpage), or script something (like a bug reduction tool), which you can easily do via the GitHub API. In the latter case, your scripted tool might be able to pick up D code blocks more easily if we're on GitHub.

  • Own our data

That's a very broad topic. The cost of owning our data is self-maintenance and implementation of features. If we really want to "own our data", and that point trumps all other considerations, we could consider GitHub Enterprise. But let's be serious, GitHub is less likely to vanish than any single contributor, and many of us are SPOFs (what happens if you go AWOL ? Or Brad ? Or Mike Parker ?). What matters more to us, and Walter has made this pretty clear, is that we have a backup of all this data.

  • Have one issue number per bug

That is indeed a pain point. I wanted to experiment with a few things with the tools repository. So far, I believe, using Issue XXX vs Issue #XXX can do the trick, although it's not the most user-friendly approach. Note that the import does not import closed / fixed bugs (it wouldn't make sense, since we can't retain the issue number anyway).

Comments in D source code will no longer unambiguously refer to an issue.

This needs to be fixed. DMD transitioned from issue number to links a long time ago, and that needs to be applied to other repositories as well.

File names in the DMD test suite will no longer unambiguously refer to an issue.

Likewise, we should provide a link for every test that refers to an old issue.

GitHub will auto-linkify "issue ###" in old commit messages, but it will link to the GitHub issue, i.e. it will create broken links.

Only when # is used, which is why I suggested the separation. But that problem already exists with pull requests, and I rarely see # being used for that reason.

In my opinion we have more to gain from a polished Bugzilla instance than a messy GitHub migration. Thoughts?

Your post makes it sound like we're going to just fire a quickly-written, automated script overnight, on all repositories. In practice most of the work so far has been triaging and cleaning up issues. I haven't fired any request to the GitHub API, but merely automated the issue body generation and some metadata. What in the issues that have been moved did you find messy ?

Geod24 commented 4 years ago

The more TL;DR version would be:

The other concerns, we can easily work around them. You are right that the bug ID is the main concern. However, due to our mixing of GitHub and Bugzilla, using # is currently not so common, so I think the pain you envision is being blown out of proportion.

To give an example of another benefit that switching to GitHub brings: much better categorization. At the moment filing an issue on Bugzilla just leaves you with a blank box, no template. In GitHub you can provide issue templates. See the example I put there: https://github.com/dlang/tools/issues/new/choose

WalterBright commented 4 years ago

An issue that would need to be resolved is a mechanical way to convert bugzilla issue URLs to the matching github issue URL. I'd use it to convert the URLs in the dmd source code, in the test cases, and in the forum archives.
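
One possible shape for such a mechanical conversion is sketched below, assuming a saved mapping from Bugzilla issue numbers to GitHub issue URLs exists. The name `rewrite_bugzilla_urls` and the mapping contents in the usage example are hypothetical. URLs with no mapping entry are left untouched.

```python
import re
from typing import Mapping

# Matches the standard Bugzilla issue URL on issues.dlang.org.
BUGZILLA_URL = re.compile(
    r"https?://issues\.dlang\.org/show_bug\.cgi\?id=(\d+)"
)


def rewrite_bugzilla_urls(text: str, mapping: Mapping[int, str]) -> str:
    """Replace each mapped Bugzilla URL in `text` with its GitHub URL."""
    def replace(match: "re.Match[str]") -> str:
        # Fall back to the original URL when the issue was not migrated.
        return mapping.get(int(match.group(1)), match.group(0))

    return BUGZILLA_URL.sub(replace, text)
```

For example, with a made-up mapping:

```python
mapping = {12345: "https://github.com/dlang/dmd/issues/1"}  # illustrative only
line = "// see https://issues.dlang.org/show_bug.cgi?id=12345"
print(rewrite_bugzilla_urls(line, mapping))
```

The same function could be run over source files, test cases and forum archive pages, since it only operates on text.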

CyberShadow commented 4 years ago

@WalterBright

Wow, I had no idea about all these pros and cons. Thank you! Lots to think about.

We discussed all of this way back then. I realize you have a lot on your plate though.

An issue that would need to be resolved is a mechanical way to convert bugzilla issue URLs to the matching github issue URL. I'd use it to convert the URLs in the dmd source code, in the test cases, and in the forum archives.

Even if we did a perfect job with doing so in our source code, we can't change source code in other projects (e.g. I have lots of "workaround DMD bug XXXX" in my code), and we can't change commit messages, which will make bisection even more annoying.

@Geod24

You can do that on Github now, through Github actions.

Either we're talking about different things, or I can't find any documentation for this.

Additionally, anyone is able to work on the integration, instead of having that right limited to a few, overworked people.

No, that's completely false. First of all, what's stopping you from working on improving Bugzilla? But, I have already made working on the new Bugzilla version easy by creating a script which sets up and runs everything. I will go further and create a Docker image so that people don't need to worry about host dependencies.

Sounds like an artificial point. If you do that, you either want to work on a bug offline (one can just save the webpage), or script something (like bug reduction tool), which you can easily do via the Github API.

I'm talking about something like https://github.com/CyberShadow/DBugTests#the-populate-program

In the latter case, your scripted tool might be able to pick up D code blocks more easily if we're on GitHub.

No, that has nothing to do with GitHub. We just need code blocks or a proper form, where you put prose in one box and code in another. The platform doesn't matter.

What matters more to us, and Walter has made this pretty clear, is that we have backup for all this data.

OK, and where is the backup of all the pull request discussions?

Your post makes it sound like we're going to just fire a quickly-written, automated script overnight, on all repositories.

No, that was not descriptive of your efforts, just describing the worst-case scenario :)

What in the issues that have been moved did you find messy ?

I acknowledge that you've been otherwise very thorough, thank you!

Geod24 commented 4 years ago

Even if we did a perfect job with doing so in our source code, we can't change source code in other projects (e.g. I have lots of "workaround DMD bug XXXX" in my code), and we can't change commit messages, which will make bisection even more annoying.

Which is why I was rather happy with the plan to make Bugzilla read-only. We can keep the old links around and transition to GitHub issue links. Over time the old links will become less and less relevant. We already did this for the DMD -> DDMD transition. By moving all the files in the repository, we essentially created a checkpoint for anyone doing blame or looking at the history. And yes, it is annoying to switch to the old file, and it requires a small user intervention, but as time passes, it becomes less and less of a problem. I expect things with Bugzilla to go in the same direction, and that's why only transitioning open issues makes sense.

OK, and where is the backup of all the pull request discussions?

Nowhere to be found yet, but it should be done. And that's actually a good point in favor of the migration: It's easier to maintain one backup system over two backup systems.

Either we're talking about different things, or I can't find any documentation for this.

You want something that implements an arbitrary action when an issue is opened/closed/edited/whatever, right ? Currently we do this for pull requests via the dlang bot. We can extend the dlang-bot, but some of it could also be implemented via Github actions (they can trigger on pretty much anything, including label added, issue opened, etc...). For example, at the moment, the dlang-bot labels things with "needs-rebase". Well there is an ~app~ action for it. Auto-merge label ? There is an action for it. Maybe you want to use a comment instead of a label ? No problem. There are many more actions available out there, and developing a new one is fairly easy. The documentation is still not perfect, but you can find it here.

I'm talking about something like https://github.com/CyberShadow/DBugTests#the-populate-program

Doesn't that program query the Bugzilla API ?

No, that has nothing to do with GitHub. We just need code blocks or a proper form, where you put prose in one box and code in another. The platform doesn't matter.

Some platforms make it easier. Bugzilla has no set code block syntax, while GitHub has one, and you can specify the language in use. One thing, for example, is that maintainers can edit comments in issues. So adding information, or properly annotating a code example in a code block with D syntax highlighting, is trivial to do on GitHub (I routinely do it). That should make extraction much simpler, shouldn't it ? That's why the template for bug reports has both D and console code blocks, by the way. You could also detect C++ code for C++ integration issues if you want to.
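
To illustrate why a language-tagged code block helps extraction, here is a rough sketch that pulls fenced blocks of a given language out of an issue body. The names `TICKS`, `BLOCK` and `code_blocks` are illustrative, and the pattern only covers plain triple-backtick fences, not every Markdown edge case.

```python
import re
from typing import List

# A Markdown code fence, built this way so the example stays readable.
TICKS = "`" * 3

# Opening fence with an optional language tag, body, closing fence.
BLOCK = re.compile(
    r"^" + TICKS + r"[ \t]*([\w+#-]*)[ \t]*\n(.*?)^" + TICKS + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def code_blocks(body: str, lang: str = "d") -> List[str]:
    """Return the contents of every fenced block tagged with `lang`."""
    return [code for tag, code in BLOCK.findall(body) if tag.lower() == lang]
```

Called with `lang="console"` it would return the compiler output blocks instead, and `lang="c++"` would pick up C++ snippets, provided reporters or maintainers tagged the blocks accordingly.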

No attempt has been made to preserve the issue number, causing the new issues to now have two issue numbers

You said it yourself, it's impossible to do so. And as I mentioned before, while it is the most disruptive, it's not as big of an issue as you make it out to be, because of the points I mentioned (importance decreases with time, and usage of Issue XXX vs #XXX).

CyberShadow commented 4 years ago

Over time the old links will become less and less relevant.

We have over twenty thousand issues filed over the past two decades. Realistically speaking, I don't think we are ever going to rid ourselves of old references to old issues.

By moving all the files in the repository, we essentially created a checkpoint for anyone doing blame or looking at the history.

I don't think it's right to compare this to git blame.

Nowhere to be found yet, but it should be done. And that's actually a good point in favor of the migration: It's easier to maintain one backup system over two backup systems.

If the new Bugzilla is hosted on the same server as the wiki and forum, there will be no new backup system. Furthermore, that system was already battle-tested last year.

You want something that implements an arbitrary action when an issue is opened/closed/edited/whatever, right ?

No. I'm talking about custom forms that transform the input and allow the user to review it before submitting it.

Doesn't that program query the Bugzilla API ?

Yes, and it allows doing something which at the time was not possible to do with GitHub (I'm not sure now).

Some platforms make it easier. Bugzilla has no set code block syntax, while GitHub has one, and you can specify the language in use. One thing, for example, is that maintainers can edit comments in issues. So adding information, or properly annotating a code example in a code block with D syntax highlighting, is trivial to do on GitHub (I routinely do it). That should make extraction much simpler, shouldn't it ?

All of these are describing the limitations of the current version on issues.dlang.org. Newer Bugzilla versions do not have these. This is what I meant in my "things which we can do with Bugzilla" list.

You said it yourself, it's impossible to do so.

I'm not sure it's impossible! But it might be very tricky. But, I don't think that's a direction we should pursue at all at the moment.

Geod24 commented 4 years ago

This is what I meant in my "things which we can do with Bugzilla" list.

We can do absolutely everything with Bugzilla, since it's a FOSS product we have the source code of. Just like you can write any program in assembly language. The question is, how much effort do you need to get there ?

CyberShadow commented 4 years ago

We can do absolutely everything with Bugzilla, since it's a FOSS product we have the source code of.

Indeed :) Also, using and improving Bugzilla benefits FOSS as a whole.

The question is, how much effort do you need to get there ?

IIRC, except for the custom D-specific logic, everything on that list is either in the latest upstream Bugzilla or BMO/Harmony.

More details here: https://github.com/CyberShadow/bugzilla-meta/issues

Geod24 commented 4 years ago

An issue that would need to be resolved is a mechanical way to convert bugzilla issue URLs to the matching github issue URL. I'd use it to convert the URLs in the dmd source code, in the test cases, and in the forum archives.

@WalterBright : Does applying the conversion done in https://github.com/dlang/phobos/pull/7441 to all D repos help ? As mentioned in the PR, I think it's a good change regardless of this discussion, but I hope it will help ease your concerns.

With this in place:

You can see how this plays out with issues that were transferred already. Take https://github.com/dlang/tools/issues/398 for example, the very first sentence mentions that it was transferred from Bugzilla. The bugzilla entry is RESOLVED MOVED and the last comment is a link to the issue.

Another pro of moving to Github that wasn't (explicitly) mentioned here is the project mapping. Currently we have a few issues on bugzilla which are for the wrong component (e.g. optlink bugs in tools). We also have a confusing (for new users) split, for example a bug on the documentation needs to be reported on bugzilla, but a bug on tour.dlang.org (or code.dlang.org) needs to be reported to one of the dlang-tour projects, or to dlang/dub-registry, respectively.

@WalterBright : What would be your requirements to move forward with the migration ? As mentioned, for the moment, I am focusing on tools and installer in order to build & test the automation associated with those repos (e.g. tagging a release).

CyberShadow commented 4 years ago

With all due respect for Mathias' work, and unless the BDFLs object, I would like to handle future work regarding our issues database, due to my familiarity with the subject at hand, including experience with the GitHub API. This includes any future attempts to migrate issues to GitHub, if we decide to, as I think this migration attempt has not been researched and planned as well as it could have.

@Geod24

I appreciate the fervor and expediency with which you intend to pursue your goal, though maybe not how quickly you dismiss the work I've already done, which is so close to bearing fruit. As mentioned above, I think it would be in everyone's best interest if we improve what we have now instead of trading compromise for compromise. Improving Bugzilla will be the main focus for my following D-related work, and I think you will be pleased with the results as well, if you give me time to fulfil what was planned. As such, I would like to kindly ask you to pursue other avenues for improving D for the time being, or join me in my work on Bugzilla.

Another pro of moving to Github that wasn't (explicitly) mentioned here is the project mapping. Currently we have a few issues on bugzilla which are for the wrong component (e.g. optlink bugs in tools).

This is because the optlink component was added later.

We also have a confusing (for new users) split, for example a bug on the documentation needs to be reported on bugzilla, but a bug on tour.dlang.org (or code.dlang.org) needs to be reported to one of the dlang-tour projects, or to dlang/dub-registry, respectively.

We can configure redirect-only components in Bugzilla to avoid this confusion.

lesderid commented 4 years ago

Perhaps an issue tracker migration/overhaul should go through the DIP process? That gives the community, and especially @CyberShadow, the chance to formally present arguments for and against migration, including possible alternatives. dlang/projects is good for general ideas and goals, but its visibility is low compared to DIPs, and a large undertaking like this could benefit from a more formalised plan.

Maybe it's too early for this though.

Geod24 commented 4 years ago

I think this migration attempt has not been researched and planned as well as it could have.

I do not see evidence of this. While you only discovered this issue recently, and I understand the resulting frustration, this conversation had been going on for a while, and many contributors have provided input, experiments have been made, etc...

though maybe not how quickly you dismiss the work I've already done, which is so close to bearing fruit.

This feels a bit personal, and wrong. The migration issue was raised during a meeting and Walter okayed it, and I offered to handle it. I haven't since heard a clear demand to stop it, but I did suspend it in the middle of doing the tools out of respect for your opinion, so that there would be time for discussion.

I don't think that the work you performed will be lost, either. As mentioned previously, the transition will only happen for open issues. Your work on Bugzilla Harmony can still be used to provide a better experience in consulting old bugs resolved before the transition.

As mentioned above, I think it would be in everyone's best interest if we improve what we have now instead of trading compromise for compromise. Improving Bugzilla will be the main focus for my following D-related work, and I think you will be pleased with the results as well, if you give me time to fulfil what was planned. As such, I would like to kindly ask you to pursue other avenues for improving D for the time being, or join me in my work on Bugzilla.

I think this is where the misunderstanding is coming from. While I appreciate input about what hasn't been done correctly so far (e.g. using my account), the question is not whether or not we perform this migration, but how to do it so that it is as smooth as possible. This issue is not about "Improve issue tracker" or "Features missing from Bugzilla", it is about "Moving Bugzilla to Github".

CyberShadow commented 4 years ago

I do not see evidence of this. While you only discovered this issue recently, and I understand the resulting frustration, this conversation had been going on for a while, and many contributors have provided input, experiments have been made, etc...

No, sorry, this topic well predates this discussion.

This feels a bit personal, and wrong.

I'm sorry you feel this way.

The migration issue was raised during a meeting and Walter okayed it, and I offered to handle it.

Unfortunately we have to regularly revisit old subjects with Walter as he understandably can't keep track of everything that keeps going on in the community. See e.g. https://forum.dlang.org/post/r234nd$s26$1@digitalmars.com

but I did suspend it in the middle of doing the tools out of respect for your opinion, so that there would be time for discussion.

Thank you, but, I'm hoping we can make decisions based on what is best for the project, and not just out of mutual respect.

I don't think that the work you performed will be lost, either. As mentioned previously, the transition will only happen for open issues.

Actually, I think this is a bad idea. There isn't really a clear line between "old, archived issues that nobody cares about" and "new issues that we definitely want to do something about". Old issues can get reopened and revisited, while some new issues get buried and become obsolete or inactionable with time. We should strive to not have more than one way to track bugs within the same project.

the question is not whether or not we perform this migration, but how to do it so that it is as smooth as possible.

As discussed above, moving any issues to GitHub with what we have right now would likely lead to a suboptimal outcome, so we should not do it, and concentrate on alternatives with no trade-offs first.

CyberShadow commented 4 years ago

I will go further and create a Docker image so that people don't need to worry about host dependencies.

Dylan from Mozilla beat me to it:

https://github.com/CyberShadow/bmo#using-docker-for-development

https://github.com/CyberShadow/bmo/commit/a8db810d24494137b58d207ea41e1569e943bd6f