datalad / datalad-installer

Installation script for Datalad and related components
MIT License

Figure out why `major` label was added to dependabot created PR (#175)

Open yarikoptic opened 1 year ago

yarikoptic commented 1 year ago

https://github.com/datalad/datalad-installer/pull/164 had both the internal and major labels, which I missed, so merging and releasing it resulted in the 1.0.0 release of datalad-installer. Not a biggie, I think we are fine with a 1.0.0 release (finally), but it is not clear why "major" was added, since https://github.com/datalad/datalad-installer/blob/master/.github/dependabot.yml#L10 lists only internal.

jwodder commented 10 months ago

This happened because, whenever Dependabot creates a PR for a major version update (e.g., v3 to v4), if the repository the PR is created in has a "major" label defined, the label will be applied to the PR regardless of what's in dependabot.yml, and likewise for minor and patch version updates. I don't think this is mentioned in Dependabot's documentation, but there's an open issue to give Dependabot the option to not do this.
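The behavior described above can be modeled with a small sketch. This is only an illustration of the rule as stated in this comment, not Dependabot's actual implementation; the function names `bump_type` and `labels_applied` are hypothetical.

```python
from __future__ import annotations

SEMVER_PARTS = ("major", "minor", "patch")


def bump_type(old: str, new: str) -> str | None:
    """Return the first semver component that differs between two versions."""
    old_parts = [int(p) for p in old.lstrip("v").split(".")]
    new_parts = [int(p) for p in new.lstrip("v").split(".")]
    # Pad short versions such as "v3" to three components.
    old_parts += [0] * (3 - len(old_parts))
    new_parts += [0] * (3 - len(new_parts))
    for name, o, n in zip(SEMVER_PARTS, old_parts, new_parts):
        if o != n:
            return name
    return None


def labels_applied(
    configured: list[str], repo_labels: set[str], old: str, new: str
) -> list[str]:
    """Model the rule: labels from dependabot.yml, plus a label named after
    the bump type if (and only if) the repository defines such a label."""
    labels = list(configured)
    bump = bump_type(old, new)
    if bump is not None and bump in repo_labels and bump not in labels:
        labels.append(bump)
    return labels


# A v3 -> v4 update in a repository that defines a "major" label
print(labels_applied(["internal"], {"internal", "major"}, "v3", "v4"))
# The same update once the label has been renamed to "semver-major"
print(labels_applied(["internal"], {"internal", "semver-major"}, "v3", "v4"))
```

Under this model, renaming the repository's "major", "minor", and "patch" labels is enough to stop the extra label from being applied.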

I recommend addressing this by reconfiguring auto to use labels that aren't named "major", "minor", and "patch"; for an example configuration, see auto's last used configuration in datalad/datalad. Once .autorc is updated, the labels in the repository will have to be renamed manually; I think that running auto create-labels would just create new labels rather than renaming, so don't do that. (I wrote https://github.com/jwodder/labelmaker over my break which could be of use here.)
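As a rough sketch of what such a reconfiguration might look like, the snippet below adds a "labels" field with prefixed names to an .autorc given as JSON text. It assumes that .autorc is plain JSON and that auto's "labels" setting accepts entries with "name" and "releaseType" keys; the "semver-" prefix and the helper names are illustrative only, and how auto treats its default label names alongside these entries is not covered here.

```python
import json

# (label name, auto release type) for the three version-bump labels
AUTO_LABELS = [("major", "major"), ("minor", "minor"), ("patch", "patch")]


def prefixed_labels(prefix="semver-"):
    """Build "labels" entries whose names do not collide with Dependabot's."""
    return [{"name": prefix + name, "releaseType": rt} for name, rt in AUTO_LABELS]


def with_prefixed_labels(autorc_text, prefix="semver-"):
    """Return the .autorc content with a "labels" field added or replaced."""
    config = json.loads(autorc_text)
    config["labels"] = prefixed_labels(prefix)
    return json.dumps(config, indent=4) + "\n"


# Hypothetical minimal .autorc content, for demonstration only
print(with_prefixed_labels('{"baseBranch": "master"}'))
```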

(An alternative approach would be to create a GitHub Actions workflow that automatically relabels Dependabot PRs, but that seems too much like a bandaid.)

auto will also have to be reconfigured on all other repositories that use both auto and Dependabot; the following script will list them:

#!/usr/bin/env python3
# /// script
# requires-python = ">=3.8"
# dependencies = ["ghreq ~= 0.1", "ghtoken ~= 0.1"]
# ///

from __future__ import annotations
from collections.abc import Iterator
import ghreq
from ghtoken import get_ghtoken

OWNERS = ["con", "dandi", "datalad"]

class Client(ghreq.Client):
    def get_repos_for_owner(self, owner: str) -> Iterator[dict]:
        return self.paginate(f"/users/{owner}/repos")

    def has_file(self, repo_url: str, path: str) -> bool:
        try:
            self.request("HEAD", f"{repo_url}/contents/{path}", raw=True)
        except ghreq.PrettyHTTPError as e:
            if e.response.status_code == 404:
                return False
            else:
                raise e
        else:
            return True

with Client(token=get_ghtoken()) as client:
    for owner in OWNERS:
        for r in client.get_repos_for_owner(owner):
            if r["archived"] or r["fork"]:
                continue
            if client.has_file(r["url"], ".autorc") and client.has_file(
                r["url"], ".github/dependabot.yml"
            ):
                print(r["full_name"])

You may also want to create an issue in auto's repository about this; there doesn't seem to be one there already.

jwodder commented 10 months ago

I just realized there's another category of our repositories that use both "major" labels and Dependabot: those that use datalad/release-action with labels named "major" etc. The only such repository seems to be https://github.com/datalad/datalad-container. (datalad/release-action itself also uses "major" labels, but it doesn't use Dependabot.)

yarikoptic commented 10 months ago

Ideally, IMHO, it should be addressed in Dependabot, but given that that issue is from Apr 9, 2021 and is still not resolved, I wonder if it ever will be. So indeed we are doomed to switch everywhere (auto- or datalad/release-action-driven projects) to e.g. semver- prefixed labels.

> You may also want to create an issue in auto's repository about this; there doesn't seem to be one there already.

Would you be so kind as to do so?

jwodder commented 10 months ago

@yarikoptic Issue created: https://github.com/intuit/auto/issues/2412

yarikoptic commented 2 months ago

With the recent fiasco on heudiconv, I think we are doomed to act.

@jwodder could you please compose a script which, given a repository URL (or just taking the URL of the default remote for the current branch in the current git repo), would rename all the labels and adjust the labels in .autorc accordingly, prepending semver- to each (with the prefix being an option, so that someone can tune it to their liking)?

jwodder commented 2 months ago

@yarikoptic Should the .autorc update be done in a pull request or just pushed directly to the default branch? I suspect we have some repositories that, like datalad/datalad, should only ever be updated via PRs, but if .autorc is updated via PR, it would be more appropriate to delay the label renaming until after it's merged.

yarikoptic commented 2 months ago

IMHO it should be up to the user to facilitate a quick submit/review/merge of the PR if it needs to be done via a PR, so it would be OK if there is a short period of time when things are inconsistent.

NB: for datalad/datalad I think it could be just a direct push; nothing forbids it, and even the scriv changelog could be crafted manually.

jwodder commented 2 months ago

@yarikoptic Script (untested):

#!/usr/bin/env python3
# /// script
# requires-python = ">=3.8"
# dependencies = ["ghrepo ~= 0.7", "ghreq ~= 0.5", "ghtoken ~= 0.1"]
# ///

"""
This script updates an `auto`-using GitHub repository so that the `auto` label
names (other than "release") will begin with a given prefix ("semver-" by
default).  It must either be run inside a local clone of such a repository or
else passed one or more paths to local clones of such repositories.

This script assumes that the repository contains an `.autorc` file that is
valid JSON (i.e., it does not contain any comments) and that does not already
define a "labels" field.
"""

from __future__ import annotations
import argparse
import json
import logging
from pathlib import Path
import subprocess
from ghrepo import GHRepo, get_local_repo
from ghreq import Client
from ghtoken import get_ghtoken

# (name, release type)
LABELS = [
    ("major", "major"),
    ("minor", "minor"),
    ("patch", "patch"),
    ("dependencies", "none"),
    ("documentation", "none"),
    ("internal", "none"),
    ("performance", "none"),
    ("tests", "none"),
]

PR_BRANCH = "auto-prefix-labels"

log = logging.getLogger("reautolabel")

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("-P", "--prefix", default="semver-")
    parser.add_argument("dirpath", type=Path, nargs="*")
    args = parser.parse_args()
    logging.basicConfig(
        format="%(asctime)s [%(levelname)-8s] %(name)s: %(message)s",
        datefmt="%H:%M:%S",
        level=logging.INFO,
    )
    dirs = args.dirpath or [Path.cwd()]
    prefix = args.prefix
    title = f"Prefix auto labels with {prefix!r}"
    body = (
        "This PR modifies the `auto` configuration so that the label names"
        f' (other than "release") will be prefixed with {prefix!r}.  This is'
        " necessary to keep Dependabot from applying `auto` labels to its PRs,"
        " causing undesirable version bumps."
        "\n\n"
        "See <https://github.com/datalad/datalad-installer/issues/175> for more"
        " information."
    )
    with Client(token=get_ghtoken()) as client:
        for d in dirs:
            log.info("Operating on %s ...", d)
            repo = get_local_repo(d)
            log.info("'origin' remote points to GitHub repository %s", repo)
            repo_data = client.get(f"/repos/{repo}")
            defbranch = repo_data["default_branch"]
            head_owner: str | None
            if (parent := repo_data.get("parent")) is not None:
                head_owner = repo_data["owner"]["login"]
                repo = GHRepo.parse(parent["full_name"])
                log.info("GitHub repository is a fork; operating on parent %s", repo)
            else:
                head_owner = None
            log.info("Renaming labels")
            for label, _ in LABELS:
                new_label = prefix + label
                log.info("%r -> %r", label, new_label)
                client.patch(
                    f"/repos/{repo}/labels/{label}", json={"new_name": new_label}
                )
            log.info("Creating PR to update .autorc")
            subprocess.run(
                ["git", "checkout", "-b", PR_BRANCH, defbranch],
                check=True,
                cwd=d,
            )
            autorc = d / ".autorc"
            config = json.loads(autorc.read_text(encoding="utf-8"))
            config["labels"] = [
                {"name": prefix + name, "releaseType": rt} for (name, rt) in LABELS
            ]
            autorc.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
            subprocess.run(["git", "commit", "-m", title, ".autorc"], check=True, cwd=d)
            subprocess.run(
                ["git", "push", "--set-upstream", "origin", PR_BRANCH],
                check=True,
                cwd=d,
            )
            pr = client.post(
                f"/repos/{repo}/pulls",
                json={
                    "title": title,
                    "head": (
                        f"{head_owner}:{PR_BRANCH}"
                        if head_owner is not None
                        else PR_BRANCH
                    ),
                    "base": defbranch,
                    "body": body,
                    "maintainer_can_modify": True,
                },
            )
            log.info("PR created: %s", pr["html_url"])

if __name__ == "__main__":
    main()
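Since the script above is marked as untested and sends a rename request for every label in its list, one possible safeguard is to compute the renames against the labels that actually exist in the repository first. The helper below is a hypothetical sketch, not part of the script: `plan_renames` skips labels that are missing from the repository (where a rename request would be expected to fail with a 404) and labels whose prefixed name is already taken. The list of existing label names is assumed to have been fetched separately.

```python
def plan_renames(existing, wanted, prefix="semver-"):
    """Return (old, new) pairs for labels that can safely be renamed.

    existing: label names currently defined in the repository
    wanted:   unprefixed label names that should receive the prefix
    """
    existing_set = set(existing)
    plan = []
    for name in wanted:
        new_name = prefix + name
        if name not in existing_set:
            # Nothing to rename; the label is not defined in this repository.
            continue
        if new_name in existing_set:
            # The target name is already in use; renaming would conflict.
            continue
        plan.append((name, new_name))
    return plan


# Hypothetical repository state, for demonstration only
current = ["major", "bug", "semver-minor", "minor", "patch"]
for old, new in plan_renames(current, ["major", "minor", "patch", "tests"]):
    print(f"{old!r} -> {new!r}")
```

Printing the plan before applying it also gives a simple dry run for repositories whose label sets differ from auto's defaults.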

Incidentally, I ran the script from my previous comment against the con, dandi, datalad, ReproNim, and duecredit organizations, and it listed the following repositories that use both auto and Dependabot:

con/fscacher
con/tinuous
dandi/dandi-archive
dandi/dandi-cli
dandi/dandi-schema
datalad/datalad-crawler
datalad/datalad-deprecated
datalad/datalad-fuse
datalad/datalad-installer
ReproNim/neurodocker
duecredit/duecredit

yarikoptic commented 2 months ago

Let's collect such scripts under https://github.com/con/scripts, which I just created. Feel free to choose the directory hierarchy/structure and filenames the way you like, @jwodder.