Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37

github-actions[bot] commented 2 years ago

Workflow run: https://github.com/exercism/org-wide-files/actions/runs/2306860844

ee7 commented 2 years ago

Is it unexpected that this didn't open a PR in every tooling repo?

There wasn't one created at least for exercism/nim-test-runner: https://github.com/exercism/nim-test-runner/pulls?q=is%3Apr

exercism/nim-test-runner doesn't appear in the output of the "Fetch list of target repos" step: https://github.com/exercism/org-wide-files/runs/6402899031?check_suite_focus=true#step:2:40

https://github.com/exercism/org-wide-files/blob/3f01680a16365b4983f0ea33b4f55470416a656d/.github/workflows/sync-tooling.yml#L71-L87

SaschaMann commented 2 years ago

Is it unexpected that this didn't open a PR in every tooling repo?

Yes.

I can't replicate the issue when sending a manual request to the API.

ee7 commented 2 years ago

Getting the list of exercism repos with the exercism-tooling topic:

gh search repos --owner=exercism --topic=exercism-tooling --limit 200 | cut -f1,3 | sort

The repos that do have the topic, but the workflow did not target:

exercism/delphi-test-runner
exercism/DEPRECATED.rikki-ruby-analyzer
exercism/nim-test-runner
exercism/stub-analyzer

Of those:

exercism/DEPRECATED.rikki-ruby-analyzer and exercism/stub-analyzer are archived
exercism/delphi-test-runner is private.

So that leaves us with:

exercism/nim-test-runner

The gh command at the top of this post gives us the below table. For some reason, it was missing exactly:

exercism/plsql-test-runner
exercism/prolog-test-runner
exercism/rust-test-runner

at one point, even though those repos do have the topic...

Repo	Notes
abap-test-runner	public
bash-analyzer	public
bash-test-runner	public
cfml-test-runner	public
clojure-analyzer	public
clojure-representer	public
clojurescript-test-runner	public
clojure-test-runner	public
coffeescript-test-runner	public
common-lisp-analyzer	public
common-lisp-representer	public
common-lisp-test-runner	public
cpp-test-runner	public
c-representer	public
c-representer	public
crystal-test-runner	public
crystal-test-runner	public
csharp-analyzer	public
csharp-representer	public
csharp-test-runner	public
c-test-runner	public
dart-test-runner	public
delphi-test-runner	private
DEPRECATED.rikki-ruby-analyzer	public, archived
d-test-runner	public
elixir-analyzer	public
elixir-representer	public
elixir-test-runner	public
elm-analyzer	public
elm-representer	public
elm-test-runner	public
emacs-lisp-test-runner	public
erlang-analyzer	public
erlang-representer	public
erlang-test-runner	public
fortran-test-runner	public
fsharp-analyzer	public
fsharp-representer	public
fsharp-test-runner	public
generic-analyzer	public
generic-representer	public
generic-test-runner	public
go-analyzer	public
go-representer	public
go-test-runner	public
groovy-test-runner	public
haskell-test-runner	public
haxe-analyzer	public
haxe-representer	public
haxe-test-runner	public
j-analyzer	public
java-analyzer	public
java-representer	public
javascript-analyzer	public
javascript-representer	public
javascript-test-runner	public
java-test-runner	public
j-representer	public
j-test-runner	public
julia-analyzer	public
julia-representer	public
julia-test-runner	public
kotlin-analyzer	public
kotlin-test-runner	public
lfe-test-runner	public
lua-test-runner	public
mips-test-runner	public
nim-representer	public
nim-test-runner	public
objective-c-test-runner	public
ocaml-test-runner	public
perl5-analyzer	public
perl5-test-runner	public
pharo-smalltalk-test-runner	public
php-test-runner	public
purescript-test-runner	public
python-analyzer	public
python-representer	public
python-test-runner	public
racket-test-runner	public
raku-test-runner	public
reasonml-test-runner	public
red-analyzer	public
red-representer	public
red-test-runner	public
r-test-runner	public
ruby-analyzer	public
ruby-representer	public
ruby-test-runner	public
rust-analyzer	public
rust-representer	public
rust-test-runner	public
scala-analyzer	public
scala-test-runner	public
scheme-test-runner	public
sml-test-runner	public
stub-analyzer	public, archived
swift-test-runner	public
tcl-test-runner	public
typescript-analyzer	public
typescript-representer	public
typescript-test-runner	public
vbnet-test-runner	public
vimscript-test-runner	public
wasm-test-runner	public
wren-representer	public
wren-test-runner	public
x86-64-assembly-test-runner	public
zig-test-runner	public

ee7 commented 2 years ago

To others: Sascha manually triggered https://github.com/exercism/nim-test-runner/pull/138

SaschaMann commented 2 years ago

That's really strange. Perhaps fetching the full repo object and comparing them between the listed and unlisted repos could show some differences that explain this?

This might be an issue with the GH search itself, too.

(Sorry for the less than helpful comments, can't look into it properly atm due to PC issues)

SleeplessByte commented 2 years ago

Maybe the problem is that when the tool does this it's paginated and only looks at the first page?

SaschaMann commented 2 years ago

That was my initial thought too, but the call to github.paginate should take care of it.

Sending the same query via curl shows that we should not be hitting the rate limits either:

$ curl \
>   -H "Accept: application/vnd.github.v3+json" \
>   https://api.github.com/search/repositories?q=org:exercism+topic:exercism-tooling+is:public+archived:false
{
  "total_count": 106,
  "incomplete_results": false,
  "items": [

iHiD commented 2 years ago

Could you retry and see if the rust-test-runner works now pls?

ee7 commented 2 years ago

Could you retry and see if the rust-test-runner works now pls?

The below command outputs 109 repo names, and includes rust-test-runner.

I think I was seeing fewer at one point. But it might have been an intermittent rate-limiting or pagination thing. And --limit 200 might be optimistic for gh search repos, but it isn't immediately documented that it won't work.

Did you change something?

$ gh search repos --owner=exercism --topic=exercism-tooling --limit 200 | cut -f1 | sort
exercism/abap-test-runner
exercism/bash-analyzer
exercism/bash-test-runner
exercism/cfml-test-runner
exercism/clojure-analyzer
exercism/clojure-representer
exercism/clojurescript-test-runner
exercism/clojure-test-runner
exercism/coffeescript-test-runner
exercism/common-lisp-analyzer
exercism/common-lisp-representer
exercism/common-lisp-test-runner
exercism/cpp-test-runner
exercism/c-representer
exercism/crystal-test-runner
exercism/csharp-analyzer
exercism/csharp-representer
exercism/csharp-test-runner
exercism/c-test-runner
exercism/dart-test-runner
exercism/delphi-test-runner
exercism/DEPRECATED.rikki-ruby-analyzer
exercism/d-test-runner
exercism/elixir-analyzer
exercism/elixir-representer
exercism/elixir-test-runner
exercism/elm-analyzer
exercism/elm-representer
exercism/elm-test-runner
exercism/emacs-lisp-test-runner
exercism/erlang-analyzer
exercism/erlang-representer
exercism/erlang-test-runner
exercism/fortran-test-runner
exercism/fsharp-analyzer
exercism/fsharp-representer
exercism/fsharp-test-runner
exercism/generic-analyzer
exercism/generic-representer
exercism/generic-test-runner
exercism/go-analyzer
exercism/go-representer
exercism/go-test-runner
exercism/groovy-test-runner
exercism/haskell-test-runner
exercism/haxe-analyzer
exercism/haxe-representer
exercism/haxe-test-runner
exercism/j-analyzer
exercism/java-analyzer
exercism/java-representer
exercism/javascript-analyzer
exercism/javascript-representer
exercism/javascript-test-runner
exercism/java-test-runner
exercism/j-representer
exercism/j-test-runner
exercism/julia-analyzer
exercism/julia-representer
exercism/julia-test-runner
exercism/kotlin-analyzer
exercism/kotlin-test-runner
exercism/lfe-test-runner
exercism/lua-test-runner
exercism/mips-test-runner
exercism/nim-representer
exercism/nim-test-runner
exercism/objective-c-test-runner
exercism/ocaml-test-runner
exercism/perl5-analyzer
exercism/perl5-test-runner
exercism/pharo-smalltalk-test-runner
exercism/php-test-runner
exercism/plsql-test-runner
exercism/prolog-test-runner
exercism/purescript-test-runner
exercism/python-analyzer
exercism/python-representer
exercism/python-test-runner
exercism/racket-test-runner
exercism/raku-test-runner
exercism/reasonml-test-runner
exercism/red-analyzer
exercism/red-representer
exercism/red-test-runner
exercism/r-test-runner
exercism/ruby-analyzer
exercism/ruby-representer
exercism/ruby-test-runner
exercism/rust-analyzer
exercism/rust-representer
exercism/rust-test-runner
exercism/scala-analyzer
exercism/scala-test-runner
exercism/scheme-test-runner
exercism/sml-test-runner
exercism/stub-analyzer
exercism/swift-test-runner
exercism/tcl-test-runner
exercism/typescript-analyzer
exercism/typescript-representer
exercism/typescript-test-runner
exercism/vbnet-test-runner
exercism/vimscript-test-runner
exercism/wasm-test-runner
exercism/wren-representer
exercism/wren-test-runner
exercism/x86-64-assembly-test-runner
exercism/zig-test-runner

iHiD commented 2 years ago

I removed leading white space from the rust-test-runners topic. But I don't know if I actually did, because in most of the UI that doesn't show, but on one specific bit it did. So I'm wondering if it's strip'd in most places, but not in some.

SaschaMann commented 2 years ago

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

ee7 commented 2 years ago

I removed leading white space from the rust-test-runners topic.

I see.

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

$ gh repo list exercism --visibility private

Showing 1 of 1 repositories in @exercism that match your search

exercism/delphi-test-runner    private  Oct  5, 2021

(It's probably also better to use gh repo list exercism than gh search repos --owner=exercism).

iHiD commented 2 years ago

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

Yes. It contains delphi's source code, which is proprietory (delphi is our only non open-source language AFAICT)

ee7 commented 2 years ago

Yes. It contains delphi's source code

Ah.

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

@SaschaMann You jest, but I've actually been considering a repo that runs periodic checks on Exercism repo metadata. Then we can get notifications for things like:

I suppose it could even be a periodic workflow in exercism/configlet.

iHiD commented 2 years ago

(cc @kytrinyx to this too)

SaschaMann commented 2 years ago

If you're gonna implement that, please add "repo is empty" as that has caused us issues a few times already :)

ee7 commented 2 years ago

please add "repo is empty"

You'd like a failing check when any Exercism repo has isEmpty as true in the below?

The 5 repos with the lowest disk usage currently:

gh repo list exercism \
    --limit 500 --no-archived --visibility public --json diskUsage,isEmpty,name \
    | jq 'sort_by(.diskUsage) | .[0:5]'

[
  {
    "diskUsage": 6,
    "isEmpty": false,
    "name": "babel-preset-javascript"
  },
  {
    "diskUsage": 11,
    "isEmpty": false,
    "name": "guys-slack-bot"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "open-source-stats"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "backlog"
  },
  {
    "diskUsage": 13,
    "isEmpty": false,
    "name": "wip"
  }
]

(https://github.com/exercism/babel-preset-javascript is quite small: CODE_OF_CONDUCT.md and labels).

that has caused us issues a few times already :)

https://github.com/exercism/org-wide-files/issues/110? Can we filter out empty repos before we fork?

SaschaMann commented 2 years ago

https://github.com/exercism/org-wide-files/issues/110? Can we filter out empty repos before we fork?

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

ErikSchierboom commented 2 years ago

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

ErikSchierboom commented 2 years ago

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

ee7 commented 2 years ago

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

I'm more persuaded by the counter-argument: it seems easier to be robust to the existence of an empty repo, rather than being blocked by one and needing manual intervention (so org-wide-files jobs fail until we remove, or add content to, an empty repo that is encountered). A new repo might be empty for a short period while it's being set up, and it'd be easy to forget that it would break org-wide-files jobs. Creating an empty repo might be convenient for someone while setting something up, and the fact that it has happened before means that it can happen in the future.

I could also claim that an empty repo does not do anything yet, and so does not urgently need to be targeted by the next org-wide-files PR. Furthermore, an empty repo might not yet have the correct topics set, so org-wide-files cannot sync the correct files anyway.

We could have a periodic job that runs e.g. daily, and does things like producing an error if there is an empty repo. But that doesn't help if an empty repo is created shortly before an org-wide-files job runs, and making the "exercism repo linting" job prevent an org-wide-files job is even harder than filtering empty repos, and produces the same outcome - org-wide-files being blocked.

But I probably don't have the full picture, and I'm not the one who has to deal with it. Do whatever's easiest, of course.

ee7 commented 2 years ago

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

I think only this issue. I believe org-wide-files has now targeted every repo we wanted - it was only nim-test-runner that was not targeted in this case.

It's strange that nim-test-runner was missing, though.

It does have the topic:

gh repo view exercism/nim-test-runner --json repositoryTopics

{
  "repositoryTopics": [
    {
      "name": "exercism-tooling"
    },
    {
      "name": "exercism-test-runner"
    }
  ]
}

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

And/or switch to gh (or curl), if we're suspicious that it might perform better than actions/github-script?

Getting all the Exercism-owned, public, non-archived, repos with the exercism-tooling topic with gh:

gh repo list exercism \
    --visibility public \
    --no-archived \
    --limit 500 \
    --topic exercism-tooling \
    --json name \
    --jq 'sort_by(.name)' \
    | jq '.[:3]'

(The final pipe to jq is just to slice to keep the output short below).

[
  {
    "name": "abap-test-runner"
  },
  {
    "name": "bash-analyzer"
  },
  {
    "name": "bash-test-runner"
  }
]

I think this avoids the search API, because if I write --limit 1234 I don't see this message: https://github.com/cli/cli/blob/9454ad376953/pkg/cmd/repo/list/list.go#L181-L182

ErikSchierboom commented 2 years ago

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

I think it already does that: https://github.com/actions/github-script#actionsgithub-script

A pre-authenticated octokit/rest.js client with pagination plugins

SaschaMann commented 2 years ago

I don't have a particular preference for github-script vs gh for this particular use case. Though I'm not sure if this would be as readable in Bash. These lines would need to be rewritten: https://github.com/exercism/org-wide-files/blob/cd205480e83eb23769696e0ad553ad53d5b94558/.github/workflows/sync-rest.yml#L78-L97

Not relying on the search API seems like a good idea for resilience.

ErikSchierboom commented 2 years ago

But then what does it use? The graphql API? Something special?

SaschaMann commented 2 years ago

I didn't look up what the gh command does but there's an endpoint that returns all repositories in an organisation. You could then filter these based on topics afterwards.

ErikSchierboom commented 2 years ago

I think that is what they do: https://github.com/cli/cli/blob/00e97121ec8cf4de86bdda704fb711302a8dfc96/pkg/cmd/repo/list/http.go#L80

ee7 commented 2 years ago

I think the source of truth is https://github.com/cli/cli/blob/00e97121ec8c/pkg/cmd/repo/list/http.go#L32, but I don't understand exactly what I'm seeing. It looks like both listRepos and searchRepos use the GraphQL API v4, and FromSearch is set to true if e.g. we filter non-archived or by topic.

(Edit: Erik beat me by 7 seconds).

ErikSchierboom commented 2 years ago

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

ee7 commented 2 years ago

Aha, I've worked it out.

If I write

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234

Then the output does start with

warning: this query uses the Search API which is capped at 1000 results maximum

But the warning does not appear if I add some --json option:

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234 --json name

I think they could still print that warning on stderr (which is where it appears when --json is not used).

ee7 commented 2 years ago

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

I don't know. But the GraphQL API and the REST API do have different limits: https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit

Maybe let's do nothing for now, and keep an eye on the number of repos targeted by an org-wide-files run? Or we could e.g. obtain the repo list by two independent methods, and require them to agree.

exercism / org-wide-files

Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225