exercism / org-wide-files

MIT License
7 stars 8 forks source link

Bug: Not all repos with topics are targetted by the workflow | [Tracking] Sync Tooling - push - ErikSchierboom - f28daf625fa326fafdcf261db7699377c8d1ed37 #225

Closed github-actions[bot] closed 2 years ago

github-actions[bot] commented 2 years ago

Workflow run: https://github.com/exercism/org-wide-files/actions/runs/2306860844

ee7 commented 2 years ago

Is it unexpected that this didn't open a PR in every tooling repo?

There wasn't one created at least for exercism/nim-test-runner: https://github.com/exercism/nim-test-runner/pulls?q=is%3Apr

exercism/nim-test-runner doesn't appear in the output of the "Fetch list of target repos" step: https://github.com/exercism/org-wide-files/runs/6402899031?check_suite_focus=true#step:2:40

https://github.com/exercism/org-wide-files/blob/3f01680a16365b4983f0ea33b4f55470416a656d/.github/workflows/sync-tooling.yml#L71-L87

SaschaMann commented 2 years ago

Is it unexpected that this didn't open a PR in every tooling repo?

Yes.

I can't replicate the issue when sending a manual request to the API.

ee7 commented 2 years ago

Getting the list of exercism repos with the exercism-tooling topic:

gh search repos --owner=exercism --topic=exercism-tooling --limit 200 | cut -f1,3 | sort

The repos that do have the topic, but the workflow did not target:

exercism/delphi-test-runner
exercism/DEPRECATED.rikki-ruby-analyzer
exercism/nim-test-runner
exercism/stub-analyzer

Of those:

So that leaves us with:

exercism/nim-test-runner

The gh command at the top of this post gives us the below table. For some reason, it was missing exactly:

at one point, even though those repos do have the topic...

Repo Notes
abap-test-runner public
bash-analyzer public
bash-test-runner public
cfml-test-runner public
clojure-analyzer public
clojure-representer public
clojurescript-test-runner public
clojure-test-runner public
coffeescript-test-runner public
common-lisp-analyzer public
common-lisp-representer public
common-lisp-test-runner public
cpp-test-runner public
c-representer public
c-representer public
crystal-test-runner public
crystal-test-runner public
csharp-analyzer public
csharp-representer public
csharp-test-runner public
c-test-runner public
dart-test-runner public
delphi-test-runner private
DEPRECATED.rikki-ruby-analyzer public, archived
d-test-runner public
elixir-analyzer public
elixir-representer public
elixir-test-runner public
elm-analyzer public
elm-representer public
elm-test-runner public
emacs-lisp-test-runner public
erlang-analyzer public
erlang-representer public
erlang-test-runner public
fortran-test-runner public
fsharp-analyzer public
fsharp-representer public
fsharp-test-runner public
generic-analyzer public
generic-representer public
generic-test-runner public
go-analyzer public
go-representer public
go-test-runner public
groovy-test-runner public
haskell-test-runner public
haxe-analyzer public
haxe-representer public
haxe-test-runner public
j-analyzer public
java-analyzer public
java-representer public
javascript-analyzer public
javascript-representer public
javascript-test-runner public
java-test-runner public
j-representer public
j-test-runner public
julia-analyzer public
julia-representer public
julia-test-runner public
kotlin-analyzer public
kotlin-test-runner public
lfe-test-runner public
lua-test-runner public
mips-test-runner public
nim-representer public
nim-test-runner public
objective-c-test-runner public
ocaml-test-runner public
perl5-analyzer public
perl5-test-runner public
pharo-smalltalk-test-runner public
php-test-runner public
purescript-test-runner public
python-analyzer public
python-representer public
python-test-runner public
racket-test-runner public
raku-test-runner public
reasonml-test-runner public
red-analyzer public
red-representer public
red-test-runner public
r-test-runner public
ruby-analyzer public
ruby-representer public
ruby-test-runner public
rust-analyzer public
rust-representer public
rust-test-runner public
scala-analyzer public
scala-test-runner public
scheme-test-runner public
sml-test-runner public
stub-analyzer public, archived
swift-test-runner public
tcl-test-runner public
typescript-analyzer public
typescript-representer public
typescript-test-runner public
vbnet-test-runner public
vimscript-test-runner public
wasm-test-runner public
wren-representer public
wren-test-runner public
x86-64-assembly-test-runner public
zig-test-runner public
ee7 commented 2 years ago

To others: Sascha manually triggered https://github.com/exercism/nim-test-runner/pull/138

SaschaMann commented 2 years ago

That's really strange. Perhaps fetching the full repo object and comparing them between the listed and unlisted repos could show some differences that explain this?

This might be an issue with the GH search itself, too.

(Sorry for the less than helpful comments, can't look into it properly atm due to PC issues)

SleeplessByte commented 2 years ago

Maybe the problem is that when the tool does this it's paginated and only looks at the first page?

SaschaMann commented 2 years ago

That was my initial thought too, but the call to github.paginate should take care of it.

Sending the same query via curl shows that we should not be hitting the rate limits either:

$ curl \
>   -H "Accept: application/vnd.github.v3+json" \
>   https://api.github.com/search/repositories?q=org:exercism+topic:exercism-tooling+is:public+archived:false
{
  "total_count": 106,
  "incomplete_results": false,
  "items": [
iHiD commented 2 years ago

Could you retry and see if the rust-test-runner works now pls?

ee7 commented 2 years ago

Could you retry and see if the rust-test-runner works now pls?

The below command outputs 109 repo names, and includes rust-test-runner.

I think I was seeing fewer at one point. But it might have been an intermittent rate-limiting or pagination thing. And --limit 200 might be optimistic for gh search repos, but it isn't immediately documented that it won't work.

Did you change something?

$ gh search repos --owner=exercism --topic=exercism-tooling --limit 200 | cut -f1 | sort
exercism/abap-test-runner
exercism/bash-analyzer
exercism/bash-test-runner
exercism/cfml-test-runner
exercism/clojure-analyzer
exercism/clojure-representer
exercism/clojurescript-test-runner
exercism/clojure-test-runner
exercism/coffeescript-test-runner
exercism/common-lisp-analyzer
exercism/common-lisp-representer
exercism/common-lisp-test-runner
exercism/cpp-test-runner
exercism/c-representer
exercism/crystal-test-runner
exercism/csharp-analyzer
exercism/csharp-representer
exercism/csharp-test-runner
exercism/c-test-runner
exercism/dart-test-runner
exercism/delphi-test-runner
exercism/DEPRECATED.rikki-ruby-analyzer
exercism/d-test-runner
exercism/elixir-analyzer
exercism/elixir-representer
exercism/elixir-test-runner
exercism/elm-analyzer
exercism/elm-representer
exercism/elm-test-runner
exercism/emacs-lisp-test-runner
exercism/erlang-analyzer
exercism/erlang-representer
exercism/erlang-test-runner
exercism/fortran-test-runner
exercism/fsharp-analyzer
exercism/fsharp-representer
exercism/fsharp-test-runner
exercism/generic-analyzer
exercism/generic-representer
exercism/generic-test-runner
exercism/go-analyzer
exercism/go-representer
exercism/go-test-runner
exercism/groovy-test-runner
exercism/haskell-test-runner
exercism/haxe-analyzer
exercism/haxe-representer
exercism/haxe-test-runner
exercism/j-analyzer
exercism/java-analyzer
exercism/java-representer
exercism/javascript-analyzer
exercism/javascript-representer
exercism/javascript-test-runner
exercism/java-test-runner
exercism/j-representer
exercism/j-test-runner
exercism/julia-analyzer
exercism/julia-representer
exercism/julia-test-runner
exercism/kotlin-analyzer
exercism/kotlin-test-runner
exercism/lfe-test-runner
exercism/lua-test-runner
exercism/mips-test-runner
exercism/nim-representer
exercism/nim-test-runner
exercism/objective-c-test-runner
exercism/ocaml-test-runner
exercism/perl5-analyzer
exercism/perl5-test-runner
exercism/pharo-smalltalk-test-runner
exercism/php-test-runner
exercism/plsql-test-runner
exercism/prolog-test-runner
exercism/purescript-test-runner
exercism/python-analyzer
exercism/python-representer
exercism/python-test-runner
exercism/racket-test-runner
exercism/raku-test-runner
exercism/reasonml-test-runner
exercism/red-analyzer
exercism/red-representer
exercism/red-test-runner
exercism/r-test-runner
exercism/ruby-analyzer
exercism/ruby-representer
exercism/ruby-test-runner
exercism/rust-analyzer
exercism/rust-representer
exercism/rust-test-runner
exercism/scala-analyzer
exercism/scala-test-runner
exercism/scheme-test-runner
exercism/sml-test-runner
exercism/stub-analyzer
exercism/swift-test-runner
exercism/tcl-test-runner
exercism/typescript-analyzer
exercism/typescript-representer
exercism/typescript-test-runner
exercism/vbnet-test-runner
exercism/vimscript-test-runner
exercism/wasm-test-runner
exercism/wren-representer
exercism/wren-test-runner
exercism/x86-64-assembly-test-runner
exercism/zig-test-runner
iHiD commented 2 years ago

I removed leading white space from the rust-test-runners topic. But I don't know if I actually did, because in most of the UI that doesn't show, but on one specific bit it did. So I'm wondering if it's strip'd in most places, but not in some.

SaschaMann commented 2 years ago

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

ee7 commented 2 years ago

I removed leading white space from the rust-test-runners topic.

I see.

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

$ gh repo list exercism --visibility private

Showing 1 of 1 repositories in @exercism that match your search

exercism/delphi-test-runner    private  Oct  5, 2021

(It's probably also better to use gh repo list exercism than gh search repos --owner=exercism).

iHiD commented 2 years ago

By the way, is it intended that https://github.com/exercism/delphi-test-runner is private? As far as I can see, it's the only private exercism repo:

Yes. It contains delphi's source code, which is proprietory (delphi is our only non open-source language AFAICT)

ee7 commented 2 years ago

Yes. It contains delphi's source code

Ah.

@ee7 could you submit a PR that adds your whitespace checks to GitHub please? :D

@SaschaMann You jest, but I've actually been considering a repo that runs periodic checks on Exercism repo metadata. Then we can get notifications for things like:

I suppose it could even be a periodic workflow in exercism/configlet.

iHiD commented 2 years ago

(cc @kytrinyx to this too)

SaschaMann commented 2 years ago

If you're gonna implement that, please add "repo is empty" as that has caused us issues a few times already :)

ee7 commented 2 years ago

please add "repo is empty"

You'd like a failing check when any Exercism repo has isEmpty as true in the below?

The 5 repos with the lowest disk usage currently:

gh repo list exercism \
    --limit 500 --no-archived --visibility public --json diskUsage,isEmpty,name \
    | jq 'sort_by(.diskUsage) | .[0:5]'
[
  {
    "diskUsage": 6,
    "isEmpty": false,
    "name": "babel-preset-javascript"
  },
  {
    "diskUsage": 11,
    "isEmpty": false,
    "name": "guys-slack-bot"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "open-source-stats"
  },
  {
    "diskUsage": 12,
    "isEmpty": false,
    "name": "backlog"
  },
  {
    "diskUsage": 13,
    "isEmpty": false,
    "name": "wip"
  }
]

(https://github.com/exercism/babel-preset-javascript is quite small: CODE_OF_CONDUCT.md and labels).

that has caused us issues a few times already :)

https://github.com/exercism/org-wide-files/issues/110? Can we filter out empty repos before we fork?

SaschaMann commented 2 years ago

https://github.com/exercism/org-wide-files/issues/110? Can we filter out empty repos before we fork?

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

ErikSchierboom commented 2 years ago

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

ErikSchierboom commented 2 years ago

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

ee7 commented 2 years ago

Yes but I'd argue that the org-wide-files should be synced to those repos as well and that a repo should generally not be empty, even if it only has a dummy commit. It seems like an easier fix to not have empty repos than to add logic to this and the webhooks to handle those situations.

Agreed. At the least, a repo should have a README.md file

I'm more persuaded by the counter-argument: it seems easier to be robust to the existence of an empty repo, rather than being blocked by one and needing manual intervention (so org-wide-files jobs fail until we remove, or add content to, an empty repo that is encountered). A new repo might be empty for a short period while it's being set up, and it'd be easy to forget that it would break org-wide-files jobs. Creating an empty repo might be convenient for someone while setting something up, and the fact that it has happened before means that it can happen in the future.

I could also claim that an empty repo does not do anything yet, and so does not urgently need to be targeted by the next org-wide-files PR. Furthermore, an empty repo might not yet have the correct topics set, so org-wide-files cannot sync the correct files anyway.

We could have a periodic job that runs e.g. daily, and does things like producing an error if there is an empty repo. But that doesn't help if an empty repo is created shortly before an org-wide-files job runs, and making the "exercism repo linting" job prevent an org-wide-files job is even harder than filtering empty repos, and produces the same outcome - org-wide-files being blocked.

But I probably don't have the full picture, and I'm not the one who has to deal with it. Do whatever's easiest, of course.

ee7 commented 2 years ago

The below command outputs 109 repo names, and includes rust-test-runner.

Do we have any open issues remaining regarding not all repos being targeted?

I think only this issue. I believe org-wide-files has now targeted every repo we wanted - it was only nim-test-runner that was not targeted in this case.

It's strange that nim-test-runner was missing, though.

It does have the topic:

gh repo view exercism/nim-test-runner --json repositoryTopics
{
  "repositoryTopics": [
    {
      "name": "exercism-tooling"
    },
    {
      "name": "exercism-test-runner"
    }
  ]
}

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

And/or switch to gh (or curl), if we're suspicious that it might perform better than actions/github-script?

Getting all the Exercism-owned, public, non-archived, repos with the exercism-tooling topic with gh:

gh repo list exercism \
    --visibility public \
    --no-archived \
    --limit 500 \
    --topic exercism-tooling \
    --json name \
    --jq 'sort_by(.name)' \
    | jq '.[:3]' 

(The final pipe to jq is just to slice to keep the output short below).

[
  {
    "name": "abap-test-runner"
  },
  {
    "name": "bash-analyzer"
  },
  {
    "name": "bash-test-runner"
  }
]

I think this avoids the search API, because if I write --limit 1234 I don't see this message: https://github.com/cli/cli/blob/9454ad376953/pkg/cmd/repo/list/list.go#L181-L182

ErikSchierboom commented 2 years ago

In case it's a weird rate-limiting problem, maybe we could try using GITHUB_TOKEN to make an authenticated request?

I think it already does that: https://github.com/actions/github-script#actionsgithub-script

A pre-authenticated octokit/rest.js client with pagination plugins

SaschaMann commented 2 years ago

I don't have a particular preference for github-script vs gh for this particular use case. Though I'm not sure if this would be as readable in Bash. These lines would need to be rewritten: https://github.com/exercism/org-wide-files/blob/cd205480e83eb23769696e0ad553ad53d5b94558/.github/workflows/sync-rest.yml#L78-L97

Not relying on the search API seems like a good idea for resilience.

ErikSchierboom commented 2 years ago

But then what does it use? The graphql API? Something special?

SaschaMann commented 2 years ago

I didn't look up what the gh command does but there's an endpoint that returns all repositories in an organisation. You could then filter these based on topics afterwards.

ErikSchierboom commented 2 years ago

I think that is what they do: https://github.com/cli/cli/blob/00e97121ec8cf4de86bdda704fb711302a8dfc96/pkg/cmd/repo/list/http.go#L80

ee7 commented 2 years ago

I think the source of truth is https://github.com/cli/cli/blob/00e97121ec8c/pkg/cmd/repo/list/http.go#L32, but I don't understand exactly what I'm seeing. It looks like both listRepos and searchRepos use the GraphQL API v4, and FromSearch is set to true if e.g. we filter non-archived or by topic.

(Edit: Erik beat me by 7 seconds).

ErikSchierboom commented 2 years ago

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

ee7 commented 2 years ago

Aha, I've worked it out.

If I write

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234

Then the output does start with

warning: this query uses the Search API which is capped at 1000 results maximum

But the warning does not appear if I add some --json option:

gh repo list exercism --no-archived --topic exercism-tooling --limit 1234 --json name

I think they could still print that warning on stderr (which is where it appears when --json is not used).

ee7 commented 2 years ago

If the graphql search API is used, I'd assume that also has rate limits, so are we gaining anything by not using github script?

I don't know. But the GraphQL API and the REST API do have different limits: https://docs.github.com/en/graphql/overview/resource-limitations#rate-limit

Maybe let's do nothing for now, and keep an eye on the number of repos targeted by an org-wide-files run? Or we could e.g. obtain the repo list by two independent methods, and require them to agree.