NerdWalletOSS / shepherd

A utility for applying code changes across many repositories.
Apache License 2.0
230 stars 39 forks source link

Additional filters before checkout repo #92

Open elicwhite opened 4 years ago

elicwhite commented 4 years ago

I'd like to be able to run some lightweight filters that can't be encoded in the GitHub search API before checking out the repo to do heavier weight repos.

For example, my search results are matching repos that has node_modules checked in and the query is matching files inside of node_modules.

For example, a url like this: riphunter07/ATproject/ATproject/node_modules.old/react-native/jest/setup.js.

I could encode NOT node_modules in my query, but I think that would exclude modules with:

const foo = require('./node_modules/foo/foo.js');

Github lets you do searches with file names but I don't think that lets you do negative file name searches:

Libraries/Renderer/shims/ReactNative fork:false language:JavaScript filename: NOT native_modules NOT flow-typed

Being able to do this early filtering (on paths, or even other things that GitHub returns as part of the API search results) would let me avoid having to clone hundreds of repos that are using React Native to realize my search is finding code in react-native core in each of these projects.

parshap commented 4 years ago

You might be able to accomplish what you want with -path:node_modules (see docs on exclusion), but GitHub search is not exactly the most reliable or robust thing, so I think the type of functionality you’re talking about makes sense. At NerdWallet, we’ve solved this using the should_migrate hook, but we’re only ever working with ~100 repos so the cost of cloning is ok. Being able to do a pre-clone filter makes sense for your type of use case.

Do you have a proposal in mind for how this would work?

elicwhite commented 4 years ago

I’m not sure how it would work. The simple case of filtering the file path is relatively straight forward. You could even make that an env variable that could be passed to a script or just grep directly.

However, I imagine there is some other data returned from the github api that might be nice to filter on, although I’m not sure what that is. In that case I’d love to process it with a script. Maybe you could json stringify it and call a script with that as stdin, or just stick it in an env var too :-D

parshap commented 4 years ago

I like the serialized json over stdin idea. I think that might be the most robust, but maybe not the most ergonomic to script against. If there’s a limited number of basic data (strings), env vars might make more sense.

I think the next step would be to see what context is available at that time that would be relevant to filtering (gh api response, anything else?) and go from there.

elicwhite commented 4 years ago

That sounds like a great plan to me! Do you happen to have any bandwidth or interest to work on this? Otherwise I’ll take a stab when I need it.

Edit: it was silly for me to ask that. I’ll dig into this myself when I work on this next

elicwhite commented 4 years ago

This is an example element of the array of the JSON response for a search request:

{
  "name": "ProgressViewIOS.ios.js",
  "path": "Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js",
  "sha": "76a1f5efb5294638826265b6a4009e6419da1fd4",
  "url": "https://api.github.com/repositories/214770447/contents/Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js?ref=d5e5d39c2dd9822af07448ff47e44ed1df44d8e7",
  "git_url": "https://api.github.com/repositories/214770447/git/blobs/76a1f5efb5294638826265b6a4009e6419da1fd4",
  "html_url": "https://github.com/yinhangfeng/react-native-web/blob/d5e5d39c2dd9822af07448ff47e44ed1df44d8e7/Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js",
  "repository": {
    "id": 214770447,
    "node_id": "MDEwOlJlcG9zaXRvcnkyMTQ3NzA0NDc=",
    "name": "react-native-web",
    "full_name": "yinhangfeng/react-native-web",
    "private": false,
    "owner": {
      "login": "yinhangfeng",
      "id": 2867940,
      "node_id": "MDQ6VXNlcjI4Njc5NDA=",
      "avatar_url": "https://avatars0.githubusercontent.com/u/2867940?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/yinhangfeng",
      "html_url": "https://github.com/yinhangfeng",
      "followers_url": "https://api.github.com/users/yinhangfeng/followers",
      "following_url": "https://api.github.com/users/yinhangfeng/following{/other_user}",
      "gists_url": "https://api.github.com/users/yinhangfeng/gists{/gist_id}",
      "starred_url": "https://api.github.com/users/yinhangfeng/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/yinhangfeng/subscriptions",
      "organizations_url": "https://api.github.com/users/yinhangfeng/orgs",
      "repos_url": "https://api.github.com/users/yinhangfeng/repos",
      "events_url": "https://api.github.com/users/yinhangfeng/events{/privacy}",
      "received_events_url": "https://api.github.com/users/yinhangfeng/received_events",
      "type": "User",
      "site_admin": false
    },
    "html_url": "https://github.com/yinhangfeng/react-native-web",
    "description": null,
    "fork": false,
    "url": "https://api.github.com/repos/yinhangfeng/react-native-web",
    "forks_url": "https://api.github.com/repos/yinhangfeng/react-native-web/forks",
    "keys_url": "https://api.github.com/repos/yinhangfeng/react-native-web/keys{/key_id}",
    "collaborators_url": "https://api.github.com/repos/yinhangfeng/react-native-web/collaborators{/collaborator}",
    "teams_url": "https://api.github.com/repos/yinhangfeng/react-native-web/teams",
    "hooks_url": "https://api.github.com/repos/yinhangfeng/react-native-web/hooks",
    "issue_events_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues/events{/number}",
    "events_url": "https://api.github.com/repos/yinhangfeng/react-native-web/events",
    "assignees_url": "https://api.github.com/repos/yinhangfeng/react-native-web/assignees{/user}",
    "branches_url": "https://api.github.com/repos/yinhangfeng/react-native-web/branches{/branch}",
    "tags_url": "https://api.github.com/repos/yinhangfeng/react-native-web/tags",
    "blobs_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/blobs{/sha}",
    "git_tags_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/tags{/sha}",
    "git_refs_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/refs{/sha}",
    "trees_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/trees{/sha}",
    "statuses_url": "https://api.github.com/repos/yinhangfeng/react-native-web/statuses/{sha}",
    "languages_url": "https://api.github.com/repos/yinhangfeng/react-native-web/languages",
    "stargazers_url": "https://api.github.com/repos/yinhangfeng/react-native-web/stargazers",
    "contributors_url": "https://api.github.com/repos/yinhangfeng/react-native-web/contributors",
    "subscribers_url": "https://api.github.com/repos/yinhangfeng/react-native-web/subscribers",
    "subscription_url": "https://api.github.com/repos/yinhangfeng/react-native-web/subscription",
    "commits_url": "https://api.github.com/repos/yinhangfeng/react-native-web/commits{/sha}",
    "git_commits_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/commits{/sha}",
    "comments_url": "https://api.github.com/repos/yinhangfeng/react-native-web/comments{/number}",
    "issue_comment_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues/comments{/number}",
    "contents_url": "https://api.github.com/repos/yinhangfeng/react-native-web/contents/{+path}",
    "compare_url": "https://api.github.com/repos/yinhangfeng/react-native-web/compare/{base}...{head}",
    "merges_url": "https://api.github.com/repos/yinhangfeng/react-native-web/merges",
    "archive_url": "https://api.github.com/repos/yinhangfeng/react-native-web/{archive_format}{/ref}",
    "downloads_url": "https://api.github.com/repos/yinhangfeng/react-native-web/downloads",
    "issues_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues{/number}",
    "pulls_url": "https://api.github.com/repos/yinhangfeng/react-native-web/pulls{/number}",
    "milestones_url": "https://api.github.com/repos/yinhangfeng/react-native-web/milestones{/number}",
    "notifications_url": "https://api.github.com/repos/yinhangfeng/react-native-web/notifications{?since,all,participating}",
    "labels_url": "https://api.github.com/repos/yinhangfeng/react-native-web/labels{/name}",
    "releases_url": "https://api.github.com/repos/yinhangfeng/react-native-web/releases{/id}",
    "deployments_url": "https://api.github.com/repos/yinhangfeng/react-native-web/deployments"
  },
  "score": 84.579025
}
elicwhite commented 4 years ago

I think of these things I'd like to expose

path
repository.name
repository.owner.login

I think exposing these as environment variables is reasonable. Maybe matching https://github.com/NerdWalletOSS/shepherd/pull/88 with:

SHEPHERD_GITHUB_REPO_OWNER
SHEPHERD_GITHUB_REPO_NAME

as well as a

SHEPHERD_MATCH_PATH

What do you think about that?

Also, checkout.ts is the file that runs should_migrate, so naively I'd expect to make this file run should_checkout before checking it out. However, it doesn't have this info. github.ts is the file that gets this data, and it returns an array of repo names, losing this additional info.

It probably doesn't make sense to have github.ts run executeSteps. Instead maybe getCandidateRepos needs to change to return more info? I'm curious what you think.

Here's a concrete proposal:

Make getCandidateRepos take another argument called something like filterCandidate. Then we change this line https://github.com/NerdWalletOSS/shepherd/blob/6fc9881c8ab55aac5fdd4bdc0827fa0c662c008e/src/adapters/github.ts#L76 to something like

repoNames = searchResults.filter(candidate => filterCandidate(candidate)).map((r: any) => r.repository.full_name).sort();

The adapter also implements another method called like

getFilterCandidateEnvironmentVariable(candidate: CandidateResult): Promise<IEnvironmentVariables>

Then checkout.ts would do something like:

repos = await adapter.getCandidateRepos(onRetry, async (candidate) => {
  const envVars = adapter.getFilterCandidateEnvironmentVariable(candidate);
  const stepsResults = await executeSteps(context, repo, 'should_checkout', envVars);
  return stepsResults.succeeded;
});

It looks like there will be some grossness to this implementation like multiple ways exec-in-repo will have to take env vars, however you might need to see it to give concrete feedback. I'll try to turn this into a PR.

elicwhite commented 4 years ago

The problem with the above proposal is that it runs all of these should_checkouts together, and it should be inline together.

I implemented the above proposal and the output is something like this (with three repos where should_checkout is exit 2

> Running should_checkout steps
$ exit 2
> Running should_checkout steps
$ exit 2
> Running should_checkout steps
$ exit 2
Step "exit 2" exited with 2
Step "exit 2" exited with 2
Step "exit 2" exited with 2
✔ Loaded 0 repos
elicwhite commented 4 years ago

I need to get the search results (or at least the data that will become the environment variables) at the time I'm running forEachRepo. I think that means that I need to attach the variables to the repos object returned by getCandidateRepos.

This data would only be present if the org option is not being used. This data is also specific to the github adapter, so it probably shouldn't be parsed or manipulated by anything other than the github adapter. hmm