Open elicwhite opened 4 years ago
You might be able to accomplish what you want with -path:node_modules
(see docs on exclusion), but GitHub search is not exactly the most reliable or robust thing, so I think the type of functionality you’re talking about makes sense. At NerdWallet, we’ve solved this using the should_migrate
hook, but we’re only ever working with ~100 repos so the cost of cloning is ok. Being able to do a pre-clone filter makes sense for your type of use case.
Do you have a proposal in mind for how this would work?
I’m not sure how it would work. The simple case of filtering the file path is relatively straight forward. You could even make that an env variable that could be passed to a script or just grep directly.
However, I imagine there is some other data returned from the github api that might be nice to filter on, although I’m not sure what that is. In that case I’d love to process it with a script. Maybe you could json stringify it and call a script with that as stdin, or just stick it in an env var too :-D
I like the serialized json over stdin idea. I think that might be the most robust, but maybe not the most ergonomic to script against. If there’s a limited number of basic data (strings), env vars might make more sense.
I think the next step would be to see what context is available at that time that would be relevant to filtering (gh api response, anything else?) and go from there.
That sounds like a great plan to me! Do you happen to have any bandwidth or interest to work on this? Otherwise I’ll take a stab when I need it.
Edit: it was silly for me to ask that. I’ll dig into this myself when I work on this next
This is an example element of the array of the JSON response for a search request:
{
"name": "ProgressViewIOS.ios.js",
"path": "Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js",
"sha": "76a1f5efb5294638826265b6a4009e6419da1fd4",
"url": "https://api.github.com/repositories/214770447/contents/Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js?ref=d5e5d39c2dd9822af07448ff47e44ed1df44d8e7",
"git_url": "https://api.github.com/repositories/214770447/git/blobs/76a1f5efb5294638826265b6a4009e6419da1fd4",
"html_url": "https://github.com/yinhangfeng/react-native-web/blob/d5e5d39c2dd9822af07448ff47e44ed1df44d8e7/Libraries/Components/ProgressViewIOS/ProgressViewIOS.ios.js",
"repository": {
"id": 214770447,
"node_id": "MDEwOlJlcG9zaXRvcnkyMTQ3NzA0NDc=",
"name": "react-native-web",
"full_name": "yinhangfeng/react-native-web",
"private": false,
"owner": {
"login": "yinhangfeng",
"id": 2867940,
"node_id": "MDQ6VXNlcjI4Njc5NDA=",
"avatar_url": "https://avatars0.githubusercontent.com/u/2867940?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/yinhangfeng",
"html_url": "https://github.com/yinhangfeng",
"followers_url": "https://api.github.com/users/yinhangfeng/followers",
"following_url": "https://api.github.com/users/yinhangfeng/following{/other_user}",
"gists_url": "https://api.github.com/users/yinhangfeng/gists{/gist_id}",
"starred_url": "https://api.github.com/users/yinhangfeng/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/yinhangfeng/subscriptions",
"organizations_url": "https://api.github.com/users/yinhangfeng/orgs",
"repos_url": "https://api.github.com/users/yinhangfeng/repos",
"events_url": "https://api.github.com/users/yinhangfeng/events{/privacy}",
"received_events_url": "https://api.github.com/users/yinhangfeng/received_events",
"type": "User",
"site_admin": false
},
"html_url": "https://github.com/yinhangfeng/react-native-web",
"description": null,
"fork": false,
"url": "https://api.github.com/repos/yinhangfeng/react-native-web",
"forks_url": "https://api.github.com/repos/yinhangfeng/react-native-web/forks",
"keys_url": "https://api.github.com/repos/yinhangfeng/react-native-web/keys{/key_id}",
"collaborators_url": "https://api.github.com/repos/yinhangfeng/react-native-web/collaborators{/collaborator}",
"teams_url": "https://api.github.com/repos/yinhangfeng/react-native-web/teams",
"hooks_url": "https://api.github.com/repos/yinhangfeng/react-native-web/hooks",
"issue_events_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues/events{/number}",
"events_url": "https://api.github.com/repos/yinhangfeng/react-native-web/events",
"assignees_url": "https://api.github.com/repos/yinhangfeng/react-native-web/assignees{/user}",
"branches_url": "https://api.github.com/repos/yinhangfeng/react-native-web/branches{/branch}",
"tags_url": "https://api.github.com/repos/yinhangfeng/react-native-web/tags",
"blobs_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/blobs{/sha}",
"git_tags_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/tags{/sha}",
"git_refs_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/refs{/sha}",
"trees_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/trees{/sha}",
"statuses_url": "https://api.github.com/repos/yinhangfeng/react-native-web/statuses/{sha}",
"languages_url": "https://api.github.com/repos/yinhangfeng/react-native-web/languages",
"stargazers_url": "https://api.github.com/repos/yinhangfeng/react-native-web/stargazers",
"contributors_url": "https://api.github.com/repos/yinhangfeng/react-native-web/contributors",
"subscribers_url": "https://api.github.com/repos/yinhangfeng/react-native-web/subscribers",
"subscription_url": "https://api.github.com/repos/yinhangfeng/react-native-web/subscription",
"commits_url": "https://api.github.com/repos/yinhangfeng/react-native-web/commits{/sha}",
"git_commits_url": "https://api.github.com/repos/yinhangfeng/react-native-web/git/commits{/sha}",
"comments_url": "https://api.github.com/repos/yinhangfeng/react-native-web/comments{/number}",
"issue_comment_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues/comments{/number}",
"contents_url": "https://api.github.com/repos/yinhangfeng/react-native-web/contents/{+path}",
"compare_url": "https://api.github.com/repos/yinhangfeng/react-native-web/compare/{base}...{head}",
"merges_url": "https://api.github.com/repos/yinhangfeng/react-native-web/merges",
"archive_url": "https://api.github.com/repos/yinhangfeng/react-native-web/{archive_format}{/ref}",
"downloads_url": "https://api.github.com/repos/yinhangfeng/react-native-web/downloads",
"issues_url": "https://api.github.com/repos/yinhangfeng/react-native-web/issues{/number}",
"pulls_url": "https://api.github.com/repos/yinhangfeng/react-native-web/pulls{/number}",
"milestones_url": "https://api.github.com/repos/yinhangfeng/react-native-web/milestones{/number}",
"notifications_url": "https://api.github.com/repos/yinhangfeng/react-native-web/notifications{?since,all,participating}",
"labels_url": "https://api.github.com/repos/yinhangfeng/react-native-web/labels{/name}",
"releases_url": "https://api.github.com/repos/yinhangfeng/react-native-web/releases{/id}",
"deployments_url": "https://api.github.com/repos/yinhangfeng/react-native-web/deployments"
},
"score": 84.579025
}
I think of these things I'd like to expose
path
repository.name
repository.owner.login
I think exposing these as environment variables is reasonable. Maybe matching https://github.com/NerdWalletOSS/shepherd/pull/88 with:
SHEPHERD_GITHUB_REPO_OWNER
SHEPHERD_GITHUB_REPO_NAME
as well as a
SHEPHERD_MATCH_PATH
What do you think about that?
Also, checkout.ts
is the file that runs should_migrate
, so naively I'd expect to make this file run should_checkout
before checking it out. However, it doesn't have this info. github.ts
is the file that gets this data, and it returns an array of repo names, losing this additional info.
It probably doesn't make sense to have github.ts
run executeSteps
. Instead maybe getCandidateRepos
needs to change to return more info? I'm curious what you think.
Here's a concrete proposal:
Make getCandidateRepos
take another argument called something like filterCandidate
.
Then we change this line
https://github.com/NerdWalletOSS/shepherd/blob/6fc9881c8ab55aac5fdd4bdc0827fa0c662c008e/src/adapters/github.ts#L76
to something like
repoNames = searchResults.filter(candidate => filterCandidate(candidate)).map((r: any) => r.repository.full_name).sort();
The adapter also implements another method called like
getFilterCandidateEnvironmentVariable(candidate: CandidateResult): Promise<IEnvironmentVariables>
Then checkout.ts
would do something like:
repos = await adapter.getCandidateRepos(onRetry, async (candidate) => {
const envVars = adapter.getFilterCandidateEnvironmentVariable(candidate);
const stepsResults = await executeSteps(context, repo, 'should_checkout', envVars);
return stepsResults.succeeded;
});
It looks like there will be some grossness to this implementation like multiple ways exec-in-repo
will have to take env vars, however you might need to see it to give concrete feedback. I'll try to turn this into a PR.
The problem with the above proposal is that it runs all of these should_checkout
s together, and it should be inline together.
I implemented the above proposal and the output is something like this (with three repos where should_checkout
is exit 2
> Running should_checkout steps
$ exit 2
> Running should_checkout steps
$ exit 2
> Running should_checkout steps
$ exit 2
Step "exit 2" exited with 2
Step "exit 2" exited with 2
Step "exit 2" exited with 2
✔ Loaded 0 repos
I need to get the search results (or at least the data that will become the environment variables) at the time I'm running forEachRepo
. I think that means that I need to attach the variables to the repos
object returned by getCandidateRepos
.
This data would only be present if the org
option is not being used. This data is also specific to the github adapter, so it probably shouldn't be parsed or manipulated by anything other than the github adapter. hmm
I'd like to be able to run some lightweight filters that can't be encoded in the GitHub search API before checking out the repo to do heavier weight repos.
For example, my search results are matching repos that has node_modules checked in and the query is matching files inside of node_modules.
For example, a url like this:
riphunter07/ATproject/ATproject/node_modules.old/react-native/jest/setup.js
.I could encode
NOT node_modules
in my query, but I think that would exclude modules with:Github lets you do searches with file names but I don't think that lets you do negative file name searches:
Being able to do this early filtering (on paths, or even other things that GitHub returns as part of the API search results) would let me avoid having to clone hundreds of repos that are using React Native to realize my search is finding code in react-native core in each of these projects.