Open ksarabi-sp opened 3 weeks ago
Hello @ksarabi-sp! I'm here to help you with any bugs, questions, or contributions you have for the project. Let's tackle this issue together!
The issue_collector gets stuck at the "Collect Issues" task for GitHub Enterprise in version v1.0.1-beta9 because the CollectIssues function does not handle pagination correctly. Specifically, the BuildQuery function does not properly set the skipCursor variable, which is necessary for paginating through the issues.
Here is the relevant code snippet:
    err = apiCollector.InitGraphQLCollector(api.GraphqlCollectorArgs{
        GraphqlClient: data.GraphqlClient,
        PageSize:      10,
        BuildQuery: func(reqData *api.GraphqlRequestData) (interface{}, map[string]interface{}, error) {
            query := &GraphqlQueryIssueWrapper{}
            if reqData == nil {
                return query, map[string]interface{}{}, nil
            }
            ownerName := strings.Split(data.Options.Name, "/")
            variables := map[string]interface{}{
                "pageSize":   graphql.Int(reqData.Pager.Size),
                "skipCursor": (*graphql.String)(reqData.Pager.SkipCursor),
                "owner":      graphql.String(ownerName[0]),
                "name":       graphql.String(ownerName[1]),
            }
            return query, variables, nil
        },
        GetPageInfo: func(iQuery interface{}, args *api.GraphqlCollectorArgs) (*api.GraphqlQueryPageInfo, error) {
            query := iQuery.(*GraphqlQueryIssueWrapper)
            return query.Repository.IssueList.PageInfo, nil
        },
        ResponseParser: func(queryWrapper any) (messages []json.RawMessage, err errors.Error) {
            query := queryWrapper.(*GraphqlQueryIssueWrapper)
            issues := query.Repository.IssueList.Issues
            for _, rawL := range issues {
                if since != nil && since.After(rawL.UpdatedAt) {
                    return messages, api.ErrFinishCollect
                }
                messages = append(messages, errors.Must1(json.Marshal(rawL)))
            }
            return
        },
    })
The skipCursor is not being updated correctly, causing the pagination to fail and the task to get stuck [1].
- Was GraphQL enabled on the GitHub connection page? Yes, it is enabled by default.
- Were there concurrent pipelines running at the same time? No, only one pipeline was running to collect data from a single repository.
@klesh GraphQL option is not shown on UI
@ksarabi-sp @realhuseyn Are you two working together? 😂
Is it Github Cloud or Enterprise?
@klesh noo :D I am using Github Cloud Enterprise
@realhuseyn Could you try using multiple PATs(from different Github accounts) and see if it helps?
@klesh I will try. Currently, I am using two different PATs. I will add a new one.
@klesh I tried. I have added 3 PATs and my pipeline works normally. But I want to use a GitHub App, because it has a higher rate limit than a PAT.
@klesh I am using GitHub Enterprise 3.14 and do not have any rate limit, but I still have this issue when using it with our GHE.
@realhuseyn I completely agree! It would be fantastic if someone could address and resolve this issue.
@ksarabi-sp That seems unusual. Your logs indicate that your GHE was rejecting API requests due to rate limiting. Perhaps you could write a simple script to make concurrent API requests and check if the same error occurs. You can determine the request rate by searching for “interval” in the log.
@klesh Are you sure it is hitting an API limit from GHE? We do not have an API limit on our GHE server. Where in GHE can we see whether there is any API limit? Is it possible that it is calling the API on github.com?
What happened
2024-09-09T16:58:31.112954005Z time="2024-09-09 16:58:31" level=info msg=" [pipeline service] [pipeline #1] [task #2] executing subtask Collect Issues"
2024-09-09T16:58:31.119890379Z time="2024-09-09 16:58:31" level=info msg=" [pipeline service] [pipeline #1] [task #2] [Collect Issues] start graphql collection"
2024-09-09T16:58:31.146203954Z time="2024-09-09 16:58:31" level=info msg=" [pipeline service] [pipeline #1] [task #2] rate limit remaining exhausted, waiting for next period."
2024-09-09T17:01:30.770231161Z time="2024-09-09 17:01:30" level=info msg=" [pipeline service] [pipeline #1] [task #2] github graphql init success with remaining 0/0 and will reset at 0001-01-01 00:00:00 +0000 UTC"
2024-09-09T17:04:30.832605306Z time="2024-09-09 17:04:30" level=info msg=" [pipeline service] [pipeline #1] [task #2] github graphql init success with remaining 0/0 and will reset at 0001-01-01 00:00:00 +0000 UTC"
2024-09-09T17:07:30.891184522Z time="2024-09-09 17:07:30" level=info msg=" [pipeline service] [pipeline #1] [task #2] github graphql init success with remaining 0/0 and will reset at 0001-01-01 00:00:00 +0000 UTC"
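One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: "remaining 0/0" together with a reset time of 0001-01-01 00:00:00 +0000 UTC is exactly what a Go struct prints when none of its fields were ever populated, since that date is the zero value of time.Time. The rateLimit struct and describe function below are hypothetical and only mimic the wording of the log message.

```go
package main

import (
	"fmt"
	"time"
)

// rateLimit is a hypothetical stand-in for whatever structure holds the
// rate limit fields reported by the server.
type rateLimit struct {
	Limit     int
	Remaining int
	ResetAt   time.Time
}

// describe formats the struct in the same wording as the log message.
func describe(rl rateLimit) string {
	return fmt.Sprintf("remaining %d/%d and will reset at %v", rl.Remaining, rl.Limit, rl.ResetAt)
}

func main() {
	var rl rateLimit // zero value: nothing was filled in from a response
	fmt.Println(describe(rl))
	// prints: remaining 0/0 and will reset at 0001-01-01 00:00:00 +0000 UTC
	fmt.Println(rl.ResetAt.IsZero()) // prints: true
}
```

Under that reading, an empty rate limit response would be indistinguishable from an exhausted one, which would fit a server that has rate limiting disabled. The thread does not confirm this.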
What do you expect to happen
The pipeline should not get stuck at the Collect Issues task.
How to reproduce
1. Create a connector by connecting to GitHub Enterprise.
2. Select one repo.
3. Create a project and add the above GitHub connector to it.
4. Start collecting data.
Anything else
No response
Version
v1.0.1-beta9