apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0

[Bug][Github] GraphQL collector fails on transferred issues #7969

Open peakwinter opened 2 weeks ago

peakwinter commented 2 weeks ago

What happened

The GitHub GraphQL collector repeatedly fails during issue collection when it encounters an issue that has been transferred to another repository.

task-53475-3-2-github_graphql.log

In the log above, the error is "Could not resolve to an Issue with the number of 17617". The issue with that number was transferred to another repository.

What do you expect to happen

The collector should ignore issues that have been transferred, either deleting them from the dataset or marking them as closed.

How to reproduce

  1. Create a GitHub issue in repository A
  2. Run the pipeline that includes repository A
  3. Transfer the previously created issue to repository B
  4. Re-run the pipeline that includes repository A

Anything else

No response

Version

v1.0.1-beta7

dosubot[bot] commented 2 weeks ago

Hello @peakwinter! I'm here to help you with any bugs, questions, or contribution requests you have. I'll assist you while you wait for a human maintainer.

To address the issue where the GitHub GraphQL collector fails on transferred issues, you can modify the collector to ignore these issues and delete them from the dataset. Here's how you can do it:

  1. Modify the ResponseParser function in the CollectIssues function: Add a check for the StateReason field in the GraphqlQueryIssue struct. If the StateReason is "transferred", skip adding the issue to the dataset.

    ResponseParser: func(queryWrapper any) (messages []json.RawMessage, err errors.Error) {
       query := queryWrapper.(*GraphqlQueryIssueWrapper)
       issues := query.Repository.IssueList.Issues
       for _, rawL := range issues {
           if rawL.StateReason == "transferred" {
               // Skip issues that have been transferred
               continue
           }
           if since != nil && since.After(rawL.UpdatedAt) {
               return messages, api.ErrFinishCollect
           }
           messages = append(messages, errors.Must1(json.Marshal(rawL)))
       }
       return
    },

  2. Ensure the StateReason field is included in the GraphQL query: The GraphqlQueryIssue struct already includes the StateReason field, so no changes are needed there.

    type GraphqlQueryIssue struct {
       DatabaseId   int
       Number       int
       State        string
       StateReason  string
       Title        string
       Body         string
       Author       *GraphqlInlineAccountQuery
       Url          string
       ClosedAt     *time.Time
       CreatedAt    time.Time
       UpdatedAt    time.Time
       AssigneeList struct {
           Assignees []GraphqlInlineAccountQuery `graphql:"nodes"`
       } `graphql:"assignees(first: 100)"`
       Milestone *struct {
           Number int
       } `json:"milestone"`
       Labels struct {
           Nodes []struct {
               Id   string
               Name string
           }
       } `graphql:"labels(first: 100)"`
    }

With these changes, the collector will ignore issues that have been transferred to another repository and will not include them in the dataset [1].

peakwinter commented 2 weeks ago

Unfortunately, there is no StateReason value of "transferred". Since transferred issues no longer show up in the original repo's issue list, there doesn't seem to be an easy way to figure out what happened to the issue; it just disappears.
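
Since the only signal left in the original repository is the error itself, one possible direction is to treat that specific error as "issue no longer exists here" instead of failing the whole task. The following is a minimal sketch of that idea, not DevLake's actual code: `GraphqlError` and `SplitUnresolvedIssues` are hypothetical names, and the pattern is taken from the error message quoted in the report above. How the returned issue numbers are then handled (deleted or marked as closed) is left open.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// GraphqlError is a hypothetical, simplified stand-in for one entry of the
// "errors" array in a GraphQL response; only the message is modelled here.
type GraphqlError struct {
	Message string
}

// unresolvedIssue matches the error message quoted in the report.
var unresolvedIssue = regexp.MustCompile(`Could not resolve to an Issue with the number of (\d+)`)

// SplitUnresolvedIssues separates "issue could not be resolved" errors from
// all other errors. It returns the issue numbers that no longer resolve in
// the repository, plus the remaining errors, which should still fail the task.
func SplitUnresolvedIssues(errs []GraphqlError) (missing []int, fatal []GraphqlError) {
	for _, e := range errs {
		m := unresolvedIssue.FindStringSubmatch(e.Message)
		if m == nil {
			fatal = append(fatal, e)
			continue
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			// The number could not be parsed: keep the error fatal rather than guess.
			fatal = append(fatal, e)
			continue
		}
		missing = append(missing, n)
	}
	return missing, fatal
}

func main() {
	errs := []GraphqlError{
		{Message: "Could not resolve to an Issue with the number of 17617."},
		{Message: "some other failure"},
	}
	missing, fatal := SplitUnresolvedIssues(errs)
	fmt.Println("unresolved issue numbers:", missing)
	fmt.Println("remaining fatal errors:", len(fatal))
}
```

Matching on the message text is fragile if the wording ever changes, so a real fix would probably prefer a structured error field if the API response provides one.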

klesh commented 2 weeks ago

Thanks for reporting the issue.