apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0
2.5k stars 493 forks source link

[Bug][GitHub] Labels are not updated when removed from GitHub issues #7510

Open michelengelen opened 1 month ago

michelengelen commented 1 month ago

Search before asking

What happened

Issues that receive a label will not be updated when the label gets removed.

Example: We do track user interaction with labels on issues in github. When a user opens a new issue it gets automatically labeled as "waiting for maintainer". This is to indicate to us that this needs attention. Now when we ...

The problem we face is that the issue will still be associated to that label even if it got removed from the issue. This is messing with our metrics atm.

This is one of the queries we use to collect data regarding labels:

with issue_list as(
  select
    i.id as issue_id,
    i.url,
    i.issue_key,
    i.title,
    i.creator_name,
    i.created_date as issue_created_date,
    i.status
  from
    issues i
    join board_issues bi on i.id = bi.issue_id
    join boards b on bi.board_id = b.id
    join issue_labels il on il.issue_id = i.id
  where
    $__timeFilter(i.created_date)
    and il.label_name = 'status: waiting for maintainer'
    and i.status != 'DONE'
    and b.id in ('github:GithubRepo:1:260240241')
)

select 
  issue_key,
  title,
  creator_name,
  issue_created_date,
  status,
  (TIMESTAMPDIFF(MINUTE,issue_created_date,NOW()))/1440 as 'queue_time_in_days',
  url
from issue_list
ORDER BY queue_time_in_days ASC 

You can check this by looking at the result with id 13171. In the screenshot below this is incorrectly fetched.

Screenshot 2024-05-24 at 11 08 43

What do you expect to happen

Issues that had the label removed should not be in the results when querying for issues by label.

How to reproduce

just query an issue (or a list of issues) with a query similar to the above.

Anything else

we just recently noticed, but this is a constant problem

Version

v1.0.0-beta7

Are you willing to submit PR?

Code of Conduct

dosubot[bot] commented 1 month ago

@michelengelen Hello there! I'm Dosu, a bot here to help you with bug fixes, answer your questions, and guide you on becoming a contributor. I'm here to assist you while you wait for a human maintainer.

To ensure that issues with removed labels are not included in the results when querying for issues by label in GitHub, you need to filter the issues based on their current labels. The provided code does not explicitly show how to filter out issues with removed labels. However, you can modify the ExtractApiIssues function to include this logic.

Here is an example of how you can modify the ExtractApiIssues function to filter out issues that do not have the desired labels:

func ExtractApiIssues(taskCtx plugin.SubTaskContext) errors.Error {
    data := taskCtx.GetData().(*GithubTaskData)
    config := data.Options.ScopeConfig
    issueRegexes, err := NewIssueRegexes(config)
    if err != nil {
        return nil
    }
    extractor, err := api.NewApiExtractor(api.ApiExtractorArgs{
        RawDataSubTaskArgs: api.RawDataSubTaskArgs{
            Ctx: taskCtx,
            Params: GithubApiParams{
                ConnectionId: data.Options.ConnectionId,
                Name:         data.Options.Name,
            },
            Table: RAW_ISSUE_TABLE,
        },
        Extract: func(row *api.RawData) ([]interface{}, errors.Error) {
            body := &IssuesResponse{}
            err := errors.Convert(json.Unmarshal(row.Data, body))
            if err != nil {
                return nil, err
            }
            if body.GithubId == 0 || body.PullRequest.Url != "" {
                return nil, nil
            }
            results := make([]interface{}, 0, 2)
            desiredLabel := "status: waiting for maintainer" // Replace with the label you are interested in
            hasDesiredLabel := false
            for _, label := range body.Labels {
                if label.Name == desiredLabel {
                    hasDesiredLabel = true
                    break
                }
            }
            if hasDesiredLabel {
                githubIssue, err := convertGithubIssue(body, data.Options.ConnectionId, data.Options.GithubId)
                if err != nil {
                    return nil, err
                }
                githubLabels, err := convertGithubLabels(issueRegexes, body, githubIssue)
                if err != nil {
                    return nil, err
                }
                results = append(results, githubLabels...)
                results = append(results, githubIssue)
                if body.Assignee != nil {
                    githubIssue.AssigneeId = body.Assignee.Id
                    githubIssue.AssigneeName = body.Assignee.Login
                    relatedUser, err := convertAccount(body.Assignee, data.Options.GithubId, data.Options.ConnectionId)
                    if err != nil {
                        return nil, err
                    }
                    results = append(results, relatedUser)
                }
                for _, assignee := range body.Assignees {
                    issueAssignee := &models.GithubIssueAssignee{
                        ConnectionId: githubIssue.ConnectionId,
                        IssueId:      githubIssue.GithubId,
                        RepoId:       githubIssue.RepoId,
                        AssigneeId:   assignee.Id,
                        AssigneeName: assignee.Login,
                    }
                    results = append(results, issueAssignee)
                }
                if body.User != nil {
                    githubIssue.AuthorId = body.User.Id
                    githubIssue.AuthorName = body.User.Login
                    relatedUser, err := convertAccount(body.User, data.Options.GithubId, data.Options.ConnectionId)
                    if err != nil {
                        return nil, err
                    }
                    results = append(results, relatedUser)
                }
            }
            return results, nil
        },
    })
    if err != nil {
        return err
    }
    return extractor.Execute()
}

In this modified ExtractApiIssues function, the code iterates through each issue and checks if it has the desired label. If it does, the issue is processed and added to the results. This ensures that only issues with the specified label are included in the results [1][2][3].

#### Details **About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

Startrekzky commented 1 month ago

Thanks for reporting. @abeizn Could you take a look, please? It seems like issue labels are not updated accordingly after they have been removed from the source tool.

klesh commented 1 month ago

The problem stems from the issue_collector and its Incremental Collection feature. Currently, this feature relies on the updated_at timestamp of issues, which unfortunately isn't updated when only labels are changed. This means the API won't treat issues with removed labels as "updated" and won't include them in the response.

We understand this limitation is inconvenient. While a perfect solution isn't immediately available, here are a couple of options:

  1. Request a change from GitHub: The ideal fix would be for GitHub's API to update the updated_at timestamp even when only labels are modified. We can submit a feature request to GitHub highlighting this discrepancy.

  2. Run Full Sync periodically: As a workaround, you can manually trigger a Full Sync of the issue_collector whenever necessary. This will ensure all issues, including those with label changes, are fetched.

We'll explore potential solutions on our end and keep you updated on any progress. In the meantime, please let us know if you have any other questions or concerns.

michelengelen commented 1 month ago

Is there a way to tell the collector to do a full-collection on each job?

klesh commented 1 month ago

@michelengelen You may check the code: If a subtask uses a stateful helper, it supports Incremental Mode and it would work that way unless you force the Full Sync model: image

image

michelengelen commented 1 month ago

Ok, thanks @klesh

I just tested the Github GraphQL for issues and it does indeed update the updated_at value when you add and remove a label (sry for the long code examples here):

Issue in its "original state":

{
  "data": {
    "repository": {
      "issue": {
        "id": "I_kwDOD4Lzcc6KPqSy",
        "updatedAt": "2024-05-28T07:51:48Z",
        "labels": {
          "totalCount": 2,
          "nodes": [
            {
              "color": "ffffff",
              "name": "component: data grid",
              "description": "This is the name of the generic UI component, not the React module!",
              "id": "MDU6TGFiZWwyMjM0NzExNDY0"
            },
            {
              "color": "c2e0c6",
              "name": "status: waiting for maintainer",
              "description": "These issues haven't been looked at yet by a maintainer",
              "id": "MDU6TGFiZWwyMjk5ODEyMTc3"
            }
          ]
        }
      }
    }
  }
}

Issue after labels got added:

{
  "data": {
    "repository": {
      "issue": {
        "id": "I_kwDOD4Lzcc6KPqSy",
        "updatedAt": "2024-05-28T08:25:08Z",
        "labels": {
          "totalCount": 4,
          "nodes": [
            {
              "color": "e11d21",
              "name": "bug 🐛",
              "description": "Something doesn't work",
              "id": "MDU6TGFiZWwyMDI3MTk2OTU5"
            },
            {
              "color": "ffffff",
              "name": "component: data grid",
              "description": "This is the name of the generic UI component, not the React module!",
              "id": "MDU6TGFiZWwyMjM0NzExNDY0"
            },
            {
              "color": "c2e0c6",
              "name": "status: waiting for maintainer",
              "description": "These issues haven't been looked at yet by a maintainer",
              "id": "MDU6TGFiZWwyMjk5ODEyMTc3"
            },
            {
              "color": "0465d6",
              "name": "customization: css",
              "description": "Design CSS customizability",
              "id": "LA_kwDOD4Lzcc7zJLz0"
            }
          ]
        }
      }
    }
  }
}

And here it is after removing the "bug" label again:

{
  "data": {
    "repository": {
      "issue": {
        "id": "I_kwDOD4Lzcc6KPqSy",
        "updatedAt": "2024-05-28T08:25:30Z",
        "labels": {
          "totalCount": 3,
          "nodes": [
            {
              "color": "ffffff",
              "name": "component: data grid",
              "description": "This is the name of the generic UI component, not the React module!",
              "id": "MDU6TGFiZWwyMjM0NzExNDY0"
            },
            {
              "color": "c2e0c6",
              "name": "status: waiting for maintainer",
              "description": "These issues haven't been looked at yet by a maintainer",
              "id": "MDU6TGFiZWwyMjk5ODEyMTc3"
            },
            {
              "color": "0465d6",
              "name": "customization: css",
              "description": "Design CSS customizability",
              "id": "LA_kwDOD4Lzcc7zJLz0"
            }
          ]
        }
      }
    }
  }
}

Or am I looking at the wrong data?

klesh commented 1 month ago

@michelengelen Nice work. If the issue.updated_at is in fact updated, it should be collected in Incremental Mode. I am not familiar with the graphql , can you help check if the issue gets included by the server? I think this line is what filtering the data since the previous collection

michelengelen commented 1 month ago

@klesh following the example in the code I constructed this query:

{
  repository(owner: "mui", name: "mui-x") {
    issues(first: 5, orderBy: {field: UPDATED_AT, direction: DESC}) {
      nodes {
        ... on Issue {
          id
          title
          updatedAt
          labels(first: 20) {
            totalCount
            nodes {
              color
              name
              description
              id
            }
          }
        }
      }
    }
  }
}

I queried this and then changed labels on the first one to check if the updated_at field gets a new value and it indeed does so.

{
  "data": {
    "repository": {
      "issues": {
        "nodes": [
          {
            "id": "I_kwDOD4Lzcc6KlkU2",
            "title": "DataGridPro checkboxSelection Issue",
            "updatedAt": "2024-05-30T10:01:01Z",
            "labels": {
            "updatedAt": "2024-05-30T10:07:05Z",
            "labels": {
-             "totalCount": 1,
+             "totalCount": 2,
              "nodes": [
+               {
+                 "color": "ffffff",
+                 "name": "component: data grid",
+                 "description": "This is the name of the generic UI component, not the React module!",
+                  "id": "MDU6TGFiZWwyMjM0NzExNDY0"
+               },
                {
                  "color": "c2e0c6",
                  "name": "status: waiting for maintainer",
                  "description": "These issues haven't been looked at yet by a maintainer",
                  "id": "MDU6TGFiZWwyMjk5ODEyMTc3"
                }
              ]
            }
          },
          {
            "id": "I_kwDOD4Lzcc6KlAZv",
            "title": "[data grid] Upgrading `@mui/x-data-grid-premium` to `v7` produces emotion error",
            "updatedAt": "2024-05-30T09:59:40Z",
            "labels": {
              "totalCount": 4,
              "nodes": [
                {
                  "color": "ffffff",
                  "name": "component: data grid",
                  "description": "This is the name of the generic UI component, not the React module!",
                  "id": "MDU6TGFiZWwyMjM0NzExNDY0"
                },
                {
                  "color": "c2e0c6",
                  "name": "status: waiting for maintainer",
                  "description": "These issues haven't been looked at yet by a maintainer",
                  "id": "MDU6TGFiZWwyMjk5ODEyMTc3"
                },
                {
                  "color": "5319E7",
                  "name": "support: commercial",
                  "description": "Support request from paid users",
                  "id": "LA_kwDOD4Lzcc7hP007"
                },
                {
                  "color": "0052CC",
                  "name": "support: premium standard",
                  "description": "Support request from a Premium standard plan user. https://mui.com/legal/technical-support-sla/",
                  "id": "LA_kwDOD4Lzcc8AAAABc-oCvw"
                }
              ]
            }
          },
          {
            "id": "I_kwDOD4Lzcc6Kj7Ks",
            "title": "[charts] `ChartsTooltip` often not working properly when assigning `innerRadius` in series and using `ResponsiveChartContainer`",
            "updatedAt": "2024-05-30T09:56:33Z",
            "labels": {
              "totalCount": 2,
              "nodes": [
                {
                  "color": "cccccc",
                  "name": "status: waiting for author",
                  "description": "Issue with insufficient information",
                  "id": "MDU6TGFiZWwyNDI1NDQwNjc2"
                },
                {
                  "color": "ffffff",
                  "name": "component: charts",
                  "description": "This is the name of the generic UI component, not the React module!",
                  "id": "LA_kwDOD4Lzcc8AAAABMYODPg"
                }
              ]
            }
          },
          {
            "id": "I_kwDOD4Lzcc6KjxKd",
            "title": "[data grid] How do I fix this? - `GridVirtualScrollbar.js:88 Uncaught TypeError: Cannot set properties of null (setting 'scrollTop')`",
            "updatedAt": "2024-05-30T09:55:23Z",
            "labels": {
              "totalCount": 5,
              "nodes": [
                {
                  "color": "cccccc",
                  "name": "status: waiting for author",
                  "description": "Issue with insufficient information",
                  "id": "MDU6TGFiZWwyNDI1NDQwNjc2"
                },
                {
                  "color": "ECB272",
                  "name": "feature: Filtering",
                  "description": "Related to the data grid Filtering feature",
                  "id": "LA_kwDOD4Lzcc7gM5sl"
                },
                {
                  "color": "5319E7",
                  "name": "support: commercial",
                  "description": "Support request from paid users",
                  "id": "LA_kwDOD4Lzcc7hP007"
                },
                {
                  "color": "0052CC",
                  "name": "support: pro standard",
                  "description": "Support request from a Pro standard plan user. https://mui.com/legal/technical-support-sla/",
                  "id": "LA_kwDOD4Lzcc8AAAABc-pqFw"
                },
                {
                  "color": "55bdd4",
                  "name": "v7.x migration",
                  "description": "",
                  "id": "LA_kwDOD4Lzcc8AAAABkjyJIA"
                }
              ]
            }
          },
          {
            "id": "I_kwDOD4Lzcc6KgYRp",
            "title": "[data grid] sorting is not working for DD/MM/YYYY and MM/DD/YYYY date format it is only working for YYYY/MM/DD.",
            "updatedAt": "2024-05-30T09:52:01Z",
            "labels": {
              "totalCount": 3,
              "nodes": [
                {
                  "color": "ffffff",
                  "name": "component: data grid",
                  "description": "This is the name of the generic UI component, not the React module!",
                  "id": "MDU6TGFiZWwyMjM0NzExNDY0"
                },
                {
                  "color": "c2e0c6",
                  "name": "status: waiting for maintainer",
                  "description": "These issues haven't been looked at yet by a maintainer",
                  "id": "MDU6TGFiZWwyMjk5ODEyMTc3"
                },
                {
                  "color": "ECB272",
                  "name": "feature: Sorting",
                  "description": "Related to the data grid Sorting feature",
                  "id": "LA_kwDOD4Lzcc7ga9u_"
                }
              ]
            }
          }
        ]
      }
    }
  }
}

Does that help?

klesh commented 1 month ago

Yes, it does, thanks for the information. Will look into it later.

michelengelen commented 3 weeks ago

any updates on this so far?

klesh commented 2 weeks ago

Not yet, this had been scheduled to be fixed in v1.1