apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0
2.58k stars, 519 forks

[Bug][Github] Extractor failing with draft release #8148

Open karl-d opened 4 days ago

karl-d commented 4 days ago

What happened

When running the pipeline against a GitHub repo that contains a draft release, it fails because of an incorrect datetime value for the published_at column.

What do you expect to happen

I would expect this to ignore the draft release.

How to reproduce

Create a draft release in a repository, run Collect Data, and see the error in the github_graphql stage.

Anything else

We have this occur across a number of our projects where we use a GitHub Action to automatically draft a new PR for the next release.

Version

v1.0.1

dosubot[bot] commented 4 days ago

Hey @karl-d! I'm here to help you with any bugs, questions, or contributions you have for the project. I'll be your friendly bot while you wait for a human maintainer. Let's tackle this together!

To address the issue with the GitHub extractor failing due to draft releases, you can modify the ExtractReleases function to ignore draft releases and handle invalid dates in the published_at column. Here's how you can update the code:

Extract: func(row *api.RawData) ([]interface{}, errors.Error) {
    release := &GraphqlQueryRelease{}
    err := errors.Convert(json.Unmarshal(row.Data, release))
    if err != nil {
        return nil, err
    }

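    // Skip draft releases.
    // Assumes GraphqlQueryRelease exposes an IsDraft field, which in turn
    // requires the collector's GraphQL query to request isDraft.
    if release.IsDraft {
        return nil, nil
    }

    // Treat a zero-value 'PublishedAt' as missing.
    // Assumes PublishedAt is a *time.Time; adjust if the struct uses another type.
    if release.PublishedAt != nil && release.PublishedAt.IsZero() {
        release.PublishedAt = nil // or set to a default valid date if necessary
    }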

    var results []interface{}
    githubRelease, err := convertGitHubRelease(release, data.Options.ConnectionId, data.Options.GithubId)
    if err != nil {
        return nil, errors.Convert(err)
    }
    results = append(results, githubRelease)

    return results, nil
},

This modification ensures that draft releases are ignored and any invalid 'PublishedAt' dates are handled appropriately [1].
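
For illustration only, here is a self-contained sketch of the same skip logic that can be run outside DevLake. The types rawRelease and release and the function extractRelease are hypothetical stand-ins, not the plugin's actual structs or API, and the sketch assumes that a draft release arrives with isDraft set to true and a null publishedAt.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// rawRelease is a hypothetical stand-in for the plugin's raw release record.
// PublishedAt is a pointer so that a JSON null decodes to nil.
type rawRelease struct {
	Name        string     `json:"name"`
	IsDraft     bool       `json:"isDraft"`
	PublishedAt *time.Time `json:"publishedAt"`
}

// release is a hypothetical stand-in for the row written to the database.
type release struct {
	Name        string
	PublishedAt time.Time
}

// extractRelease returns (nil, nil) for releases that should be skipped.
func extractRelease(raw []byte) (*release, error) {
	var r rawRelease
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	// Skip drafts and anything without a usable publication time, so that
	// a zero-value time never reaches the published_at column.
	if r.IsDraft || r.PublishedAt == nil || r.PublishedAt.IsZero() {
		return nil, nil
	}
	return &release{Name: r.Name, PublishedAt: *r.PublishedAt}, nil
}

func main() {
	rows := []string{
		`{"name":"v1.0.0","isDraft":false,"publishedAt":"2024-10-01T12:00:00Z"}`,
		`{"name":"next","isDraft":true,"publishedAt":null}`,
	}
	for _, row := range rows {
		rel, err := extractRelease([]byte(row))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if rel == nil {
			fmt.Println("skipped")
			continue
		}
		fmt.Println("extracted:", rel.Name)
	}
}
```

Returning nil for both the result and the error mirrors the "return nil, nil" convention in the snippet above: the row is dropped without failing the whole stage.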

klesh commented 1 day ago

It should be an easy fix; it only requires modifying the following files: https://github.com/apache/incubator-devlake/blob/0a7a71ebcd4ee0bc169687fc0e8058e837c348fd/backend/plugins/github_graphql/tasks/release_extractor.go

https://github.com/apache/incubator-devlake/blob/0a7a71ebcd4ee0bc169687fc0e8058e837c348fd/backend/plugins/github_graphql/tasks/release_collector.go

Would you like to try to do so?