apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0

[Bug][Github] Github deployments without Github Actions #7435

Closed carocad closed 6 months ago

carocad commented 7 months ago

What happened

The DevLake documentation says that GitHub deployments are read in order to compute DORA metrics. However, in several of my (private) projects, I have GitHub deployments that are created without using GitHub Actions. In those cases, DevLake doesn't detect the deployments, as it seems to assume that deployments can only come from GitHub Actions, as seen here.

What do you expect to happen

DevLake should take the deployment information from the GitHub Deployments API, not from the GitHub Actions API; see docs. As those docs describe, it is possible to create GitHub deployments without using GitHub Actions.
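To make the distinction concrete, here is a minimal sketch of reading deployments from the REST "list deployments" endpoint (`GET /repos/{owner}/{repo}/deployments`), which returns deployments regardless of how they were created. The helper names `deploymentsURL` and `parseDeployments`, the `Deployment` struct, and the sample JSON are illustrative assumptions and not DevLake code; only a few fields of the response are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// Deployment holds a small subset of the fields of a GitHub deployment object.
type Deployment struct {
	ID          int64     `json:"id"`
	SHA         string    `json:"sha"`
	Ref         string    `json:"ref"`
	Environment string    `json:"environment"`
	CreatedAt   time.Time `json:"created_at"`
}

// deploymentsURL builds the "list deployments" URL for a repository.
// restEndpoint is the REST base URL, with or without a trailing slash.
func deploymentsURL(restEndpoint, owner, repo string) string {
	base := strings.TrimRight(restEndpoint, "/")
	return fmt.Sprintf("%s/repos/%s/%s/deployments", base, owner, repo)
}

// parseDeployments decodes the JSON array returned by that endpoint.
func parseDeployments(body []byte) ([]Deployment, error) {
	var ds []Deployment
	if err := json.Unmarshal(body, &ds); err != nil {
		return nil, err
	}
	return ds, nil
}

func main() {
	fmt.Println(deploymentsURL("https://api.github.com/", "apache", "incubator-devlake"))

	// Made-up sample payload, only for illustration.
	sample := []byte(`[{"id":1,"sha":"abc123","ref":"main","environment":"production","created_at":"2024-05-14T08:46:28Z"}]`)
	ds, err := parseDeployments(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ds), ds[0].Environment)
}
```

Nothing in this sketch depends on workflow runs, which is the point of the request: a deployment created by Jenkins or any other tool appears in the same list.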

How to reproduce

Anything else

See steps above. It happens every time even with the latest version

Version

v1.0.0-beta6

klesh commented 6 months ago

I think we do collect Github Deployment as well https://github.com/apache/incubator-devlake/blob/470669d603f792927d1313194a408597a1c145fd/backend/plugins/github_graphql/tasks/deployment_collector.go#L42

Select the CI/CD entities in the configuration and you should be good to go; just ignore the GitHub Actions data if you don't need it.

carocad commented 6 months ago

Maybe I misunderstood the code (I haven't taken a deep look), but it looks like that collector is only used by the GraphQL implementation, whereas in our case we are using the "plain API".

just ignore the Github Action if you don't need them.

We tried this. We have 2 projects:

In the first case we don't get any deployment information in the DevLake/Grafana dashboards, but in the second case we do, which is what pointed me to the distinction mentioned in the bug description.

Please let me know if I can provide some debug logs or similar to further clarify this :)

PS: we are using GitHub Enterprise, not github.com.

klesh commented 6 months ago

That is weird, the subtasks for collecting and converting GitHub deployments are enabled by default. Can you share the configuration, which you can find by following this screenshot?

carocad commented 6 months ago

That is weird, the subtasks for collecting and converting Github Deployment are enabled by default. Can you share the configuration which you can find by following this screenshot

Sure thing, here you go.

ghe+actions.json ghe+jenkins.json

The first file is the one where deployments are made with GitHub Actions, whereas the second one is where the deployments are made outside of it. In the JSON files I can see that "Collect Workflow Runs" is a subtask, but there is no "Collect Deployments" or similar.

klesh commented 6 months ago

I see, did you disable the github graphql feature on the connection detail?

klesh commented 6 months ago

Try enabling it and see if the problem gets fixed.

carocad commented 6 months ago

Try enabling it and see if the problem gets fixed.

Just tried it, but unfortunately it crashes. See the logs below.

time="2024-05-14 08:46:28" level=info msg=" [pipeline service] [pipeline #87] [task #618] start executing task: 618"
time="2024-05-14 08:46:28" level=info msg=" [pipeline service] [pipeline #87] [task #618] start plugin"
time="2024-05-14 08:46:28" level=info msg=" [pipeline service] [pipeline #87] [task #618] [api async client] creating scheduler for api \"https://github.boschdevcloud.com/api/v3/\", number of workers: 13, 9500 reqs / 1h0m0s (interval: 378.947368ms)"
time="2024-05-14 08:46:28" level=error msg=" [pipeline service] [pipeline #87] [task #618] run task failed with panic\n\tcaused by: run task failed with panic (github.com/apache/incubator-devlake/helpers/pluginhelper/api.CreateAsyncGraphqlClient:71)\n\tWraps: (2) non-200 OK status code: 404 Not Found body: \"{\\\"message\\\":\\\"Not Found\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/enterprise-server@3.11/rest\\\"}\"\n\tWraps: (3) non-200 OK status code: 404 Not Found body: \"{\\\"message\\\":\\\"Not Found\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/enterprise-server@3.11/rest\\\"}\"\n\tError types: (1) *hintdetail.withDetail (2) *hintdetail.withDetail (3) *errors.errorString"
klesh commented 6 months ago

Seems like you are using the enterprise version, is it Cloud or On-premise?

carocad commented 6 months ago

Seems like you are using the enterprise version, is it Cloud or On-premise?

GitHub Enterprise Server 3.11.9 (OnPremise)

klesh commented 6 months ago

It sounds like there might be some specific behaviors with GitHub Enterprise Server 3.11.9 (On-Premise). Unfortunately, since we don't currently have access to this version for testing, it limits our ability to directly replicate the issue.

Here are a few ways we can move forward:

Community Resources: Have you checked the GitHub Enterprise Server documentation or community forums for known quirks or workarounds related to your specific version? There might be existing solutions or insights from other users.

Consider Upgrading (Optional): If feasible, upgrading to a newer version of GitHub Enterprise Server might resolve the issue and provide access to the latest features and bug fixes.

carocad commented 6 months ago

Hello @klesh, it seems that rate limiting is disabled by default on GitHub Enterprise Server. See docs: https://docs.github.com/en/enterprise-server@3.11/graphql/overview/rate-limits-and-node-limits-for-the-graphql-api#primary-rate-limit

Rate limits are disabled by default for GitHub Enterprise Server.

Would it be possible for DevLake to handle that case with a default value or some other method that avoids a panic? I can also contact the administrators on our side and ask if they can set one, but since this is the default behavior of GitHub Enterprise Server, I would assume that this wouldn't be an isolated case.

klesh commented 6 months ago

@carocad Sure, you may find the "Custom Rate Limit" on the connection page.

carocad commented 6 months ago

@carocad Sure, you may find the "Custom Rate Limit" on the connection page.

Unfortunately that didn't work either. See logs below

time="2024-05-21 07:48:30" level=info msg=" [pipeline service] [pipeline #145] [task #971] start executing task: 971"
time="2024-05-21 07:48:30" level=info msg=" [pipeline service] [pipeline #145] [task #971] start plugin"
time="2024-05-21 07:48:30" level=info msg=" [pipeline service] [pipeline #145] [task #971] [api async client] creating scheduler for api \"https://github.boschdevcloud.com/api/v3/\", number of workers: 6, 4500 reqs / 1h0m0s (interval: 800ms)"
time="2024-05-21 07:48:31" level=error msg=" [pipeline service] [pipeline #145] [task #971] run task failed with panic\n\tcaused by: run task failed with panic (github.com/apache/incubator-devlake/helpers/pluginhelper/api.CreateAsyncGraphqlClient:71)\n\tWraps: (2) non-200 OK status code: 404 Not Found body: \"{\\\"message\\\":\\\"Not Found\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/enterprise-server@3.11/rest\\\"}\"\n\tWraps: (3) non-200 OK status code: 404 Not Found body: \"{\\\"message\\\":\\\"Not Found\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/enterprise-server@3.11/rest\\\"}\"\n\tError types: (1) *hintdetail.withDetail (2) *hintdetail.withDetail (3) *errors.errorString"

I can only assume from this that DevLake checks the rate limit endpoint regardless of the custom rate limit, and only overrides the value after it receives a response. If you agree that this is a bug, I would happily try to create a PR to fix it :)

carocad commented 6 months ago

... I can also contact the administrators on our side and ask if they can set one but since this is the default behavior from Github Server I would assume that this wouldn't be an isolated case.

I just got an answer from the administrators on my side. The GraphQL endpoints on our side already have a rate limit, so this shouldn't be the cause. I also tried querying it myself and got a response without any issues. I submitted the query from the github.com docs (see ref):

query {
  viewer {
    login
  }
  rateLimit {
    limit
    remaining
    used
    resetAt
  }
}

response

{
    "data": {
        "viewer": {
            "login": "<redacted>"
        },
        "rateLimit": {
            "limit": 5000,
            "remaining": 4999,
            "used": 1,
            "resetAt": "2024-05-21T10:17:01Z"
        }
    }
}

I don't know which query DevLake uses, but I can only guess that one of the fetched fields was introduced in a GitHub version newer than 3.11.9. In any case, my proposal above still stands. If you point me in the right direction I can definitely give it a try :)
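For anyone who wants to repeat this check outside the GraphQL explorer, the query can be sent as a plain HTTP POST with a JSON body of the form `{"query": "..."}`. The sketch below is a minimal illustration under stated assumptions: `queryRateLimit`, `rateLimitResponse`, and `newFakeServer` are hypothetical names, and the local test server merely replays the response shown above so the example runs without a real GitHub instance or token.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// rateLimitQuery is the query from the comment above.
const rateLimitQuery = `query { viewer { login } rateLimit { limit remaining used resetAt } }`

// rateLimitResponse decodes only the rateLimit part of the response.
type rateLimitResponse struct {
	Data struct {
		RateLimit struct {
			Limit     int    `json:"limit"`
			Remaining int    `json:"remaining"`
			Used      int    `json:"used"`
			ResetAt   string `json:"resetAt"`
		} `json:"rateLimit"`
	} `json:"data"`
}

// queryRateLimit POSTs the query to graphqlURL and decodes the answer.
func queryRateLimit(client *http.Client, graphqlURL, token string) (*rateLimitResponse, error) {
	payload, err := json.Marshal(map[string]string{"query": rateLimitQuery})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, graphqlURL, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("non-200 status %d: %s", resp.StatusCode, body)
	}
	var out rateLimitResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out, nil
}

// newFakeServer is a local stand-in that answers only on /api/graphql.
func newFakeServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/graphql", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"data":{"rateLimit":{"limit":5000,"remaining":4999,"used":1,"resetAt":"2024-05-21T10:17:01Z"}}}`)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	out, err := queryRateLimit(srv.Client(), srv.URL+"/api/graphql", "dummy-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("limit=%d remaining=%d\n", out.Data.RateLimit.Limit, out.Data.RateLimit.Remaining)
}
```

Against a real server, the URL and token would be replaced with the instance's GraphQL endpoint and a personal access token.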

d4x1 commented 6 months ago

Did you run this query

query {
  viewer {
    login
  }
  rateLimit {
    limit
    remaining
    used
    resetAt
  }
}

on [github](https://docs.github.com/en/graphql/overview/explorer) or on your GitHub Enterprise version?

From the log:

run task failed with panic\n\tcaused by: run task failed with panic (github.com/apache/incubator-devlake/helpers/pluginhelper/api.CreateAsyncGraphqlClient:71)\n\tWraps: (2) non-200 OK status code: 404 Not Found body: \"{\\\"message\\\":\\\"Not Found\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/enterprise-server@3.11/rest\\\"}\"\n\tWraps: (3) non-200 OK status code: 404 Not Found body: \"{\\\"message\\\":\\\"Not Found\\\",\\\"documentation_url\\\":\\\"https://docs.github.com/enterprise-server@3.11/rest\\\"}\"\n\tError types: (1) *hintdetail.withDetail (2) *hintdetail.withDetail (3) *errors.errorString

I think your GitHub version doesn't support querying rate limit info. You can give it a try.

And btw, DevLake's query is:

query {
  rateLimit {
    limit
    remaining
    resetAt
  }
}
carocad commented 6 months ago

Did you run this query

query {
  viewer {
    login
  }
  rateLimit {
    limit
    remaining
    used
    resetAt
  }
}

on [github](https://docs.github.com/en/graphql/overview/explorer) or on your GitHub Enterprise version?

On my github enterprise version of course :)

I think your GitHub version doesn't support querying rate limit info. You can give it a try.

And btw, DevLake's query is: ...

Yeah, it works on my github enterprise version. That query seems like a subset of the one I posted above on https://github.com/apache/incubator-devlake/issues/7435#issuecomment-2122171826 (see response section).

carocad commented 6 months ago

@klesh we continued to follow up on this and now I am 99% sure that this is an issue in DevLake. For GitHub Enterprise Server it is required to set the endpoint URL. This generally ends in /api/v3/{{suffix}}, as per the REST guide of GitHub. Unfortunately, the GraphQL endpoint doesn't follow this convention: its endpoint is /api/graphql (notice the missing /v3/).

This line here assumes that the graphql suffix can be appended to the previously defined endpoint URL, resulting in /api/v3/graphql, which doesn't exist and therefore results in a 404 error and then a panic. Since GraphQL doesn't follow the suffix convention, simply hardcoding it to /api/graphql should solve the issue. Please let me know if you would like me to make a PR for it or if you prefer to do it yourselves. Either way works for me :)
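The path mismatch described above can be sketched as a small URL transformation. This is a minimal illustration under stated assumptions, not the change that was eventually merged: `graphqlEndpoint` is a hypothetical helper, `github.example.com` is a placeholder host, and the rule it encodes (drop a trailing `/v3` from the REST path on Enterprise Server, then append `/graphql`) is taken from the explanation in this comment.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// graphqlEndpoint derives the GraphQL URL from a configured REST endpoint.
//   github.com:         https://api.github.com/         -> https://api.github.com/graphql
//   Enterprise Server:  https://HOST/api/v3/            -> https://HOST/api/graphql
func graphqlEndpoint(restEndpoint string) (string, error) {
	u, err := url.Parse(restEndpoint)
	if err != nil {
		return "", err
	}
	if u.Host == "api.github.com" {
		u.Path = "/graphql"
		return u.String(), nil
	}
	// Enterprise Server: GraphQL is not versioned under /v3,
	// so strip that segment before appending the suffix.
	path := strings.TrimRight(u.Path, "/")
	path = strings.TrimSuffix(path, "/v3")
	u.Path = path + "/graphql"
	return u.String(), nil
}

func main() {
	for _, rest := range []string{
		"https://api.github.com/",
		"https://github.example.com/api/v3/",
	} {
		gql, err := graphqlEndpoint(rest)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(rest, "->", gql)
	}
}
```

Naively concatenating the REST endpoint and `graphql` would instead yield `/api/v3/graphql` on Enterprise Server, which is the URL that returned 404 in the logs.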

klesh commented 6 months ago

Try removing v3/ suffix from the endpoint and see if it solves your problem, e.g. https://github.boschdevcloud.com/api/

carocad commented 6 months ago

Try removing v3/ suffix from the endpoint and see if it solves your problem, e.g. https://github.boschdevcloud.com/api/

It seems that some other functionality relies on the /v3 suffix. After changing the connection endpoint, token validation fails. Even when ignoring that error, collecting/analysing the data still fails. See the logs below.

time="2024-05-28 00:00:02" level=info msg=" [pipeline service] [pipeline #211] [task #1411] start executing task: 1411"
time="2024-05-28 00:00:03" level=info msg=" [pipeline service] [pipeline #211] [task #1411] start plugin"
time="2024-05-28 00:00:03" level=info msg=" [pipeline service] [pipeline #211] [task #1411] [api async client] creating scheduler for api \"https://github.boschdevcloud.com/api/\", number of workers: 6, 4500 reqs / 1h0m0s (interval: 800ms)"
time="2024-05-28 00:00:03" level=info msg=" [pipeline service] [pipeline #211] [task #1411] github graphql init success with remaining 5000/5000 and will reset at 2024-05-28 01:00:03 +0000 UTC"
time="2024-05-28 00:00:03" level=info msg=" [pipeline service] [pipeline #211] [task #1411] total step: 37"
time="2024-05-28 00:00:03" level=info msg=" [pipeline service] [pipeline #211] [task #1411] executing subtask Collect Milestones"
time="2024-05-28 00:00:03" level=info msg=" [pipeline service] [pipeline #211] [task #1411] [Collect Milestones] start api collection"
time="2024-05-28 00:00:03" level=error msg=" [pipeline service] [pipeline #211] [task #1411] [Collect Milestones] end api collection error\n\tcaused by: error parsing response from repos/eBike/devops-launch-assist/milestones (200)"
time="2024-05-28 00:00:03" level=error msg=" [pipeline service] [pipeline #211] [task #1411] subtask Collect Milestones ended unexpectedly\n\tWraps: (2) Error waiting for async Collector execution\n\tWraps: (3) error parsing response from repos/eBike/devops-launch-assist/milestones (200)\n\tError types: (1) *hintdetail.withDetail (2) *hintdetail.withDetail (3) *errors.errorString"
klesh commented 6 months ago

@carocad You are right, it appears that the URL patterns are different between Cloud and On-premise...

However, it is hard to tell whether it is caused by the implementation of the On-premise version or something specific to your configuration.

It is hard for us to fix the problem without a working On-premise instance, would you like to work on the issue and fix it for others?

carocad commented 6 months ago

It is hard for us to fix the problem without a working On-premise instance, would you like to work on the issue and fix it for others?

Sure thing, I opened a PR for it. I tested it on the GitHub Enterprise instance we use and it worked fine. I was not able to find any tests for the code though. I assume that, as you mentioned, there are none, since they would require both a github.com and a GitHub Enterprise organization.