Closed klesh closed 2 years ago
I'll check it.
The GitHub GraphQL rate limit is 5,000 points per hour. In effect, that is roughly equal to 5000*100 RESTful requests. 😁
Can you also provide the relevant link?
I'll write a small demo to show how fast graphql is.
https://docs.github.com/en/graphql/overview/explorer
{
rateLimit {
limit
cost
remaining
resetAt
}
repository(name: "incubator-devlake", owner: "apache") {
issues(first: 30, after: "Y3Vyc29yOnYyOpHOOKUPkw==") {
totalCount
nodes {
number
labels(first: 100) {
totalCount
nodes {
name
}
pageInfo {
endCursor
hasNextPage
}
}
milestone {
number
title
}
comments(first: 100) {
nodes {
author {
login
... on User {
email
databaseId
login
url
websiteUrl
}
}
databaseId
bodyText
createdAt
updatedAt
}
pageInfo {
endCursor
hasNextPage
}
}
author {
login
... on User {
email
databaseId
}
}
assignees(first: 100) {
pageInfo {
hasNextPage
endCursor
}
totalCount
nodes {
login
databaseId
}
}
body
closedAt
title
state
stateReason
url
updatedAt
createdAt
}
pageInfo {
endCursor
hasNextPage
}
}
}
}
This query selects all the data we need for issues (labels, assignees, comments). Fetching about 30 issues costs 1 rate-limit point.
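The 1-point figure is consistent with the cost formula in GitHub's GraphQL rate-limit documentation, as far as I understand it: count the requests needed to fulfill each connection (assuming every first limit is reached), divide by 100, round, with a minimum of 1. A rough Python sketch of that arithmetic follows; the Connection class and both function names are hypothetical helpers, not part of any GitHub client.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    """One paginated connection in a query, e.g. issues(first: 30)."""
    name: str
    first: int
    children: List["Connection"] = field(default_factory=list)


def count_requests(conn: Connection, parents: int = 1) -> int:
    """Requests needed for this connection and everything nested in it.

    One request is counted per parent node; each request is assumed to
    return `first` nodes, which become the parents of nested connections.
    """
    total = parents
    nodes = parents * conn.first
    for child in conn.children:
        total += count_requests(child, nodes)
    return total


def estimate_points(root: Connection) -> int:
    """Divide by 100, round to the nearest whole number, minimum 1 point."""
    return max(1, round(count_requests(root) / 100))


# The connections used by the issue query above.
issues = Connection("issues", 30, [
    Connection("labels", 100),
    Connection("comments", 100),
    Connection("assignees", 100),
])
print(count_requests(issues))   # 1 + 30 + 30 + 30 = 91
print(estimate_points(issues))  # 1
```

The rateLimit { cost } field in the query remains the authoritative number; this estimate is only useful for sizing page limits before sending a query.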
After exploring GraphQL, I found that it is indeed a little faster than REST. The gain is small mainly because the GitHub collector does not have many fine-grained requests that could be merged. Only a PR's commits and reviewers reduce the number of requests, because they can be fetched as part of the PR query; the same goes for rateLimit. Listing issues and other entities takes about the same number of requests either way, but the gain for PR commits/reviewers is a bit larger because those requests get combined.
I also found a major reason for the GitHub collection speed: GitHub allows 5,000 requests per hour, and all of them can be used up in the first minute. Our strategy, however, is to spread the quota evenly over each second and use it up slowly.
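To put numbers on the quota strategy, here is a small sketch comparing an evenly spread quota with spending it as fast as possible. The 5,000 per hour figure comes from the comment above; max_rps (how fast a collector could actually send requests) and the 2,000-request job are made-up values for illustration, and the burst model assumes the quota resets one hour after the first request.

```python
import math


def even_spread_seconds(requests: int, quota_per_hour: int = 5000) -> float:
    """Time needed when the hourly quota is divided evenly across the hour."""
    return requests * 3600 / quota_per_hour


def burst_seconds(requests: int, quota_per_hour: int = 5000,
                  max_rps: float = 50.0) -> float:
    """Time needed when the quota is spent as fast as possible.

    Each full quota window is assumed to take one hour (send, then wait for
    the reset); the last window only takes as long as sending takes.
    """
    if requests <= 0:
        return 0.0
    windows = math.ceil(requests / quota_per_hour)
    last_window = requests - (windows - 1) * quota_per_hour
    return (windows - 1) * 3600 + last_window / max_rps


print(even_spread_seconds(2000))  # 1440.0
print(burst_seconds(2000))        # 40.0
```

Under these assumptions a job that fits inside one hourly quota finishes far sooner with the burst approach, while jobs spanning several quota windows end up limited by the hourly reset in both cases.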
@hezyin @CamilleTeruel The investigation of github graphql shows that it is a promising direction:
Based on information from @likyh, we decided to expand the scope of the investigation from GitHub to other data sources as well, to find out how widely GraphQL is available. We may consider bringing in GraphQL if more than one data source supports it and they share similar features (higher rate limits and support for multiple/nested resources).
However, the GitHub GraphQL rate limit calculation is quite complex, so it may not be possible to convert it to a steady rate. Although the Lazy RateLimit Strategy could be applied here, it complicates the UX. We need input from @Startrekzky and @yumengwang03.
@likyh suggested we adopt a dynamic rate control algorithm, such as binary backoff or something of that nature. I'm afraid this would make the Plugin Interface even more complex, since rate limit information is data-source-specific. Please take these factors into account for the Plugin Development Improvement plan.
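For discussion purposes, a binary backoff controller could look roughly like the sketch below. This is not DevLake code: the BinaryBackoff class, its method names, and the 1 s / 64 s bounds are all hypothetical. The only signal it needs from a data source is whether the last request was rate limited.

```python
class BinaryBackoff:
    """Doubles the wait after each rate-limited response, resets on success."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 64.0) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.delay = 0.0  # seconds to wait before the next request

    def on_success(self) -> float:
        """A request went through: stop waiting between requests."""
        self.delay = 0.0
        return self.delay

    def on_rate_limited(self) -> float:
        """A request was rejected: start at base_delay, then double up to the cap."""
        if self.delay == 0.0:
            self.delay = self.base_delay
        else:
            self.delay = min(self.delay * 2, self.max_delay)
        return self.delay


backoff = BinaryBackoff()
print([backoff.on_rate_limited() for _ in range(4)])  # [1.0, 2.0, 4.0, 8.0]
print(backoff.on_success())                            # 0.0
```

How each plugin detects "rate limited" (status code, response header, or a GraphQL error) would still be data-source-specific, which is the interface concern raised above.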
Rate limit info from another platform: https://docs.snyk.io/features/other-tools/snyk-scm-contributors-count-cli-tool/api-rate-limit-control
@klesh Thanks for the summary.
I agree it's important to make sure users understand that pipelines may sometimes be stale due to rate limits, but I don't think it's a fundamental blocker. If we decide to go that route, I'm sure @Startrekzky and @yumengwang03 can find a way to communicate it. Let's evaluate the feasibility based on other factors like speed gain, implementation cost, maintenance cost, etc.
All Atlassian products, such as Jira, Confluence, and Bitbucket, share the same GraphQL API. It has 2 endpoints:
It doesn't have definite rate limits as Jira restful API.
bitbucket:
query MyQuery {
diagnostics
bitbucket {
bitbucketWorkspace(
id: "ari:cloud:bitbucket::workspace/d1762eb7-0305-41b6-be9e-832ad8dcc7d4"
) {
id
name
repositories(first: 10000) {
nodes {
id
name
webUrl
}
}
}
}
}
Jira:
Get the cloudId from https://xxxx.atlassian.net/_edge/tenant_info
query MyQuery {
polarisAPIVersion
jira {
issueByKey(cloudId: "b696e399-4a1d-4ef6-a6e8-d4243f3b59f6", key: "XX-1000") {
id
}
}
}
But it failed, apparently because the GraphQL API is not finished yet.
{
"errors": [
{
"message": "ISSUE_UNAVAILABLE",
"locations": [
{
"line": 1,
"column": 43
}
],
"path": [
"jira",
"issueByKey"
],
"extensions": {
"errorSource": "UNDERLYING_SERVICE",
"statusCode": 500,
"errorType": "ISSUE_UNAVAILABLE",
"classification": "DataFetchingException"
}
}
],
"data": {
"polarisAPIVersion": "a6adb4f",
"jira": {
"issueByKey": null
}
},
"extensions": {
"gateway": {
"request_id": "e18074d9c51c6639",
"crossRegion": true,
"edgeCrossRegion": false,
"deprecatedFieldsUsed": []
}
}
}
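Note that the response above carries both errors and partial data (polarisAPIVersion resolved, issueByKey did not), so a collector cannot treat a response with an HTTP body as all-or-nothing. A minimal sketch of pulling the failed paths out of such a response; failed_paths is a hypothetical helper and SAMPLE is a trimmed copy of the response above.

```python
from typing import Dict


def failed_paths(response: Dict) -> Dict[str, str]:
    """Map each failed field path (dot-joined) to its error type.

    Falls back to the error message when extensions.errorType is missing.
    """
    result = {}
    for err in response.get("errors", []):
        path = ".".join(str(part) for part in err.get("path", []))
        extensions = err.get("extensions", {})
        result[path] = extensions.get("errorType", err.get("message", "UNKNOWN"))
    return result


# Trimmed copy of the response shown above.
SAMPLE = {
    "errors": [{
        "message": "ISSUE_UNAVAILABLE",
        "path": ["jira", "issueByKey"],
        "extensions": {"errorSource": "UNDERLYING_SERVICE",
                       "statusCode": 500,
                       "errorType": "ISSUE_UNAVAILABLE"},
    }],
    "data": {"polarisAPIVersion": "a6adb4f", "jira": {"issueByKey": None}},
}

print(failed_paths(SAMPLE))                 # {'jira.issueByKey': 'ISSUE_UNAVAILABLE'}
print(SAMPLE["data"]["polarisAPIVersion"])  # a6adb4f
```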
Also, there are 2 problems: GraphQL support is incomplete for some APIs, and there is no complete documentation.
https://developer.atlassian.com/platform/atlassian-graphql-api/graphql/#overview
GraphQL in GitLab is useful for us. https://gitlab.com/-/graphql-explorer cannot explore all entities, so I suggest entering https://gitlab.com/api/graphql into a GraphQL tool such as https://graphiql-online.com/graphiql instead. It's easy to use.
query MyQuery {
project(fullPath: "merico-dev/ee/vdev.co") {
mergeRequests(first: 100, sort: CREATED_ASC) {
nodes {
id
iid
}
pageInfo {
endCursor
hasNextPage
}
totalTimeToMerge
count
}
id
name
}
}
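Queries like the ones in this thread are paged through pageInfo { endCursor hasNextPage }. A generic sketch of that loop is below. The HTTP call is left out on purpose: fetch_page is a hypothetical callable that takes the cursor (None for the first page) and returns the connection object, and FAKE_PAGES is made-up data standing in for real responses.

```python
from typing import Callable, Dict, Iterator, Optional


def iterate_nodes(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every node of a cursor-paginated connection.

    Follows pageInfo.endCursor until pageInfo.hasNextPage is false.
    """
    cursor: Optional[str] = None
    while True:
        connection = fetch_page(cursor)
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]


# Fake two-page result standing in for real HTTP calls (made-up data).
FAKE_PAGES = {
    None: {"nodes": [{"iid": "1"}, {"iid": "2"}],
           "pageInfo": {"endCursor": "abc", "hasNextPage": True}},
    "abc": {"nodes": [{"iid": "3"}],
            "pageInfo": {"endCursor": "def", "hasNextPage": False}},
}

merge_requests = list(iterate_nodes(lambda cursor: FAKE_PAGES[cursor]))
print([mr["iid"] for mr in merge_requests])  # ['1', '2', '3']
```

In a real collector, fetch_page would send the query with the cursor passed as the after argument of the connection.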
So we can use GraphQL for GitHub/GitLab and for a small part of Bitbucket. I don't suggest using GraphQL for Jira.
Note: the raw layer will become insignificant because the response body is determined by the tool layer.
Using GraphQL can indeed make full collection much faster. But we should also keep in mind that it might reduce our ability to perform incremental collections.
For example, in GitHub's GraphQL schema the issues connection has a since filter parameter, so no problem there, but the pullRequests connection does not, and we can only filter PRs by state or label.
So for incremental collection of PRs we have to fetch at least all open PRs each time. In GitHub's case, I guess we still gain over the long run; it's not like the typical project has thousands of open PRs at any given time, after all. But my point is that a query can be incremental only if the GraphQL schema provides suitable filtering parameters for the corresponding connections.
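To make the difference concrete, this sketch builds the two connection selectors an incremental collector would end up with. It assumes, from my reading of GitHub's schema, that the issue filter is written filterBy: {since: ...} and that the PR connection is named pullRequests with a states argument; both helper names are hypothetical.

```python
from typing import Sequence


def issues_connection(since_iso: str, page_size: int = 100) -> str:
    """Issues can be narrowed server-side to those updated after `since_iso`."""
    return f'issues(first: {page_size}, filterBy: {{since: "{since_iso}"}})'


def pull_requests_connection(states: Sequence[str] = ("OPEN",),
                             page_size: int = 100) -> str:
    """PRs have no since filter here, so the best we can do is filter by state."""
    return f'pullRequests(first: {page_size}, states: [{", ".join(states)}])'


print(issues_connection("2022-08-01T00:00:00Z"))
# issues(first: 100, filterBy: {since: "2022-08-01T00:00:00Z"})
print(pull_requests_connection())
# pullRequests(first: 100, states: [OPEN])
```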
Yes, issues support it but PRs do not. Still, collecting all PRs through GraphQL may be faster than through REST because it needs fewer requests.
Jira:
Add header: Authorization: Basic XXXXXX.
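For reference, the Basic value is just the base64 encoding of user:secret. A minimal sketch, assuming the credentials are an Atlassian account email plus an API token (an assumption; the thread does not say which credentials were used); basic_auth_header is a hypothetical helper.

```python
import base64
from typing import Dict


def basic_auth_header(user: str, secret: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header from user and secret."""
    raw = f"{user}:{secret}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder credentials, not real ones.
print(basic_auth_header("user", "pass"))  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```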
Then use this query to request all projects.
Don't use the Jira GraphQL client; use https://graphiql-online.com/graphiql instead.
query example {
jira {
allJiraProjects(cloudId: "b696e399-4a1d-4ef6-a6e8-d4243f3b59f6", filter: {sortBy: {sortBy: NAME, order: ASC}}, first: 1) {
pageInfo {
hasNextPage
endCursor
}
edges {
node {
key
name
opsgenieTeamsAvailableToLinkWith {
pageInfo {
hasNextPage
}
edges {
node {
id
name
}
}
}
}
}
}
}
}
got project id: ari:cloud:jira:b696e399-4a1d-4ef6-a6e8-d4243f3b59f6:project/10029
request issue detail:
query example {
jira {
issueByKey(cloudId: "b696e399-4a1d-4ef6-a6e8-d4243f3b59f6", key: "EE-1111") {
id
issueId
key
worklogs {
pageInfo {
endCursor
hasNextPage
}
edges {
node {
created
author {
name
}
id
worklogId
updated
startDate
updateAuthor {
name
}
}
}
}
webUrl
}
}
}
request issue list:
query example {
jira {
issueSearchStable(cloudId: "b696e399-4a1d-4ef6-a6e8-d4243f3b59f6", issueSearchInput: {jql: "project=EE and key='EE-1111'"}) {
pageInfo {
hasNextPage
endCursor
}
totalCount
edges {
node {
id
key
webUrl
worklogs {
edges {
node {
id
startDate
}
}
}
}
}
}
}
}
It seems it can only request workLogs, not changeLogs.
Jira's GraphQL currently returns at most 100 items per page, the same as REST, so GraphQL only improves the query loop. Right now only account/changeLog/workLog/remoteLink use the query loop. ChangeLog and remoteLink need a massive number of requests but cannot be queried through GraphQL. WorkLog and account can be queried through GraphQL, but they only need a limited number of requests. So using GraphQL for Jira is mostly useless.
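The argument can be expressed as a quick calculation: GraphQL only helps for the query-loop resources it can actually serve, so the saving is their share of all query-loop requests. The request counts below are invented purely for illustration (no measurements were taken in this thread), and graphql_savings is a hypothetical helper.

```python
from typing import Dict, Set


def graphql_savings(loop_requests: Dict[str, int], graphql_capable: Set[str]) -> float:
    """Fraction of query-loop requests that could move to GraphQL."""
    total = sum(loop_requests.values())
    if total == 0:
        return 0.0
    saved = sum(count for name, count in loop_requests.items()
                if name in graphql_capable)
    return saved / total


# Made-up request counts for one collection run.
loop_requests = {"changeLog": 10000, "remoteLink": 10000,
                 "workLog": 400, "account": 100}
share = graphql_savings(loop_requests, {"workLog", "account"})
print(f"{share:.1%}")  # 2.4%
```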
Resolved by #2619
Description
Rumor has it that the GitHub GraphQL API has a much higher API request rate allowance. We need to investigate the possibility of adopting it for better performance.
Describe the solution you'd like
Has the Feature been Requested Before?
No