apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0
2.52k stars, 502 forks

[Bug][github data source] Data is not populating in pull_requests table #7491

Open Rbathular opened 2 months ago

Rbathular commented 2 months ago

What happened

I configured a GitHub data source connection and triggered the Collect Data pipeline. The pipeline succeeded, but I cannot see any data in the pull_requests table. The Grafana dashboard also shows no pull request data; all the GitHub metrics report "No data".

What do you expect to happen

I should be able to see data in the contribution (PRs) and "How PRs are Handled" metrics.

How to reproduce

  1. Create a GitHub data connection.
  2. Add a data scope.
  3. Create a project and add the data connection.
  4. Click Collect Data.
  5. Once the pipeline succeeds, check the Grafana dashboard.

Anything else

No response

Version

v1.0.0-beta7@2cc08c7

dosubot[bot] commented 2 months ago

To address the issue with the pull_requests table not populating, please provide the following details:

  1. Logs from the data collection pipeline to identify any potential issues during data collection.
  2. Configuration details of your GitHub data source connection, ensuring any sensitive information is redacted.
  3. Your Grafana version and any error messages in the Grafana logs.

This information is crucial for diagnosing the problem.

#### Details **About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

Rbathular commented 2 months ago

  1. Logs from the data collection pipeline: logging.tar-3.gz
  2. Please see the attachments below for the configuration details: Screenshot 2024-05-20 at 6 49 27 PM, Screenshot 2024-05-20 at 6 49 36 PM
  3. Grafana logs: [devlake-qa-grafana-7b88dfbd8-l72ch-grafana.log](https://github.com/apache/incubator-devlake/files/15377527/devlake-qa-grafana-7b88dfbd8-l72ch-grafana.log)

dosubot[bot] commented 2 months ago

After reviewing the provided logs and configuration details, it seems there might be a few areas to investigate to resolve the issue with the pull_requests table not populating and the Grafana dashboard not displaying pull requests data.

  1. Check for Pull Requests in the Repository: Ensure that the GitHub repository you're collecting data from contains pull requests. If there are no pull requests, the pull_requests table will remain empty, and Grafana will not display any related data.

  2. Review Data Collection Pipeline Logs: The logs from the data collection pipeline could reveal potential issues or errors during the data collection process. Look for any error messages or warnings that might indicate why the data isn't being collected as expected.

  3. Verify GitHub Data Source Configuration: Double-check the configuration details of your GitHub data source connection. Ensure that all the information is correct and that no sensitive information like personal access tokens is exposed. Incorrect configuration could lead to data collection issues.

  4. Examine Grafana Logs: The Grafana logs you shared might contain error messages or warnings that can help identify why the data is not being displayed. Look for any indications of dashboard errors or data retrieval problems.

  5. Configuration and Connection Details: The images you provided for the GitHub data source connection and the "Connections / FIS-Drive" section should be reviewed to ensure that the setup is correctly configured to collect and display the desired data.

By addressing these areas, you should be able to identify and resolve the issue with the pull_requests table and the Grafana dashboard. If you continue to experience problems, providing more specific details from the logs or configuration settings can help further diagnose the issue.

Rbathular commented 2 months ago

@dosu I have verified all the points you listed. All the details are correct, but I still cannot see the pull_requests metrics.

Note: Previously we were using DevLake version v0.19.1-beta3@, and with that version we were able to get pull_request metrics in Grafana. Today we did a fresh install of DevLake version v1.0.0-beta7@2cc08c7. With this version we cannot get pull_request data. Jira and SonarQube data is populating as expected.

dosubot[bot] commented 2 months ago

Given the upgrade to DevLake version v1.0.0-beta7@2cc08c7 and the issues you're facing with pull request metrics not populating in Grafana, it's crucial to narrow down the cause. Since Jira and SonarQube data are populating as expected, the issue seems isolated to GitHub data collection or processing. Here are a few targeted steps to help diagnose and potentially resolve the issue:

  1. Grafana Dashboard Configuration: Please share the specific configuration of the Grafana dashboard you're using, focusing on the panel queries and settings. This will help identify if the issue lies within the dashboard setup.

  2. DevLake Logs: Look for any specific error messages or warnings in the DevLake logs that mention issues with GitHub data collection or processing. These logs can provide clues on whether the data collection from GitHub is failing or if the issue occurs during data processing.

  3. Repository-Specific Issues: Confirm if this issue is occurring across all GitHub repositories or if it's isolated to a specific repository. This can help determine if the problem is with the data source configuration or with how data from certain repositories is handled.

  4. Configuration Changes: Since the issue appeared after upgrading to a newer version of DevLake, please review any changes made to the default settings or configurations in both DevLake and Grafana. It's possible that the upgrade process may have altered a setting that affects the visibility of GitHub pull request data.

By focusing on these areas, we aim to pinpoint the cause of the issue more effectively. Your cooperation in providing these details will be invaluable in resolving the problem.

klesh commented 2 months ago

How about the _tool_github_pull_requests table? There is a Time After under the Sync policy, what was the value of it? Are you sure that there is PR in the period?
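
One way to answer this question is to compare row counts in the tool-layer table `_tool_github_pull_requests` and the domain-layer table `pull_requests`. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the DevLake database, and the helper names (`make_demo_db`, `count_rows`, `diagnose`) and the single `id` column are assumptions, not DevLake's schema or API. Against a real deployment, the same `SELECT COUNT(*)` statements would be run with your own database client.

```python
import sqlite3

# Table names are taken from this thread; everything else is hypothetical.
TABLES = ("_tool_github_pull_requests", "pull_requests")


def make_demo_db(tool_rows, domain_rows):
    """Build an in-memory stand-in database holding the two tables."""
    conn = sqlite3.connect(":memory:")
    for table, n in zip(TABLES, (tool_rows, domain_rows)):
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY)')
        conn.executemany(
            f'INSERT INTO "{table}" (id) VALUES (?)',
            [(i,) for i in range(n)],
        )
    return conn


def count_rows(conn, table):
    """Count rows. Table names cannot be bound as parameters, so whitelist them."""
    if table not in TABLES:
        raise ValueError(f"unexpected table: {table}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def diagnose(conn):
    """Suggest which stage to inspect first, based on which table is empty."""
    tool = count_rows(conn, "_tool_github_pull_requests")
    domain = count_rows(conn, "pull_requests")
    if tool == 0:
        return "collection"  # nothing was collected or extracted from GitHub
    if domain == 0:
        return "conversion"  # tool-layer rows exist but were not converted
    return "ok"


if __name__ == "__main__":
    print(diagnose(make_demo_db(tool_rows=0, domain_rows=0)))
    print(diagnose(make_demo_db(tool_rows=3, domain_rows=0)))
```

If the tool-layer table is empty, the problem is likely upstream (collection or the sync policy time range); if only `pull_requests` is empty, the conversion step is the more likely suspect.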

Rbathular commented 2 months ago

@klesh I have added the sync policy details. Please see:

Screenshot 2024-05-21 at 11 47 24 AM

Rbathular commented 2 months ago

@klesh There are definitely PRs available in that period.

d4x1 commented 2 months ago

> @klesh Added the sync policy details . please see Screenshot 2024-05-21 at 11 47 24 AM

Thank you for the screenshot. I think @klesh wants you to confirm whether there are any records in the _tool_github_pull_requests table.

With that additional confirmation, we can dig into this issue and see what's wrong.

I searched your log and found an error:

 [pipeline service] [pipeline #7] [task #67] [Clone Git Repo] [gitcli] git exited with error\nstderr: Cloning into bare repository '/tmp/gitextractor83455746'...\nstderr: fatal: error processing shallow info: 4\n\n\tcaused by: exit status 128"

This error occurs randomly. @Rbathular, you can try re-running the pipeline until Config UI shows no errors. (If there are errors in the log, Config UI will show failed tasks.)

At the same time, @klesh, can you help confirm whether this is the root cause?
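
For anyone scanning a large pipeline log for the error quoted above, a small script can list which pipeline and task hit it. This is a minimal sketch that assumes every relevant line carries the `[pipeline #N] [task #N] [subtask name]` prefix seen in the quoted line; the function name `failed_git_tasks` is made up for illustration and is not part of DevLake.

```python
import re

# Matches the "[pipeline #7] [task #67] [Clone Git Repo]" prefix from the
# log line quoted in this thread. Other log formats are not handled.
PREFIX = re.compile(r"\[pipeline #(\d+)\] \[task #(\d+)\] \[([^\]]+)\]")


def failed_git_tasks(lines):
    """Return (pipeline_id, task_id, subtask) for each git failure line."""
    hits = []
    for line in lines:
        if "git exited with error" not in line:
            continue
        match = PREFIX.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return hits


if __name__ == "__main__":
    sample = [
        "[pipeline service] [pipeline #7] [task #67] [Clone Git Repo] "
        "[gitcli] git exited with error",
        "[pipeline service] [pipeline #7] [task #68] some unrelated message",
    ]
    for pipeline_id, task_id, subtask in failed_git_tasks(sample):
        print(f"pipeline {pipeline_id}, task {task_id}: {subtask}")
```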

Rbathular commented 2 months ago

@d4x1 One update from my side: yesterday I configured 2 projects in the DevLake Config UI.

  1. Project 1 has 2 repositories. I cannot see any pull request data in the GitHub metrics on the Grafana dashboard.
  2. Project 2 has 2 repositories. I triggered the pipelines yesterday but could not see any data then. Today the sync policy triggered one more pipeline, and after that I can see pull request data for one repository, although the data is not consistent. Data for the second repository is still missing.

I have attached the logs for both projects. Could you please help with this? project1.gz project2.gz

Could you please expedite this issue? It is blocking us from getting the GitHub metrics.

klesh commented 2 months ago

The log shows that there are no PRs in the time range, which also explains fatal: error processing shallow info: there are no commits in that range either. Please make sure the repo you are collecting has PRs created/updated AFTER 2023-11-20. A screenshot would be helpful.
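
To check a repository against the cutoff without relying on screenshots, the PR timestamps can be compared to the sync policy date directly. The sketch below assumes PR records shaped like GitHub REST API responses (`number`, `created_at`, `updated_at` in ISO 8601) and treats a PR as in range if either timestamp is after the cutoff. Whether DevLake applies the cutoff to the created or the updated time is not confirmed in this thread, and the sample data and the function name `prs_after` are invented for illustration.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2024-05-01T12:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prs_after(prs, time_after):
    """Return numbers of PRs created or updated after the cutoff date (UTC)."""
    cutoff = datetime.fromisoformat(time_after).replace(tzinfo=timezone.utc)
    return [
        pr["number"]
        for pr in prs
        if max(parse_ts(pr["created_at"]), parse_ts(pr["updated_at"])) > cutoff
    ]


if __name__ == "__main__":
    # Made-up sample records, not data from the reporter's repository.
    sample = [
        {"number": 1, "created_at": "2023-01-10T09:00:00Z",
         "updated_at": "2023-02-01T09:00:00Z"},
        {"number": 2, "created_at": "2023-10-01T09:00:00Z",
         "updated_at": "2024-01-05T09:00:00Z"},
        {"number": 3, "created_at": "2024-05-01T09:00:00Z",
         "updated_at": "2024-05-02T09:00:00Z"},
    ]
    print(prs_after(sample, "2023-11-20"))
```

An empty result for a repository would be consistent with the log message; a non-empty result would suggest the problem lies elsewhere.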

Rbathular commented 2 months ago

@klesh Here is one of the repositories added in the GitHub connection:

Screenshot 2024-05-23 at 7 48 36 PM

and a screenshot of its GitHub pull requests:

Screenshot 2024-05-23 at 7 52 50 PM

Rbathular commented 2 months ago

@klesh The strange thing is that I can see data for a few repositories, but for many of the repositories I cannot get data in the pull_requests table.

Note: The repositories I added are 1-year-old repositories.

klesh commented 2 months ago

To clarify, do you mean that you added a bunch of 1-year-old repositories and were able to fetch PRs for a few of them while others were not?

Correct me if I am wrong:

  1. The repo AV-TaaS/vehicle-reservation-ui is alive and active; it has PRs merged recently
  2. An Apache DevLake project was created with a Sync Policy time range starting from 2023-11-20
  3. The "CollectPullRequest" subtask appeared in the pipeline under the task for the repo
  4. No PRs collected for the repo
  5. However, PRs can be collected for newer repos

Can you try deploying a new instance with a brand new database to collect the repo and see if the PRs can be collected?

github-actions[bot] commented 3 days ago

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.