apache / incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
https://devlake.apache.org/
Apache License 2.0

[Question] How do I modify this so that it records the file names that were updated during git commits/pull requests? #7893

Closed amokkapati closed 2 weeks ago

amokkapati commented 2 months ago

[
  [
    {
      "plugin": "org",
      "subtasks": ["setProjectMapping"],
      "options": {
        "projectMappings": [
          {
            "projectName": "GHEDev",
            "scopes": [
              { "rowId": "github:GithubRepo:1:7345", "table": "repos" },
              { "rowId": "github:GithubRepo:1:7345", "table": "cicd_scopes" },
              { "rowId": "github:GithubRepo:1:7345", "table": "boards" }
            ]
          }
        ]
      }
    },
    {
      "plugin": "github",
      "subtasks": [
        "Convert Releases", "Convert Repos", "Extract Comments", "Extract Events",
        "Extract Issues", "Extract Milestones", "Extract PR Review Comments",
        "Extract Pull Requests", "Extract Users", "Extract Workflow Runs",
        "Convert Issue Assignees", "Convert Issue Comments", "Convert Issue Labels",
        "Convert Issues", "Convert Milestones", "Convert PR Labels",
        "Convert Pull Requests", "Convert Workflow Runs", "Enrich PR Issues",
        "Convert PR Issues", "Extract Jobs", "Extract PR Commits", "Extract PR Reviews",
        "Convert Jobs", "Convert PR Comments", "Convert PR Commits", "Convert PR Reviews",
        "Extract User Org", "Convert Users"
      ],
      "options": {
        "connectionId": 1,
        "fullName": "INFServices/terraform-base",
        "githubId": 7345,
        "name": "INFServices/terraform-base"
      }
    },
    {
      "plugin": "dora",
      "subtasks": ["generateDeployments", "generateDeploymentCommits", "enrichPrevSuccessDeploymentCommits"],
      "options": { "projectName": "GHEDev" }
    },
    {
      "plugin": "refdiff",
      "subtasks": ["calculateDeploymentCommitsDiff"],
      "options": { "projectName": "GHEDev" }
    },
    {
      "plugin": "dora",
      "subtasks": ["calculateChangeLeadTime", "ConnectIncidentToDeployment"],
      "options": { "projectName": "GHEDev" }
    }
  ]
]

dosubot[bot] commented 2 months ago

To record the file names that were updated during git commits and pull requests, make sure the data collection includes the entities related to commits and pull requests. Specifically, select the Source Code Management and Code Review data entities, which cover commits and pull requests.

Here is the relevant section from the configuration documentation that you need to focus on:

### Data Entities

Usually, you don't have to modify this part. However, if you don't want to collect certain GitHub entities, you can unselect some entities to accelerate the collection speed.

- Issue Tracking: GitHub issues, issue comments, issue labels, etc.
- Source Code Management: GitHub repos, refs, commits, etc.
- Code Review: GitHub PRs, PR comments and reviews, etc.
- CI/CD: GitHub Workflow runs, GitHub Workflow jobs, etc.
- Cross Domain: GitHub accounts, etc.

Ensure that Source Code Management and Code Review are selected to collect data about commits and pull requests, which will include the file names that were updated.

Additionally, you may need to configure the transformation rules to parse the necessary information from the collected data. Here is how you can add transformation rules:

## Step 3 - Adding Transformation Rules (Optional)

![github-add-transformation-rules-list](images/github-set-transformation1.png)
![github-add-transformation-rules](images/github-set-transformation2.png)

Without adding transformation rules, you can still view the "[GitHub Metrics](/livedemo/DataSources/GitHub)" dashboard. However, if you want to view "[Weekly Bug Retro](/livedemo/EngineeringLeads/WeeklyBugRetro)", "[Weekly Community Retro](/livedemo/OSSMaintainers/WeeklyCommunityRetro)" or other pre-built dashboards, the following transformation rules, especially "Type/Bug", should be added.

Each GitHub repo has at most ONE set of transformation rules.

### Code Review

- Type: The `type` of pull requests will be parsed from PR labels by given regular expression. For example:

  - when your labels for PR types are like 'type/feature-development', 'type/bug-fixing' and 'type/docs', please input 'type/(.*)$'
  - when your labels for PR types are like 'feature-development', 'bug-fixing' and 'docs', please input '(feature-development|bug-fixing|docs)$'

- Component: The `component` of pull requests will be parsed from PR labels by given regular expression.
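
As a rough illustration of how the two example expressions behave, the Go sketch below runs a pattern against a list of PR labels and returns the first capture group of the first matching label. The helper name prType and the sample labels are hypothetical, and returning the capture group is an assumption about how a match is turned into a type; DevLake's own parsing may differ.

```go
package main

import "regexp"

// prType is a hypothetical helper: it returns the first capture group of
// pattern for the first label that matches, or "" when no label matches.
// If the pattern has no capture group, the whole match is returned.
func prType(pattern string, labels []string) string {
	re := regexp.MustCompile(pattern)
	for _, label := range labels {
		if m := re.FindStringSubmatch(label); m != nil {
			if len(m) > 1 {
				return m[1]
			}
			return m[0]
		}
	}
	return ""
}
```

Under these assumptions, prType(`type/(.*)$`, ...) on the labels "priority/high" and "type/bug-fixing" yields "bug-fixing", and the second expression applied to a "docs" label yields "docs".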

By ensuring the correct data entities are selected and configuring the transformation rules, you can collect and record the file names that were updated during git commits and pull requests [1][2].

To continue talking to Dosu, mention @dosu.

amokkapati commented 2 months ago

@dosu

[
  [
    {
      "plugin": "org",
      "subtasks": ["setProjectMapping"],
      "options": {
        "projectMappings": [
          {
            "projectName": "GHEDev",
            "scopes": [
              { "rowId": "github:GithubRepo:1:7345", "table": "repos" },
              { "rowId": "github:GithubRepo:1:7345", "table": "cicd_scopes" },
              { "rowId": "github:GithubRepo:1:7345", "table": "boards" }
            ]
          }
        ]
      }
    },
    {
      "plugin": "github",
      "subtasks": [
        "Convert Releases", "Convert Repos", "Extract Comments", "Extract Events",
        "Extract Issues", "Extract Milestones", "Extract PR Review Comments",
        "Extract Pull Requests", "Extract Users", "Extract Workflow Runs",
        "Convert Issue Assignees", "Convert Issue Comments", "Convert Issue Labels",
        "Convert Issues", "Convert Milestones", "Convert PR Labels",
        "Convert Pull Requests", "Convert Workflow Runs", "Enrich PR Issues",
        "Convert PR Issues", "Extract Jobs", "Extract PR Commits", "Extract PR Reviews",
        "Convert Jobs", "Convert PR Comments", "Convert PR Commits", "Convert PR Reviews",
        "Extract User Org", "Convert Users"
      ],
      "options": {
        "connectionId": 1,
        "fullName": "INFServices/terraform-base",
        "githubId": 7345,
        "name": "INFServices/terraform-base"
      }
    },
    {
      "plugin": "dora",
      "subtasks": ["generateDeployments", "generateDeploymentCommits", "enrichPrevSuccessDeploymentCommits"],
      "options": { "projectName": "GHEDev" }
    },
    {
      "plugin": "refdiff",
      "subtasks": ["calculateDeploymentCommitsDiff"],
      "options": { "projectName": "GHEDev" }
    },
    {
      "plugin": "dora",
      "subtasks": ["calculateChangeLeadTime", "ConnectIncidentToDeployment"],
      "options": { "projectName": "GHEDev" }
    },
    {
      "plugin": "gitextractor",
      "subtasks": null,
      "options": {
        "repoId": "github:GithubRepo:7345",
        "url": "https://github.com/apache/incubator-devlake.git"
      }
    }
  ]
]

Where in here do I set the "SkipCommitFiles" option to false for gitextractor?

dosubot[bot] commented 2 months ago

To set the "SkipCommitFiles" option to false for the "gitextractor" plugin, you need to modify the options section for the gitextractor plugin in your configuration. Here is how you can do it:

[
  {
    "plugin": "org",
    "subtasks": [
      "setProjectMapping"
    ],
    "options": {
      "projectMappings": [
        {
          "projectName": "GHEDev",
          "scopes": [
            {
              "rowId": "github:GithubRepo:1:7345",
              "table": "repos"
            },
            {
              "rowId": "github:GithubRepo:1:7345",
              "table": "cicd_scopes"
            },
            {
              "rowId": "github:GithubRepo:1:7345",
              "table": "boards"
            }
          ]
        }
      ]
    }
  },
  {
    "plugin": "github",
    "subtasks": [
      "Convert Releases",
      "Convert Repos",
      "Extract Comments",
      "Extract Events",
      "Extract Issues",
      "Extract Milestones",
      "Extract PR Review Comments",
      "Extract Pull Requests",
      "Extract Users",
      "Extract Workflow Runs",
      "Convert Issue Assignees",
      "Convert Issue Comments",
      "Convert Issue Labels",
      "Convert Issues",
      "Convert Milestones",
      "Convert PR Labels",
      "Convert Pull Requests",
      "Convert Workflow Runs",
      "Enrich PR Issues",
      "Convert PR Issues",
      "Extract Jobs",
      "Extract PR Commits",
      "Extract PR Reviews",
      "Convert Jobs",
      "Convert PR Comments",
      "Convert PR Commits",
      "Convert PR Reviews",
      "Extract User Org",
      "Convert Users"
    ],
    "options": {
      "connectionId": 1,
      "fullName": "INFServices/terraform-base",
      "githubId": 7345,
      "name": "INFServices/terraform-base"
    }
  },
  {
    "plugin": "dora",
    "subtasks": [
      "generateDeployments",
      "generateDeploymentCommits",
      "enrichPrevSuccessDeploymentCommits"
    ],
    "options": {
      "projectName": "GHEDev"
    }
  },
  {
    "plugin": "refdiff",
    "subtasks": [
      "calculateDeploymentCommitsDiff"
    ],
    "options": {
      "projectName": "GHEDev"
    }
  },
  {
    "plugin": "dora",
    "subtasks": [
      "calculateChangeLeadTime",
      "ConnectIncidentToDeployment"
    ],
    "options": {
      "projectName": "GHEDev"
    }
  },
  {
    "plugin": "gitextractor",
    "subtasks": null,
    "options": {
      "repoId": "github:GithubRepo:7345",
      "url": "https://github.com/apache/incubator-devlake.git",
      "skipCommitFiles": false
    }
  }
]

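In this configuration, the skipCommitFiles option is set to false for the gitextractor plugin, so that it records the file names that were updated during git commits and pull requests [1][2]. The only change from your plan is the added skipCommitFiles line; the outer array that wraps the tasks in your original is omitted here and should be kept.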

amokkapati commented 2 months ago

@dosu This is the error that I get when I try to add my own repo under url:

subtask Clone Git Repo ended unexpectedly
Wraps: (2) git exit error
Error types: (1) hintdetail.withDetail (2) errors.errorString

dosubot[bot] commented 2 months ago

To resolve the error "subtask Clone Git Repo ended unexpectedly Wraps: (2) git exit error Error types: (1) hintdetail.withDetail (2) errors.errorString" and successfully add your repository, follow these steps:

  1. Check Proxy Settings: Ensure that the proxy settings are correctly configured if you are using a proxy. Verify that the taskData.Options.Proxy is set correctly.

  2. Verify Private Key Handling: If you are using SSH with a private key, ensure that the private key and passphrase are correctly handled. The code creates a temporary file for the private key and sets the appropriate permissions. Ensure that this process is not failing.

  3. Check Git Command Execution: Ensure that the execGitCommand and execGitCommandIn functions are correctly handling the git commands and their environment. Verify that the commands are correctly formed and that the environment variables are set properly.

  4. Handle Shallow Clone Issues: Ensure that the --depth=1 and --shallow-since options are used correctly if you are performing a shallow clone.
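
Because "git exit error" hides the underlying message, it can help to run a comparable clone by hand and read git's own output. The Go sketch below is one way to do that, ideally from the same host or container that runs DevLake. The helper names cloneArgs and tryClone are hypothetical, and the exact arguments the plugin passes to git are not reproduced here; only the --depth=1 option mentioned above is used.

```go
package main

import (
	"os"
	"os/exec"
)

// cloneArgs builds the arguments for a manual `git clone`. With shallow set,
// it adds --depth=1, one of the shallow-clone options mentioned above.
func cloneArgs(url, dir string, shallow bool) []string {
	args := []string{"clone"}
	if shallow {
		args = append(args, "--depth=1")
	}
	return append(args, url, dir)
}

// tryClone clones url into a temporary directory, removes it afterwards, and
// returns git's combined stdout/stderr so the real failure reason is visible
// (for example an authentication, TLS, or proxy problem).
func tryClone(url string, shallow bool) (string, error) {
	dir, err := os.MkdirTemp("", "clone-check")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	out, err := exec.Command("git", cloneArgs(url, dir, shallow)...).CombinedOutput()
	return string(out), err
}
```

If tryClone returns an error, the returned text is usually more specific than the wrapped error in the pipeline log, which narrows down which of the steps above applies.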

Additionally, ensure that the CloneGitRepo function in the CloneGitRepoMeta subtask is correctly implemented. Here is the relevant part of the code:

func CloneGitRepo(subTaskCtx plugin.SubTaskContext) errors.Error {
    taskData, ok := subTaskCtx.GetData().(*parser.GitExtractorTaskData)
    if !ok {
        panic("git repo reference not found on context")
    }
    op := taskData.Options
    storage := store.NewDatabase(subTaskCtx, op.RepoId)
    var err errors.Error
    logger := subTaskCtx.GetLogger()

    // temporary dir for cloning
    localDir, e := os.MkdirTemp("", "gitextractor")
    if e != nil {
        return errors.Convert(e)
    }

    // clone repo
    repoCloner := parser.NewGitcliCloner(subTaskCtx)
    err = repoCloner.CloneRepo(subTaskCtx, localDir)
    if err != nil {
        if errors.Is(err, parser.ErrNoData) {
            taskData.SkipAllSubtasks = true
            return nil
        }
        return err
    }
    if repoCloner.IsIncremental() {
        storage.SetIncrementalMode(repoCloner.IsIncremental())
    }
    // We have done comparison experiments for git2go and go-git, and the results show that git2go has better performance.
    var repoCollector parser.RepoCollector
    if *taskData.Options.UseGoGit {
        repoCollector, err = parser.NewGogitRepoCollector(localDir, op.RepoId, storage, logger)
    } else {
        repoCollector, err = parser.NewLibgit2RepoCollector(localDir, op.RepoId, storage, logger)
    }
    if err != nil {
        return err
    }

    // inject clean up callback to remove the cloned dir
    cleanup := func() {
        _ = os.RemoveAll(localDir)
    }
    if e := repoCollector.SetCleanUp(cleanup); e != nil {
        return errors.Convert(e)
    }

    // pass the collector down to next subtask
    taskData.GitRepo = repoCollector
    subTaskCtx.TaskContext().SetData(taskData)
    return nil
}

Additional Troubleshooting Steps

  1. Self-Signed Certificates: If you are using a private GitLab server with a self-signed certificate, add your root CA to the devlake container. Mount your rootCA.crt into the container and update the CA certificates.

    Example for docker-compose:

    devlake:
      image: apache/devlake:v...
      ...
      volumes:
        ...
        - /path/to/your/rootCA.crt:/usr/local/share/ca-certificates/rootCA.crt
      command: [ "sh", "-c", "update-ca-certificates; lake" ]
      ...

  2. Upgrade to Latest Version: Consider upgrading to the latest version of DevLake (v0.15.x or later) as some bugs have been fixed in newer versions.

  3. Check Token Permissions: Ensure that your access token has the necessary permissions. If your account is protected by organization SAML enforcement, make sure you've authorized the token using SSO.

  4. Error Code 403: If you encounter an error with the status code 403, it is usually caused by the permission of your tokens. Ensure you are using a supported authentication method and that your token has the necessary permissions.

  5. Error Code 429: If you encounter an error with the status code 429, it is usually caused by rate limiting. Enable fixed rate limit in the GitLab connection and lower the API rates to 2,000. If that works, you can try raising the rates again to speed up collection [1][2][3][4].

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] commented 2 weeks ago

This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter a similar problem in the future.