ddicorpo / RecruitInc

2018 - 2019 Capstone Project for Team RecruitInc

Extraction for matching algo - bitbucket #66

Closed BenjaminTanguay closed 6 years ago

BenjaminTanguay commented 6 years ago

Description

As an IBM consulting employee in charge of recruitment, I want to retrieve from Bitbucket the information the matching algorithm needs to determine whether an applicant has worked with React, so that I can match them with React job postings.

Risk

Medium

Value

High

Story points

3

General Description

This story is about fetching the data the matching algorithm needs to conclude whether a programmer has worked with React, and how much code they wrote in the framework.

The story should allow us to fetch all of the projects and extract the data we need.

It should handle possible errors.

It should output a clean list of data matching the interfaces defined by the matching algorithm team.

How

Acceptance Criteria

| Test Description | Step # | Expected Output | Actual Output | Pass/Fail |
| --- | --- | --- | --- | --- |
| Log into Bitbucket via frontend | 1 | Access key for OAuth is passed to the backend | Access key is passed | Pass |
| Query for repositories from user | 2 | Repositories are retrieved | Repositories are retrieved | Pass |
| Query for commits associated with the user from a repository | 3 | Commits are retrieved | Commits are retrieved | Pass |
| Query for diffstats of a given commit | 4 | Diffstats of a given commit are retrieved | Diffstats are retrieved | Pass |
| Clean data and store it appropriately | 5 | Data has been stored in appropriate objects and is ready to be handled by the matching algo | Data outputted in raw form to console | Pass |

BenjaminTanguay commented 6 years ago

Input we want to have for the matching algorithm:

class Input {
    -projectInput: ProjectInput[]
}

class ProjectInput {
    -projectName: String
    -applicantCommits: Commit[]
    -projectStructure: ProjectStructure[]
    //gitignore, package.json, etc.
    -downloadedSourceFilePaths: String[]
}

class Commit {
    -id: String
    -numberOfFilesAffected: int // Derived from array below
    -files: SingleFileCommit[]
}

class SingleFileCommit {
    -filePath: String
    -linesAdded: int
    -linesDeleted: int
}

class ProjectStructure {
    -fileId: String         // Allows us to download the file
    -fileName: String       // Allows us to parse extensions
    -filePath: String       // Allows us to cross-check commits
}

Use this as your target.
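
For reference, the target above could be written as TypeScript interfaces roughly as follows. This is only a sketch: the field names are taken from the classes above, while the `makeCommit` helper is a hypothetical addition that shows how `numberOfFilesAffected` can be derived from the `files` array.

```typescript
// One file touched by a commit.
interface SingleFileCommit {
  filePath: string;
  linesAdded: number;
  linesDeleted: number;
}

interface Commit {
  id: string;
  numberOfFilesAffected: number; // derived from files.length
  files: SingleFileCommit[];
}

interface ProjectStructure {
  fileId: string;   // lets us download the file
  fileName: string; // lets us parse extensions
  filePath: string; // lets us cross-check commits
}

interface ProjectInput {
  projectName: string;
  applicantCommits: Commit[];
  projectStructure: ProjectStructure[];
  downloadedSourceFilePaths: string[];
}

interface Input {
  projectInput: ProjectInput[];
}

// Hypothetical helper: keeps the derived count in sync with the file list.
function makeCommit(id: string, files: SingleFileCommit[]): Commit {
  return { id, numberOfFilesAffected: files.length, files };
}
```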

23jams commented 6 years ago

Step 1: Get the list of repositories for a specific user
Step 2: Store the repository info in a data structure
Step 3: Create a method to loop through the repos and grab all the commits from a specific user
Step 4: Store the commits in a data structure
Step 5: Loop through the commits and get the diff stats for each commit
Step 6: Store the diff stats in a data structure

Diff stats include the lines added and removed, as well as the file path.
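
The six steps could be sketched as one nested loop. The `BitbucketClient` interface and its three methods below are hypothetical placeholders for whatever actually performs the HTTP calls; they are not part of any Bitbucket library, and error handling is left out for brevity.

```typescript
interface DiffStat {
  filePath: string;
  linesAdded: number;
  linesDeleted: number;
}

interface CommitStats {
  id: string;
  files: DiffStat[];
}

interface RepoStats {
  repoName: string;
  commits: CommitStats[];
}

// Hypothetical abstraction over the API calls, so the loop can be tested with a fake.
interface BitbucketClient {
  listRepositories(user: string): Promise<string[]>;
  listCommits(repoName: string, user: string): Promise<string[]>;
  getDiffStats(repoName: string, commitId: string): Promise<DiffStat[]>;
}

async function extractUserActivity(client: BitbucketClient, user: string): Promise<RepoStats[]> {
  const result: RepoStats[] = [];
  // Steps 1-2: get the repositories and keep their names.
  const repoNames = await client.listRepositories(user);
  for (const repoName of repoNames) {
    // Steps 3-4: get the user's commits in this repository.
    const commitIds = await client.listCommits(repoName, user);
    const commits: CommitStats[] = [];
    for (const id of commitIds) {
      // Steps 5-6: get and store the diff stats of each commit.
      const files = await client.getDiffStats(repoName, id);
      commits.push({ id, files });
    }
    result.push({ repoName, commits });
  }
  return result;
}
```

Injecting the client keeps the looping logic independent of the HTTP layer, which should also make it easier to share with the extraction stories for other platforms.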

23jams commented 6 years ago

The call for repositories returns an array of characters rather than a paginated list. We wrote a quick and dirty way of pulling the repository name out of it, and it works. The repository name can then be used to pull further information.
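
If the array of characters is in fact the JSON response body, parsing it may be cleaner than string manipulation. A minimal sketch, assuming the body has the shape of a Bitbucket Cloud 2.0 repository listing (a `values` array whose entries carry a `name`); the function name is hypothetical.

```typescript
// Assumed shape of the response body; only the fields used here are declared.
interface RepositoryPage {
  values: { name: string }[];
  next?: string;
}

// Hypothetical helper: turns the raw response string into repository names.
function repositoryNamesFromBody(body: string): string[] {
  const page = JSON.parse(body) as RepositoryPage;
  if (!Array.isArray(page.values)) {
    throw new Error("Unexpected response: no 'values' array");
  }
  return page.values.map((repo) => repo.name);
}
```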

Winterhart commented 6 years ago

Hey, it would be nice to reuse common components across issues #64, #65 and #66.

We can use the pipe-and-filter pattern in a refactor task: https://github.com/mspnp/architecture-center/blob/master/docs/patterns/pipes-and-filters.md
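
A rough idea of what that could look like: each filter is a small function and the pipe composes them. The `RawDiffStat` shape and the filter names are hypothetical and simplified for illustration; a real API response would first need to be mapped into that shape.

```typescript
// A filter transforms one value into another; a pipe chains two filters.
type Filter<In, Out> = (input: In) => Out;

function pipe<A, B, C>(first: Filter<A, B>, second: Filter<B, C>): Filter<A, C> {
  return (input) => second(first(input));
}

// Hypothetical, flattened raw record (not the exact API shape).
interface RawDiffStat {
  path: string;
  lines_added: number;
  lines_removed: number;
}

interface FileChange {
  filePath: string;
  linesAdded: number;
  linesDeleted: number;
}

// Filter 1: drop entries with no line changes (e.g. binary files).
const dropEmptyChanges: Filter<RawDiffStat[], RawDiffStat[]> = (stats) =>
  stats.filter((s) => s.lines_added + s.lines_removed > 0);

// Filter 2: map raw records to the shape the matching algorithm expects.
const toFileChanges: Filter<RawDiffStat[], FileChange[]> = (stats) =>
  stats.map((s) => ({
    filePath: s.path,
    linesAdded: s.lines_added,
    linesDeleted: s.lines_removed,
  }));

// The composed pipeline; platform-specific filters could be swapped in per issue.
const cleanDiffStats = pipe(dropEmptyChanges, toFileChanges);
```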

23jams commented 6 years ago

Pagination for Bitbucket is explained here: https://developer.atlassian.com/server/confluence/pagination-in-the-rest-api/ Keep this in mind for when we update the query to gather all results.
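
Gathering all pages could look something like the loop below. It assumes a response envelope with a `values` array and an optional `next` URL, as used by the Bitbucket Cloud 2.0 API; if the API in use paginates differently, the loop condition needs adapting. `fetchPage` is a hypothetical callback that performs the authenticated request.

```typescript
// Assumed page envelope: the items plus an optional link to the next page.
interface Page<T> {
  values: T[];
  next?: string;
}

async function fetchAllPages<T>(
  firstUrl: string,
  fetchPage: (url: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let url: string | undefined = firstUrl;
  // Keep following the "next" link until the API stops returning one.
  while (url !== undefined) {
    const page: Page<T> = await fetchPage(url);
    all.push(...page.values);
    url = page.next;
  }
  return all;
}
```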

23jams commented 6 years ago

One major limitation of Bitbucket that we discovered is that the API will only pull repositories that the user owns. Contributions a user makes to a repo they do not own will therefore not be considered. This is an API issue. A potential course of action would be to notify users that, if they want that work to be considered, they need to import a clone of the project into their own Bitbucket account with themselves as owner.