Extraction for matching algo - bitbucket

BenjaminTanguay commented 6 years ago

Description

As an employee of IBM consulting in charge of recruitment, I want to be able to retrieve from Bitbucket the information the matching algorithm needs to determine whether the applicant has worked with React, so that I can match them with React job postings.

Risk

Medium

Value

High

Story points

3

General Description

This story is about fetching what the matching algorithm requires to conclude whether a programmer has worked with React, and how much they coded in the framework.

The story should allow us to fetch all of the projects and extract the data we need.

It should handle possible errors.

It should output a clean list of data matching the interfaces defined by the matching algorithm team.

How

[x] Task 1: Research task to see how to make the queries to fetch necessary data
    Ideal Hours: 1
[x] Task 2: Code the queries and handle errors
    Ideal Hours: 3
[x] Task 3: Clean up the data outputted to fit the interfaces of the matching algorithm
    Ideal Hours: 3

Acceptance Criteria

[x] Unit tests
[x] Logging

Acceptance Tests

Test Description	Step #	Expected Output	Actual Output	Pass/Fail
Log into bitbucket via frontend	1	Access key for oauth is passed to the backend	Access key is passed	Pass
Query for repositories from user	2	Repositories are retrieved	Repositories are retrieved	Pass
Query for commits associated with the user from a repository	3	Commits are retrieved	Commits are retrieved	Pass
Query for diffstats of a given commit	4	Diffstats of a given commit are retrieved	Diffstats are retrieved	Pass
Clean data and store it appropriately	5	Data has been stored in appropriate objects and is ready to be handled by the matching algo	Data outputted in raw form to console	Pass

BenjaminTanguay commented 6 years ago

Input we want to have for the matching algorithm:

class Input {
    -projectInput: ProjectInput[]
}

class ProjectInput {
    -projectName: String
    -applicantCommits: Commit[]
    -projectStructure: ProjectStructure[]
    //gitignore, package.json, etc.
    -downloadedSourceFilePaths: String[]
}

class Commit {
    -id: String
    -numberOfFilesAffected: int // Derived from array below
    -files: SingleFileCommit[]
}

class SingleFileCommit {
    -filePath: String
    -linesAdded: int
    -linesDeleted: int
}

class ProjectStructure {
    -fileId: String         // Allows us to download
    -fileName: String       // Allows us to parse extensions
    -filePath: String       // Allows us to cross-check commits
}

Use this as your target.
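
As an illustration only, the target above could be written as TypeScript interfaces. The field names come from the classes listed above; the `makeCommit` helper is hypothetical and is there just to show `numberOfFilesAffected` being derived from the `files` array rather than set by hand.

```typescript
// Sketch of the target model as TypeScript interfaces.
interface SingleFileCommit {
  filePath: string;
  linesAdded: number;
  linesDeleted: number;
}

interface Commit {
  id: string;
  numberOfFilesAffected: number; // derived from files.length
  files: SingleFileCommit[];
}

interface ProjectStructure {
  fileId: string;   // allows us to download
  fileName: string; // allows us to parse extensions
  filePath: string; // allows us to cross-check commits
}

interface ProjectInput {
  projectName: string;
  applicantCommits: Commit[];
  projectStructure: ProjectStructure[];
  // gitignore, package.json, etc.
  downloadedSourceFilePaths: string[];
}

interface Input {
  projectInput: ProjectInput[];
}

// Hypothetical helper (not part of the original spec): builds a Commit so
// that the derived count always agrees with the files array.
function makeCommit(id: string, files: SingleFileCommit[]): Commit {
  return { id, numberOfFilesAffected: files.length, files };
}
```

Building commits through a single helper like this is one way to keep the derived field from drifting out of sync with the array it summarizes.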

23jams commented 6 years ago

Step 1: Get list of repositories for a specific user
Step 2: Store the repository info in a data structure
Step 3: Create a method to loop through the repos and grab all the commits from a specific user
Step 4: Store commits in a data structure
Step 5: Loop through the commits and get the diff stats for each commit
Step 6: Store the diff stats in a data structure

Diff stats include the lines added and removed, as well as the file path.
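
The six steps above could be sketched roughly as follows. This is not the code from the story: the `BitbucketSource` interface and its three methods are hypothetical stand-ins for whatever wraps the real Bitbucket calls, which keeps the loop structure visible and testable without network access.

```typescript
interface DiffStat {
  filePath: string;
  linesAdded: number;
  linesDeleted: number;
}

interface CommitWithStats {
  id: string;
  files: DiffStat[];
}

interface RepoActivity {
  projectName: string;
  applicantCommits: CommitWithStats[];
}

// Hypothetical data source; a real implementation would wrap the REST calls.
interface BitbucketSource {
  listRepositories(user: string): Promise<string[]>;
  listCommitIds(repo: string, user: string): Promise<string[]>;
  getDiffStats(repo: string, commitId: string): Promise<DiffStat[]>;
}

async function collectUserActivity(
  source: BitbucketSource,
  user: string
): Promise<RepoActivity[]> {
  const result: RepoActivity[] = [];
  // Steps 1-2: get the user's repositories and keep them in a list.
  const repos = await source.listRepositories(user);
  for (const repo of repos) {
    // Steps 3-4: get this user's commits for the repository.
    const commitIds = await source.listCommitIds(repo, user);
    const applicantCommits: CommitWithStats[] = [];
    for (const id of commitIds) {
      // Steps 5-6: get and store the diff stats of each commit.
      const files = await source.getDiffStats(repo, id);
      applicantCommits.push({ id, files });
    }
    result.push({ projectName: repo, applicantCommits });
  }
  return result;
}
```

The requests run one after another here for readability; whether to parallelize them would depend on rate limits that this sketch does not model.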

23jams commented 6 years ago

Calling for repositories returns an array of characters, rather than a paginated list. We wrote a pretty dirty way of pulling the repository name from it and it works. The repository name can be used to pull further information.
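
If the "array of characters" is in fact the raw JSON body arriving as a string, a cleaner option might be to parse it and read the names from the result. The sketch below assumes a response shaped like `{ "values": [ { "name": ... } ] }`, which is how Bitbucket Cloud 2.0 paginated responses are laid out. Which endpoint the story used is not stated here, so treat the shape and the `extractRepositoryNames` name as assumptions.

```typescript
// Pulls repository names out of a raw JSON response body.
// Assumes the body looks like { "values": [ { "name": "..." }, ... ] }.
function extractRepositoryNames(rawBody: string): string[] {
  // JSON.parse throws a SyntaxError if the body is not valid JSON.
  const parsed: unknown = JSON.parse(rawBody);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Unexpected response: not a JSON object");
  }
  const values = (parsed as { values?: unknown }).values;
  if (!Array.isArray(values)) {
    throw new Error("Unexpected response: missing 'values' array");
  }
  const names: string[] = [];
  for (const entry of values) {
    // Skip entries that do not carry a string name instead of failing.
    if (typeof entry === "object" && entry !== null) {
      const name = (entry as { name?: unknown }).name;
      if (typeof name === "string") {
        names.push(name);
      }
    }
  }
  return names;
}
```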

Winterhart commented 6 years ago

Hey, it would be nice to reuse common components from issues #66, #64 and #65.

We can use the pipe-and-filter pattern in a refactor task: https://github.com/mspnp/architecture-center/blob/master/docs/patterns/pipes-and-filters.md
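
To make the suggestion concrete, here is a minimal sketch of pipes and filters applied to diff-stat data. The `pipe` helper, the two filters, and the choice of `.jsx`/`.tsx` as the extensions of interest are all hypothetical. The only point is that small single-purpose steps can be composed and shared between extractors.

```typescript
interface FileChange {
  filePath: string;
  linesAdded: number;
  linesDeleted: number;
}

// A filter is a single-purpose transformation from one shape to another.
type Filter<In, Out> = (input: In) => Out;

// Connects two filters so the output of the first feeds the second.
function pipe<A, B, C>(first: Filter<A, B>, second: Filter<B, C>): Filter<A, C> {
  return (input: A) => second(first(input));
}

// Filter 1: keep only files whose path ends with one of the extensions.
function keepExtensions(extensions: string[]): Filter<FileChange[], FileChange[]> {
  return (files) =>
    files.filter((f) => extensions.some((ext) => f.filePath.endsWith(ext)));
}

// Filter 2: total the lines added across the remaining files.
const sumLinesAdded: Filter<FileChange[], number> = (files) =>
  files.reduce((total, f) => total + f.linesAdded, 0);

// Composition: lines added in .jsx/.tsx files only.
const countComponentLinesAdded = pipe(
  keepExtensions([".jsx", ".tsx"]),
  sumLinesAdded
);
```

Because each filter only depends on its input type, the same `keepExtensions` step could in principle be reused by the extractors in the other issues.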

23jams commented 6 years ago

Pagination for Bitbucket is explained here: https://developer.atlassian.com/server/confluence/pagination-in-the-rest-api/. Keep this in mind for when we update the code to gather all results instead of only the first page.
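
One possible way to gather every page is sketched below. It assumes each page looks like `{ values: [...], next?: string }`, with `next` holding the URL of the following page and being absent on the last one, as in Bitbucket Cloud 2.0 responses. The linked page describes a `start`/`limit` scheme instead, so the loop would need adapting if that is the API in use. `fetchPage` is a hypothetical injected function that performs the actual HTTP request.

```typescript
// Assumed page shape: a batch of items plus an optional link to the next page.
interface Page<T> {
  values: T[];
  next?: string;
}

type PageFetcher<T> = (url: string) => Promise<Page<T>>;

// Follows `next` links until there are none left, collecting all items.
// maxPages is a safety cap so a misbehaving API cannot loop forever.
async function fetchAllPages<T>(
  firstUrl: string,
  fetchPage: PageFetcher<T>,
  maxPages: number = 100
): Promise<T[]> {
  const all: T[] = [];
  let url: string | undefined = firstUrl;
  let pagesRead = 0;
  while (url !== undefined && pagesRead < maxPages) {
    const page: Page<T> = await fetchPage(url);
    all.push(...page.values);
    url = page.next;
    pagesRead += 1;
  }
  return all;
}
```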

23jams commented 6 years ago

One major limitation of Bitbucket that we discovered is that the API will only pull repositories that the user is an owner of. Contributions a user makes to a repo they do not own will therefore not be considered. This is an API issue. A potential course of action would be to notify users that, if they want that work to be considered, they will need to import a clone of the project into their Bitbucket account with themselves as owner.

ddicorpo / RecruitInc