DeveloperLiberationFront / ExploringGHTorrent


Notes on Issue #48 - How Resilient Are Males vs Females #7

Open clarrain opened 8 years ago

clarrain commented 8 years ago

Current plan:
1. Look in table female_first_pulls in Sandbox and match its pull_request_id with the pull_request_id in table pull_request_history.
2. Save the pull_request_id and created_at of instances where the action "closed" exists.
3. Match the pull_request_id from pull_request_history with the pull_request_id from pull_requests.
4. Save the base_repo_id or head_repo_id for each of these pull_request_ids from pull_requests.
5. Look in pull_request_history for instances where actor_id equals the user_id from female_first_pulls, after the created_at date saved for that pull_request_id.
6. For the above instances, save the pull_request_id.
7. Go to pull_requests and check whether this pull_request_id has the same head/base repo id. If yes, count the instances of this match.
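A rough sketch of the plan as a single query, run through Python's built-in sqlite3 module as a stand-in for MySQL. The tables are minimal hypothetical versions: female_first_pulls is assumed to hold (user_id, pull_request_id), and pull_request_history.pull_request_id is assumed to refer to pull_requests.id, as in the GHTorrent schema. The sketch uses the earliest "closed" event, compares base_repo_id (swap in head_repo_id for the head repo), and counts distinct later pull requests rather than raw history events. It uses a derived table rather than a WITH clause because MySQL only gained WITH in 8.0.

```python
import sqlite3

# Hypothetical minimal tables; the real Sandbox/GHTorrent columns may differ.
SCHEMA = """
CREATE TABLE female_first_pulls (user_id INTEGER, pull_request_id INTEGER);
CREATE TABLE pull_request_history (
    pull_request_id INTEGER, created_at TEXT, action TEXT, actor_id INTEGER);
CREATE TABLE pull_requests (id INTEGER, base_repo_id INTEGER, head_repo_id INTEGER);
"""

LATER_PULLS_SAME_REPO = """
SELECT fc.user_id,
       COUNT(DISTINCT later_pr.id) AS later_pulls_same_repo
FROM (
    -- Steps 1-2: earliest close time of each user's first pull request.
    SELECT f.user_id, f.pull_request_id, MIN(h.created_at) AS closed_at
    FROM female_first_pulls AS f
    JOIN pull_request_history AS h
      ON h.pull_request_id = f.pull_request_id AND h.action = 'closed'
    GROUP BY f.user_id, f.pull_request_id
) AS fc
-- Steps 3-4: the repo that the first pull request targeted.
JOIN pull_requests AS first_pr ON first_pr.id = fc.pull_request_id
-- Steps 5-6: later history events by the same user on other pull requests.
LEFT JOIN pull_request_history AS later
  ON later.actor_id = fc.user_id
 AND later.created_at > fc.closed_at
 AND later.pull_request_id <> fc.pull_request_id
-- Step 7: keep only pull requests that target the same base repo.
LEFT JOIN pull_requests AS later_pr
  ON later_pr.id = later.pull_request_id
 AND later_pr.base_repo_id = first_pr.base_repo_id
GROUP BY fc.user_id
ORDER BY fc.user_id
"""


def later_pulls_same_repo(conn: sqlite3.Connection) -> list:
    """Return (user_id, count) rows: distinct later pull requests in the same base repo."""
    return conn.execute(LATER_PULLS_SAME_REPO).fetchall()
```

Running conn.executescript(SCHEMA) on an in-memory connection and inserting a handful of toy rows is enough to check the logic before pointing the query at the real tables. Users with no later pull requests still appear, with a count of 0, because of the LEFT JOINs.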

clarrain commented 8 years ago

Make a new table instead of relying on female_first_pulls; start with g_pull_closes in Sandbox.

CaptainEmerson commented 8 years ago

You might consider using views or temporary tables. Views can be treated like tables, but they're actually just queries. Temporary tables are like normal tables, but disappear after you disconnect your MySQL session.

CaptainEmerson commented 8 years ago

You'd want to use a temporary table instead of a view when the view would take a long time (say, more than a few minutes) to execute.
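A small sketch of the difference, using Python's built-in sqlite3 module as a stand-in for a MySQL session. SQLite also accepts CREATE TEMPORARY TABLE and scopes it to one connection, much as MySQL scopes it to a session. The pulls table and the view/table names are hypothetical. The view re-runs its query on every read, which is why a slow view gets expensive. The temporary table pays that cost once, but it is a snapshot and it vanishes on disconnect.

```python
import os
import sqlite3
import tempfile


def view_vs_temp_table():
    """Return (view rows, temp-table rows, view rows after reconnecting, temp table survived?)."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "sketch.db")

        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE pulls (id INTEGER, state TEXT)")
        conn.executemany("INSERT INTO pulls VALUES (?, ?)", [(1, "closed"), (2, "open")])
        # A view stores only the query text; it is re-run every time it is read.
        conn.execute("CREATE VIEW closed_pulls_v AS "
                     "SELECT id FROM pulls WHERE state = 'closed'")
        # A temporary table stores the result once, visible only to this connection.
        conn.execute("CREATE TEMPORARY TABLE closed_pulls_t AS "
                     "SELECT id FROM pulls WHERE state = 'closed'")
        # New data shows up in the view but not in the already-computed temp table.
        conn.execute("INSERT INTO pulls VALUES (3, 'closed')")
        conn.commit()
        view_rows = conn.execute("SELECT COUNT(*) FROM closed_pulls_v").fetchone()[0]
        temp_rows = conn.execute("SELECT COUNT(*) FROM closed_pulls_t").fetchone()[0]
        conn.close()

        # After reconnecting, the view is still defined but the temp table is gone.
        conn = sqlite3.connect(db_path)
        view_rows_after = conn.execute("SELECT COUNT(*) FROM closed_pulls_v").fetchone()[0]
        try:
            conn.execute("SELECT COUNT(*) FROM closed_pulls_t")
            temp_survived = True
        except sqlite3.OperationalError:
            temp_survived = False
        conn.close()
    return view_rows, temp_rows, view_rows_after, temp_survived
```

Calling view_vs_temp_table() gives (2, 1, 2, False). The view sees the row inserted after it was created, while the temporary table does not and is gone after the reconnect.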

clarrain commented 8 years ago

Okay, thanks! I was planning on using views for sure, and I'll look into cases where a temp table would be better suited.

clarrain commented 8 years ago

pull_request_history is not accurate - rewriting the plan using g_pull_closes

clarrain commented 8 years ago

I'm changing how I want to go about finding this out. Instead of looking at the statistics of a pull request creator for a specific project, I'm going to look at the general statistics of a creator: after their first "closed" pull request, do they make any more requests? If so, how many, and is their final pull request closed or open? I'll compare these for each gender. After working with Denae on the revised plan, I realized that it would be nearly impossible to limit the stats to one specific project with joins using only the data that g_pull_closes has.
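A rough sketch of the revised per-creator statistics in plain Python, assuming the rows have already been fetched into memory. The PullRequest record shape and the gender mapping are hypothetical, not the actual columns of g_pull_closes. Here "after their first closed pull request" is read as pull requests opened after the creator's earliest close time, and "final pull request" as the one opened last.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PullRequest:
    # Hypothetical record shape; the real source (e.g. g_pull_closes) may differ.
    user_id: int
    opened_at: str            # ISO-8601 timestamp, so string comparison sorts by time
    closed_at: Optional[str]  # None while the pull request is still open


def summarize_by_gender(pulls: List[PullRequest], gender: Dict[int, str]) -> Dict[str, dict]:
    """Aggregate, per gender, how creators behave after their first closed pull request."""
    by_user = defaultdict(list)
    for pr in pulls:
        by_user[pr.user_id].append(pr)

    summary = defaultdict(lambda: {"creators": 0, "returned": 0, "later_pulls": 0,
                                   "final_closed": 0, "final_open": 0})
    for user, prs in by_user.items():
        closes = [pr.closed_at for pr in prs if pr.closed_at is not None]
        if not closes:
            continue  # never had a pull request closed, so outside this analysis
        first_close = min(closes)
        later = [pr for pr in prs if pr.opened_at > first_close]
        final = max(prs, key=lambda pr: pr.opened_at)

        stats = summary[gender.get(user, "unknown")]
        stats["creators"] += 1
        stats["later_pulls"] += len(later)
        if later:
            stats["returned"] += 1
        stats["final_closed" if final.closed_at is not None else "final_open"] += 1
    return dict(summary)
```

Leaving out creators who never had a pull request closed is a choice made for this sketch; counting them in their own bucket would be a small change.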

CaptainEmerson commented 8 years ago

Please make your technical notes on issue #48 itself. This repository is public, the other is private. We certainly can't post any data here, and probably don't want to post metadata (e.g., our tables) either.

clarrain commented 8 years ago

Got it!