CSV file generate - Githubissues

azhe825 / CSC510

Course Project for CSC 510, 2016 spring


CSV file generate #65

Open azhe825 opened 8 years ago

azhe825 commented 8 years ago

First Step, extract the following data:

Each group:

activity for user i (rule out timm and TA) (timm is user19)
activity for label m (number of issues having label m, total number of actions of issues having label m)
activity for milestone k (same as label)
activity for issue n
activity for day j

The above information should be extracted for each month: Jan 2nd to Feb 1st, Feb 2nd to Mar 1st, Mar 2nd to April 1st, April 2nd to May 1st.
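A minimal sketch of the per-user and per-label counts described above. The event format (a list of dicts with `user`, `issue`, and `labels` keys) is an assumption for illustration, not the project's actual data layout, and the TA's user ID is not given in this thread, so only `user19` is excluded here.

```python
from collections import Counter

# timm is user19 (from the thread); the TA's ID is not stated, so add it here.
EXCLUDED_USERS = {"user19"}


def activity_per_user(events, excluded=EXCLUDED_USERS):
    """Count events per user, skipping the excluded users."""
    return Counter(e["user"] for e in events if e["user"] not in excluded)


def activity_per_label(events):
    """For each label: (number of distinct issues, total number of actions)."""
    issues = {}
    actions = Counter()
    for e in events:
        for label in e.get("labels", ()):
            issues.setdefault(label, set()).add(e["issue"])
            actions[label] += 1
    return {label: (len(issues[label]), actions[label]) for label in actions}
```

Milestones could be counted the same way as labels by swapping the key that is read from each event.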

Second Step, compute statistics on the extracted data above and make some pretty graphs.

Third Step, detect bad smells from the above graphs and transform them into features.

Fourth Step, train a model on the features and try to predict delayed projects.

dichen001 commented 8 years ago

Each row: <group ID, user ID, Date, Issue (1 or 0), I Action, I Assignee, Issue (???), ..., Milestone (1 or 0), M Action, M Related Issues, Milestone (???), ..., Commit (1 or 0), C Line Num, Commit (???), Comment (1 or 0), Comment Line Num, Comment (???)>
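One way to write rows in this layout is Python's `csv.DictWriter`. This is only a sketch: the column list contains just the fields named above (the "(???)" placeholders are left out because they are undefined), and the sample row values are hypothetical.

```python
import csv
import io

# Only the columns named in the proposed row layout; "(???)" fields are omitted.
COLUMNS = [
    "group ID", "user ID", "Date",
    "Issue", "I Action", "I Assignee",
    "Milestone", "M Action", "M Related Issues",
    "Commit", "C Line Num",
    "Comment", "Comment Line Num",
]


def write_rows(rows, out):
    """Write a header plus one CSV line per row; missing fields become 0."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval=0)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example row: one issue event by one user.
buf = io.StringIO()
write_rows(
    [{"group ID": 1, "user ID": "user3", "Date": 1457049600,
      "Issue": 1, "I Action": "closed"}],
    buf,
)
```

Filling absent fields with 0 keeps every row the same width, which makes the file easy to load later for statistics.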

dichen001 commented 8 years ago

For the date ranges mentioned above, what are the corresponding values in seconds (the time format used in the CSV files)? Jan 2nd to Feb 1st, Feb 2nd to Mar 1st, Mar 2nd to April 1st, April 2nd to May 1st.

jerry-shijieli commented 8 years ago

Usually that time is counted in seconds from the start of 1970 (the Unix epoch). You can google how databases represent time.

jerry-shijieli commented 8 years ago

To convert time in seconds to a datetime in Python, check: http://stackoverflow.com/questions/3694487/python-initialize-a-datetime-object-with-seconds-since-epoch
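For reference, a short Python 3 sketch of the conversion in both directions. It assumes the timestamps in the CSV files are plain seconds since the epoch and interprets them in UTC; the helper names are made up for this example.

```python
from datetime import datetime, timezone


def to_utc(seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def to_seconds(moment):
    """Convert a timezone-aware datetime back to whole epoch seconds."""
    return int(moment.timestamp())


print(to_utc(1451606400).isoformat())  # 2016-01-01T00:00:00+00:00
print(to_seconds(datetime(2016, 2, 1, tzinfo=timezone.utc)))  # 1454284800
```

Passing `tz=timezone.utc` avoids results that depend on the local timezone of the machine running the script.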

dichen001 commented 8 years ago

Thanks!

dichen001 commented 8 years ago

Use this dict to divide the features into monthly time ranges: {'jan': 1451606400, 'feb': 1454284800, 'mar': 1456790400, 'apr': 1459468800, 'may': 1462060800}

Each number above is 00:00 UTC on the first day of that month in 2016.
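A possible way to use these boundaries, sketched with `bisect`. It assumes the last key is meant to be 'may' and treats that value as the closing boundary rather than a bucket of its own; the function name is hypothetical. Note that these boundaries fall on the 1st of each month, while the ranges in the first comment start on the 2nd.

```python
from bisect import bisect_right

# Month start times in epoch seconds (00:00 UTC on the 1st of each month, 2016).
MONTH_STARTS = [
    ("jan", 1451606400),
    ("feb", 1454284800),
    ("mar", 1456790400),
    ("apr", 1459468800),
    ("may", 1462060800),  # assumed to be the upper bound only
]
NAMES = [name for name, _ in MONTH_STARTS]
BOUNDS = [start for _, start in MONTH_STARTS]


def month_of(timestamp):
    """Return 'jan'..'apr' for a timestamp, or None if outside Jan 1 to May 1."""
    i = bisect_right(BOUNDS, timestamp) - 1
    if i < 0 or i >= len(BOUNDS) - 1:
        return None
    return NAMES[i]
```

Each month covers its start time up to, but not including, the next month's start time.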

azhe825 commented 8 years ago

March 3rd, 1457049600, for early detection

April 7th, 1460073600, for overall bad smell detection
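A sketch of how the two cutoffs might be applied. Both values decode to midnight UTC at the end of the named day (2016-03-04 00:00 UTC and 2016-04-08 00:00 UTC), so a strict "less than" comparison keeps everything through March 3rd or April 7th. The `time` key and the function name are assumptions for illustration.

```python
EARLY_CUTOFF = 1457049600    # early detection (March 3rd)
OVERALL_CUTOFF = 1460073600  # overall bad smell detection (April 7th)


def events_before(events, cutoff):
    """Keep only the events whose epoch-seconds 'time' is before the cutoff."""
    return [e for e in events if e["time"] < cutoff]
```

For example, `events_before(events, EARLY_CUTOFF)` would give the data available to an early-detection model, and `events_before(events, OVERALL_CUTOFF)` the data for the overall analysis.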