Open azhe825 opened 8 years ago
Each row: <group ID, user ID, Date, Issue (1 or 0), I Action, I Assignee, Issue (???), ..., Milestone (1 or 0), M Action, M Related Issues, Milestone (???), ..., Commit (1 or 0), C Line Num, Commit (???), Comment (1 or 0), Comment Line Num, Comment (???)>
For the time ranges mentioned above, what are the corresponding numbers in seconds (the time representation used in the csv files)? Jan 2nd to Feb 1st, Feb 2nd to Mar 1st, Mar 2nd to April 1st, April 2nd to May 1st.
That time is usually counted in seconds from the start of 1970 (the Unix epoch). You can google "Unix timestamp" or database time expression.
To convert time in seconds to date time in python, check: http://stackoverflow.com/questions/3694487/python-initialize-a-datetime-object-with-seconds-since-epoch
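A minimal sketch of that conversion, assuming the csv timestamps are seconds since the Unix epoch and should be read as UTC. The helper names `seconds_to_datetime` and `datetime_to_seconds` are made up for illustration:

```python
from datetime import datetime, timezone

def seconds_to_datetime(seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def datetime_to_seconds(dt):
    """Convert a timezone-aware datetime back to whole seconds since the epoch."""
    return int(dt.timestamp())

# Example: find the number in seconds for Feb 1st, 2016 (UTC), then convert it back.
feb_first = datetime(2016, 2, 1, tzinfo=timezone.utc)
seconds = datetime_to_seconds(feb_first)
print(seconds, seconds_to_datetime(seconds).isoformat())
```

Passing `tz=timezone.utc` keeps the result independent of the local time zone of the machine running the script.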
Thanks!
Use this dict to divide the features into monthly time ranges:
{'jan': 1451606400, 'feb': 1454284800, 'mar': 1456790400, 'apr': 1459468800, 'may': 1462060800}
Each number above is the timestamp of the first day of that month in 2016 (midnight UTC).
Cutoffs: March 3rd (1457049600, which is the end of that day in UTC) for early detection; April 7th (1460073600, likewise the end of that day in UTC) for overall bad smell.
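The month boundaries and cutoffs above could be applied roughly as follows. This is only a sketch under some assumptions: the final boundary (1462060800) is taken to mark May 1st, each month is treated as the half-open range from its own start up to the next month's start, and an event counts toward a cutoff only if its timestamp is strictly smaller. The names `MONTH_STARTS`, `month_of` and `before_cutoff` are hypothetical:

```python
from bisect import bisect_right

# Month start timestamps from the thread; 'may' only closes the April range.
MONTH_STARTS = [
    ("jan", 1451606400),
    ("feb", 1454284800),
    ("mar", 1456790400),
    ("apr", 1459468800),
    ("may", 1462060800),
]

EARLY_DETECTION_CUTOFF = 1457049600  # March 3rd cutoff from the thread
OVERALL_CUTOFF = 1460073600          # April 7th cutoff from the thread

def month_of(ts):
    """Return the month label for a timestamp, or None if it falls outside Jan to Apr."""
    starts = [start for _, start in MONTH_STARTS]
    i = bisect_right(starts, ts) - 1
    if i < 0 or i >= len(MONTH_STARTS) - 1:
        return None
    return MONTH_STARTS[i][0]

def before_cutoff(ts, cutoff):
    """True if the event happened strictly before the cutoff (an assumed convention)."""
    return ts < cutoff

# Example: label a few timestamps and check them against the early detection cutoff.
for ts in (1451606400, 1456000000, 1458000000):
    print(ts, month_of(ts), before_cutoff(ts, EARLY_DETECTION_CUTOFF))
```

If the ranges should instead run from the 2nd of one month to the 1st of the next, the boundaries would each need to be shifted by one day (86400 seconds).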
First Step, extract the following data:
Each group:
The above information should be extracted for each month: Jan 2nd to Feb 1st, Feb 2nd to Mar 1st, Mar 2nd to April 1st, April 2nd to May 1st.
Second Step, compute statistics on the extracted data and make some pretty graphs.
Third Step, detect bad smells from the above graphs and transform them into features.
Fourth Step, train a model on the features and try to predict delayed projects.