Please add your entries in the following format:

- Filename:
- Function name:
- Required parameters:
- Returned parameters:
- If there is no return statement, what does the function do:
- Filename: data_collection.py; Function name: authenticate_repository; Required parameters: user_token, repository_name; Returned parameters: repository (the PyGithub repository object)
- Filename: data_collection.py; Function name: retrieve_issue_data; Required parameters: repository (the PyGithub repository object), state, contributor_data (dict); Returned parameters: contributor_data (dict)
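As a point of reference, here is a minimal sketch of what authenticate_repository could look like, assuming PyGithub; the real implementation in data_collection.py may differ:

```python
from github import Github  # PyGithub


def authenticate_repository(user_token, repository_name):
    """Return the PyGithub repository object for the given token and repository name."""
    # repository_name is expected in "owner/name" form for PyGithub's get_repo.
    return Github(user_token).get_repo(repository_name)
```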
The following functions are for the data_collection.py file.

| Function Name | Input Parameters | Returns |
|---|---|---|
| collect_commits_hash(repo_path) | Path to repository, string | List of dictionaries of commit info |
| get_commit_average(lines, commits) | lines and commits are int type | lines divided by commits, handles division by zero |
| parse_for_type(name) | Name of the file as a string | Splits the text and returns the file format as a string |
| get_file_formats(files) | List of strings of file names | List of strings of unique file types/formats |
| add_raw_data_to_json(path_to_repo, json_file_name) | TEMPORARY FUNCTION AND WILL BE REMOVED | No return, writes data to a .json file |
| calculate_individual_metrics(json_file_name) | String of the json file name | Nested dictionary of individual metrics |
| print_individual_in_table(json_file_name) | String of the json file name | No return, simply prints the data |
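For reference, a rough sketch of what the table describes for a few of these helpers; this is only an illustration, and the actual implementations in data_collection.py may differ:

```python
# Illustrative sketches that mirror the table descriptions, not the real code.
def get_commit_average(lines, commits):
    """Return lines divided by commits, guarding against division by zero."""
    if commits == 0:
        return 0
    return lines / commits


def parse_for_type(name):
    """Split a file name and return its format (extension) as a string."""
    parts = name.split(".")
    return parts[-1] if len(parts) > 1 else ""


def get_file_formats(files):
    """Return the unique file types/formats for a list of file names."""
    return sorted({parse_for_type(file_name) for file_name in files})
```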
Even though there is an additional file that our group worked on, called merge_duplicate_usernames, it is currently under rework and will be completely changed or deleted, so we have no updates regarding that file.
@koscinskic @noorbuchi Are there any more features the interface teams need information about, or does any of the information regarding these features need to be updated?
@johnsc1 There are more changes that will be coming soon; I have been working on this all day to get them done and merged to master as soon as possible. I can't guarantee that you'll get a full update with the new stuff today, but I will try to get it done as soon as I can.
@johnsc1 There is a small concern that you might have to consider as the command-line interface team. One of the aspects of gathering accurate data requires the user to interact and specify any duplicate usernames so that their data gets merged together. I have already created a function for that in the individual-metrics-lines PR, which I'm currently working on. However, I'm not sure how your team will choose to deal with this issue, so I wanted to let you all know. Let me know if you have any questions or if there are any requests to change the previously mentioned function.
Is it possible to create a function that can do an auto-merge, with some merge method provided to users? If the user doesn't choose to merge, then we work with the data including duplicates. If the user chooses to merge, we can call the function with the merge strategy specified.
Our task is to create a pipeline with the interface. You guys can just provide the functions. @noorbuchi
@liux2 I think not merging duplicate data would really skew the scores assigned to people because of how many duplicates there often are. I definitely thought about making an auto-merge function; the only issue is that there is no way to detect duplicates reliably. I tried to make such a function, but it often made false assumptions and merged the wrong data. Also, it often merged so little that user input was still required. @gkapfham proposed the solution we have right now, and I thought it would be the best approach.
@liux2 @noorbuchi Could this be solved by prompting the user in the CLI to enter the duplicate usernames, then passing a list of those usernames to this new feature? We should also try to follow the solution that @gkapfham proposed since he is the customer.
@johnsc1 I'm not sure how the CLI works, but in my approach to testing it in the main function, I printed the data repeatedly, asked the user to input the usernames they want to merge, then displayed the data again and asked whether they're done or want to merge other ones. I realize this is not the best approach, but I'm not sure how else to deal with this issue.
Additionally, I have refactored the Building And Testing Team's first two features in #71. The refactored code appears in the same functions, calculate_individual_metrics(json_file_name) and print_individual_in_table(json_file_name), talked about by @noorbuchi above. These are in the file data_collection.py.
For feature teams, thanks for your hard work and please let us Interface teams know when your features are done! Without features we can't really do any real work on our interfaces.
@johnsc1 @lussierc @liux2 I will post a comprehensive update on the latest changes to the data_collection file. I haven't had the chance to make the changes just yet, but more updates will be coming soon and I'll put them on this issue tracker.
@MaddyKapfhammer I hope to communicate with you today about the specific features of the Team Evaluation so that we can create a table for the interface team here.
From #45, just to update you all.
Main Tasks left to complete for Web Interface:
There is now a function for retrieve_token that takes in a file path and returns the user's token, provided it is stored locally in token.txt. Otherwise, if no input parameter is given or the file does not exist, it returns the Travis token. The Travis token is purely for testing and will not be able to mine the repo.
There are still some minor issues with the test cases, which is why I introduced the method in the first place. It's functionally a check/pass-through, but necessary for stable testing.
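A minimal sketch of the retrieve_token behavior described above; the fallback environment variable name is an assumption, and the real function may read the Travis token differently:

```python
import os


def retrieve_token(file_path="token.txt"):
    """Return the user's token from a local file, or fall back to the Travis token."""
    if file_path and os.path.exists(file_path):
        with open(file_path, "r") as token_file:
            return token_file.read().strip()
    # Fallback used only for testing on Travis; it cannot mine the repo.
    # The variable name GITHUB_TOKEN is a placeholder assumption.
    return os.environ.get("GITHUB_TOKEN", "")
```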
Update from the latest PR in individual-metrics-lines, which we are trying to get merged. This table shows the function descriptions.

| Function Name | Input Parameters | Returns |
|---|---|---|
| collect_commits_hash(repo_path) | Path to repository, string | List of dictionaries of commit info |
| get_commit_average(lines, commits) | lines and commits are int type | lines divided by commits, handles division by zero |
| parse_for_type(name) | Name of the file as a string | Splits the text and returns the file format as a string |
| get_file_formats(files) | List of strings of file names | List of strings of unique file types/formats |
| collect_and_add_raw_data_to_json(path_to_repo, json_file_name="raw_data_storage", data_path="./data/", overwrite=True) | Shortcut method that collects raw data using PyDriller, then writes it to json | No return, writes data to a .json file |
| collect_and_add_individual_metrics_to_json(read_file="raw_data_storage", write_file="individual_metrics_storage", data_path="./data/", overwrite=True) | This function skips calculation steps, do not use unless that's intended | No return, writes data to a .json file |
| calculate_individual_metrics(json_file_name="raw_data_storage", data_path="./data/") | No parameters are necessary if using the default files to read data | Nested dictionary of individual metrics |
| print_individual_in_table(file_name="individual_metrics_storage", data_dict={}, headings=["EMAIL", "COMMITS", "ADDED", "REMOVED"]) | Prints either from a dictionary or a file, takes a list of headings as dictionary keys | No return, simply prints the data |
| merge_metric_and_issue_dicts(metrics_dict, issues_dict) | Merges a dictionary with PyDriller data and a dictionary with PyGithub data | Returns a merged dictionary |
| merge_duplicate_usernames(dictionary, kept_entry, removed_entry) | Entries are data set keys intended to be merged, they are strings | Returns a dictionary with merged entries |
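To illustrate the last row, a hedged sketch of how merging two data-set keys might look; the real merge_duplicate_usernames may combine more than just numeric metrics:

```python
# Illustration only: fold the metrics stored under removed_entry into kept_entry.
def merge_duplicate_usernames(dictionary, kept_entry, removed_entry):
    for metric, value in dictionary[removed_entry].items():
        if isinstance(value, (int, float)):
            dictionary[kept_entry][metric] = dictionary[kept_entry].get(metric, 0) + value
    del dictionary[removed_entry]
    return dictionary
```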
The following functions are for the data_processor.py file, specifically the TEAM EVALUATION portion. NOTE: This has not yet been merged to master.
| Function Name | Input Parameters | Returns |
|---|---|---|
| iterate_nested_dictionary(dictionary) | Nested dictionary given by the add_new_metrics function | A new dictionary with a metric ("COMMITS", "ADDED", etc.) as the key and a list of those values as the value |
| calculate_iqr_score(data_list, below_weight, above_weight, within_weight) | A list of data points, 3 int values for the calculations | A calculated IQR score for the specific list of data points (percentage) |
| calculate_team_score(dictionary, below_weight, above_weight, within_weight) | Nested dictionary given by the add_new_metrics function, 3 int values for the calculations | A calculated team score found by adding together all the scores for each metric category and dividing by the number of categories (an average score, as a percentage) |
The function calculate_team_score() uses the other two functions, so it is the only one that needs to be called to return a value (which is the team score). If you want to display the scores for each category, these can also be accessed through calculate_team_score().
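A hypothetical call from the interface side, assuming the dictionary comes from add_new_metrics, the parameter names match the table above, and the three weights come from the CLI arguments:

```python
# Fragment intended for cogitate.py; placeholder weights shown, use the values
# the user specifies on the command line.
team_score = data_processor.calculate_team_score(
    updated_dictionary, below_weight=1, above_weight=1, within_weight=1
)
print("Team Score:", team_score)
```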
@MaddyKapfhammer @bagashvilit When I spoke to Teona yesterday, we talked about a print method being available to put output in the terminal. I know you guys had a working function - is that something CLI team would call, or should we implement it in cogitate.py?
@cklima616 I'll explain what my functions do in a moment. As for the print function, I did not create it; I just use a pandas DataFrame. You guys can import that in your file and use it for the display.
@cklima616 I have not created a print method for the team evaluation functions.
@MaddyKapfhammer One other question: do you need the CLI to call the calculate_iqr_score method if the user provides below/above/within weights? Or are those just going to remain hard-coded/calculated some other way?
@cklima616 The CLI only needs to call the calculate_team_score method. The user does need to provide the below/above/within weights for this function, as the customer said that it would be better to specify that information than to have it hard-coded.
@cklima616 If you need to print a nested dictionary, I suggest using the function in the data_collection module that our team wrote. You will simply need to follow the parameters outlined in the function and you can get a table with all the information you need. Please let me know if you have questions on how to use it.
The following table has functions from the data_processor.py file.

| Function Name | Input Parameters | Returns |
|---|---|---|
| add_new_metrics(dictionary) | The dictionary you get from the data_collection functions | Returns an updated dictionary with new metrics such as TOTAL and MODIFIED RATIO |
| individual_contribution(dictionary) | The dictionary returned from the add_new_metrics function | Returns a nested dictionary where the keys are the username and metrics, and the values are the percentage of individual contribution |
When I finished working on my program, to demonstrate the results I used a pandas DataFrame. All you need to do is import pandas as pd and print(pd.DataFrame.from_dict(dictionary).T), where the parameter dictionary is the dictionary that needs to be printed out. This is one of the ways to represent the data and it does not have to be this way.
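A self-contained example of that display approach, using a made-up metrics dictionary (the usernames and values are only examples):

```python
import pandas as pd

# Hypothetical nested metrics dictionary; keys and values are illustrative.
metrics = {
    "user_one": {"COMMITS": 12, "ADDED": 340, "REMOVED": 80},
    "user_two": {"COMMITS": 7, "ADDED": 150, "REMOVED": 40},
}
# Transpose so that each username becomes a row of the table.
print(pd.DataFrame.from_dict(metrics).T)
```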
I refactored the Building And Testing Team's first two features in #71. The refactored code appears in the function calculate_individual_metrics(json_file_name, data_path).
@bagashvilit does the add_new_metrics function need to be called before both the team-based functions and the individual-based functions, or just before the individual-based functions? Also, are there default values that we can use for below_weight, above_weight, and within_weight, or is that a question for the professor?
@JMilamber Yes, because both the individual and overall functions need to use the updated dictionary. I would recommend calling the add_new_metrics function and, once you get an updated dictionary, using that for both the individual and overall functions. Please let me know if you have any further questions.
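In code, the recommended order would look something like this; the variable names are illustrative and the weights are placeholders for the user-provided values:

```python
# Fragment intended for cogitate.py: update the dictionary once, then reuse it.
updated = data_processor.add_new_metrics(individual_metrics_dict)
individual_scores = data_processor.individual_contribution(updated)
team_score = data_processor.calculate_team_score(updated, below, above, within)
```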
@MaddyKapfhammer Could you comment on the default values for the weights?
@JMilamber the default values that can be used for the weights are as follows:
you could also use:
@MaddyKapfhammer Okay thank you. Those will be added.
@bagashvilit Okay, thank you for the update. I will move the call so both individual and team get the updated dictionary.
I'm assuming you all have figured this out already, but I'm fairly certain initialize_contributor_data() is already done by one of the functions covered by the individual metrics team, so it should be safe to remove.
@bagashvilit @MaddyKapfhammer I am receiving this error when running our program:
```
To see the output in the web, simply add '-w yes' to your command line arguments.
Traceback (most recent call last):
  File "src/cogitate.py", line 164, in <module>
    main(args)
  File "src/cogitate.py", line 43, in main
    new = team(dict)
  File "src/cogitate.py", line 125, in team
    updated = data_processor.add_new_metrics(new_dict)
  File "/Users/sitstatic/cs203S2020/cogitate_tool/src/data_processor.py", line 133, in add_new_metrics
    for key in dictionary:
TypeError: 'int' object is not iterable
```
@bagashvilit @MaddyKapfhammer Upon reviewing this error, I realize it's because of how I called the add_new_metrics function. calculate_team_score returns an int, and I tried to pass that through add_new_metrics, which requires a dictionary. I am going to continue working this out, but if there's something I'm missing let me know!
@MaddyKapfhammer Since you worked on the team function - does it only return one single score? I was under the impression it would be some sort of score by branch/small group. If it's just one single baseline score, that's fine, I'll just have to change the code.
@cklima616 It does return one single team score. Only the individual evaluation returns a dictionary.
@bagashvilit Thank you! After fixing this, here is the current output of our program:
```
To see the output in the web, simply add '-w yes' to your command line arguments.
Empty DataFrame
Columns: []
Index: []
0
```
@cklima616 I'll take a look at your branch as soon as I get the chance
https://github.com/GatorCogitate/cogitate_tool/blob/a9d23c00194a1e53a81fa5a9304def685a312a60/src/cogitate.py#L155 Here, instead of this, it should be:

```python
updated = data_processor.add_new_metrics(dict)
new_dict = data_processor.individual_contribution(updated)
```
@cklima616 You should also double-check with data_collection team to make sure that you are getting data correctly
@noorbuchi I have implemented all functions as described, but am still getting an empty output.
```
To see the output in the web, simply add '-w yes' to your command line arguments.
+----------+-------+---------+-------+---------+
| Username | EMAIL | COMMITS | ADDED | REMOVED |
+----------+-------+---------+-------+---------+
+----------+-------+---------+-------+---------+
Team Score:
0
```
If you have a chance, could you ensure I have implemented methods for data collection correctly?
@cklima616 I will do that soon. Are these calls in the cogitate.py file?
@noorbuchi Yes! Primarily in the main method, lines 34-40.
```python
data_collection.collect_and_add_raw_data_to_json(
    args["link"], "raw_data_storage.json"
)
# allows the user to enter the merge while loop if they specified to
data_collection.collect_and_add_individual_metrics_to_json()
# calculate metrics to be used for team evaluation
individual_metrics_dict = data_collection.calculate_individual_metrics()
if args["metric"] == "team":
    team(individual_metrics_dict, args["below"], args["above"], args["within"])
elif args["metric"] == "individual":
    individual(individual_metrics_dict)
elif args["metric"] == "both":
    new_individual_metrics_dict = individual(individual_metrics_dict)
    team(
        new_individual_metrics_dict,
        args["below"],
        args["above"],
        args["within"],
    )
```
I will be specifically referring to the code mentioned above; it is from lines 34-52. Your call to collect_and_add_raw_data_to_json is correct; however, you do not have to specify the name of the file if you're using the default one. On the other hand, collect_and_add_individual_metrics_to_json should not be called. This is a shortcut method that writes the individual metrics to the json file without adding any calculated data or any issue data, so it skips some steps. Instead, use the calculate_individual_metrics function just like you have done on line 40. This function creates a dictionary by reading from the default json files unless otherwise specified. After getting the calculated metrics dictionary, you should get the PyGithub data and use the function merge_metric_and_issue_dicts to get a dictionary that contains all of the uncalculated information. Then you should prompt the user to merge duplicate usernames while printing the table. To do that, you can make a while-true loop that exits at a specific condition. Once all the metrics are ready, you can send the dictionary to add_new_metrics in the data_processor module, which adds the calculated metrics to the dictionary.

Once you have the dictionary, even with skipping some steps, you can print it out in two ways. Either send the dictionary directly as a parameter, like this: print_individual_in_table(data_dict=your_dictionary, headings=["EMAIL", "COMMITS", "ADDED", "REMOVED"]) (this is just a list of headings that you can use; you can add or remove headings if you prefer, but remember that the headings have to be the same as the keys from the dictionary). The other way of printing the dictionary is print_individual_in_table(headings=["EMAIL", "COMMITS", "ADDED", "REMOVED"]), which will take individual_metrics_storage as a parameter by default and read from that json file. Make sure that the dictionary is written to the json file before using the latter; you can do that through write_dict_to_json_file in the json_handler module. I hope this was helpful, please let me know if there are any additional questions.
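Putting the steps above together, here is a rough sketch of the pipeline as a fragment for cogitate.py's main function. The args keys for the token and repository name, and the exact prompt loop, are assumptions rather than the real cogitate.py code:

```python
# Collect raw data with PyDriller and write it to the default json file.
data_collection.collect_and_add_raw_data_to_json(args["link"])

# Calculate individual metrics from the default raw-data file.
metrics_dict = data_collection.calculate_individual_metrics()

# Get the PyGithub issue data and merge it with the PyDriller metrics.
# args["token"] and args["repo"] are placeholder argument names.
repository = data_collection.authenticate_repository(args["token"], args["repo"])
issues_dict = data_collection.retrieve_issue_data(repository, "all", {})
merged_dict = data_collection.merge_metric_and_issue_dicts(metrics_dict, issues_dict)

# Let the user merge duplicate usernames until they are done.
while True:
    data_collection.print_individual_in_table(data_dict=merged_dict)
    if input("Merge duplicate usernames? (y/n) ").lower() != "y":
        break
    kept = input("Username to keep: ")
    removed = input("Username to remove: ")
    merged_dict = data_collection.merge_duplicate_usernames(merged_dict, kept, removed)

# Add the calculated metrics, then print the final table.
final_dict = data_processor.add_new_metrics(merged_dict)
data_collection.print_individual_in_table(
    data_dict=final_dict, headings=["EMAIL", "COMMITS", "ADDED", "REMOVED"]
)
```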
@noorbuchi is the PyGitHub data just the retrieve_issue_data output? I'm working on this now.
@JMilamber yes, retrieve_issue_data would allow you to do that. The call would look like this: retrieve_issue_data(repository, state, contributor_data), where repository is a repository object you can get from authenticate_repository, state can be a string of "all", "open", or "closed", and contributor_data is a dictionary (an empty one would work).
Okay, sounds good. Thank you @noorbuchi, I'll let you know when it's been updated in the branch.
This is an issue where the teams working on features can provide a table of their features containing the function name, parameters, and return data so the interface teams are able to integrate the features as soon as possible.
Note: These tables need to be updated as code is refactored so the interfaces are correct.