Metric: Collaborator familiarity

andymeneely commented 10 years ago

Consider this scenario:

As of this code review we have Danielle, Chris, and Brian. I have worked with:

Chris 25 times
Brian 2 times
Danielle 0 times

Ideas:

Total Familiarity: sum it up (27)
Average Familiarity: average (9)
Familiarity: maximum numbers of prior code reviews (25)
Objectivity: minimum number of prior code reviews (0)

Need to get an idea of the spread: maximum entropy? std deviation? Max-Min?

Future work question: do we take into account the nature of those collaborations? LGTMs? Owners? Contributors? Participants?

Make this into a method of CodeReview. Aggregate it in rake run:stats.

smt9020 commented 10 years ago

Just trying to gather my thoughts on this issue:

This method would go in code review...would it end up being multiple methods, one for each stat? (total, average, max per person, etc)
This would have to be based on the date of the review. So when I was trying to look up the experience of each developer with one another, I would need to take into account the date of each review and make sure it was before the one that I am currently looking at
I am confused by the wording "I have worked with"...is that implying that this review has 4 people? it wouldnt make sense to do this method in code review and base it from one persons point of view. Lets say there are only 3 people on the review. Brian has worked with Chris 15 times, and Danielle has worked with Chris 10 times, making his number 25...but then wouldn't that be in reverse also? Then Brian would have 15 as well.........I am a little confused about that haha

andymeneely commented 10 years ago

Yes, all different methods.

Yes, dates are definitely a part of this. Use the "created" field in CodeReview for both sides of that query. For example, if Bob did a code review with Jane only in April, May, and June of 2011. Then the April review they would have a count of 0 (they had never worked together), then the May review would be 1, then for the June review would be 2. That count would then go into min, max, total, avg.

For the third one, consider this example:

Chris initiates a code review and asks Danielle and Brian to join. He has prior experience with both of them.
Brian <--15--> Chris <--10--> Danielle
Total familiarity for this code review: 25
Max fam for this code review: 15
Min fam for this code review: 10
Avg fam for this code review: 12.5

Now, one thing I am not taking into account are the cross-working of the reviewers. I'm just centering this on Chris. Who knows how many times Danielle and Brian have worked together directly? Should we take this into account too? How?

Note that I'm defining "worked with" as "co-reviewed", by the way. We could do contributors and participants as well, but let's get these metrics nailed down first.

andymeneely commented 10 years ago

By the way, we also need to think about how we aggregate this over a file. I'd be interested to see how Familiarity and Objectivity increase and decrease over time. Clearly, familiarity should increase if people keep reviewing the same work. But, I'm interested in seeing how often new people are integrated as well.

smt9020 commented 10 years ago

Most of my confusion I think is stemming from the fact that it would be centered around Chris. How would we choose which developer to center it on? I purpose this alternative:

Prior Experience:
Brian <--15--> Chris <--10--> Danielle <--2--> Brian
Brian: 17
Chris:  25
Danielle:  12
Total familiarity for this code review: 27
Max fam for this code review: 15
Min fam for this code review: 2
Avg fam for this code review: 9

Now lets add in a fourth person:

Brian <--15--> Chris 
Brian <--2--> Danielle
Brian <--5--> Shannon
Chris <--10--> Danielle 
Chris <--0--> Shannon
Danielle <--7--> Shannon
Brian: 22
Chris: 25
Danielle: 19
Shannon: 12
Total Familiarity: 39
Max: 15
Min: 0
Avg: 9.75

These examples both make sense to me. Does this seem like a good approach?

smt9020 commented 10 years ago

finding the familiarity numbers based on the reviewers relationship with the author, not reviewer to reviewer.
familiarity versus objectivity (objectivity is the persons lack of social familiarity, the lowers amount of times working together)
each code review has a familiarity and an objectivity rating...one number, we ignore all of the middle numbers
add owner to code review so that the relationship can found between the owner of that review and the reviewers
is averaging bad because it homogenizes the numbers? Or is that kind of how social interactions work, where people's behaviors change depending on who they are around.
Familiarity Gap: the difference between the highest and the lowest familiarity relationship

andymeneely commented 10 years ago

Ok, so I'm thinking that implementing this is going to result in some slow queries. The current total_familiarity method takes over 60 seconds to complete for a single code review. No way we can use that in person for analysis. So let's do this:

Add a field to the Participants table called reviews_with_owner. This field represents the number of prior reviews that a participant has had with the owner of the code review. Populate this field as a part of a Consolidator so that we can take advantage of our indexes.

That way we can just compute the metric by doing sum/max/avg/whatever of already-queried numbers.

We need a verify for this, don't forget!!

I'm thinking of doing something similar for #113 as well.

smt9020 commented 10 years ago

so I wrote this out to fill the reviews_with_owner column, please review. Here is what the reviews_with_owner metric represents:

At the time of this review, participant has reviewed reviews_with_owner number of owner's review. NOTE: This means that if they both were reviewers on someone else's review, that doesn't count.

Is that what we were envisioning?

I'm a little unsure about how to do a verify for this because in the development data there is no overlap for this (all the numbers are zero). thoughts?

andymeneely commented 10 years ago

Yes. If you're non-participatory then you basically don't count in my view.

Code LGTM. Thanks!

As for the verify, let's add to the development data with some co-participants with a current review we have. I'll take a look at production and get back to you with some code review ids.

andymeneely commented 10 years ago

Ok, I think I have an example. For the token code review 10854242 (already in our data set), palmer@chromium.org had agl@chromium.org as participant. In a prior code review 9415040, palmer had agl as a participant.

So download 9415040 and add a verify based on that. You'll have to update the other counts in other verifies as well.

smt9020 commented 10 years ago

To-Dos (for me)

fill out the total familiarity method in code review based on the new column (which would just be adding up the number for each participant of the review)
the average it using the same numbers
put 9415040 into the development test data
write the verify for total and average

andymeneely / chromium-history

Metric: Collaborator familiarity #107

To-Dos (for me)