MeasureOSS / Measure

At its core Measure is, for lack of a better term, a contributor relationship management system. Measure consists of easy to understand widgets that can be arbitrarily displayed to build dashboards. It allows you to understand how people as individuals and as organizations are interacting with open source projects on GitHub. It’s metrics that focus not only on code, but on contributors.
Apache License 2.0
159 stars, 17 forks

Missing "nice" error message NO_REPOS_CONFIGURED #102

Closed (danisyellis closed 6 years ago)

danisyellis commented 6 years ago

I attempted to run makedash.js without adding any GitHub repositories to the config.yaml. Because this isn't allowed, I should have gotten an error message explaining what I need to change and why. Instead, I got this output in my terminal:

Internal errors are by definition a bug in Measure, and should be reported.
The error was detected at line 170:39 of 'lib/loads.js'.
This should never happen, and I am halting in abject failure.

I would be willing to write the nice error, to fix this bug. However, I like the way that many of the error messages explain why the user's action isn't allowed, and I'm not sure why a repo is required to create the dashboard.

(My use case, and why I'd prefer to make a dashboard without configuring a repo for now: I'm planning to write a couple of widgets that track a user's contributions across all of GitHub, so that we can track all contributions by our employees to open source projects. Eventually we'll also care about our company's organization and repos, but that's not our first priority. Given that, I'd be happy to make a dashboard with the users that I've crawled, without configuring a repo, if a repo weren't a necessary part of the code. However, if it would be difficult to change the Measure code to accommodate that, I can configure a repo and just ignore it for now.)

stuartlangridge commented 6 years ago

Sadly, we can't track a user's contributions across all of GitHub. The GitHub API constrains us to fetching contributions by repository; we can't fetch contributions by user. So we fetch everything for the repositories you have listed, and then analyse the data we've got; we can therefore only show things that happened in your github_repositories list, not everything that a given user has done regardless of repo. That nice error was never written, which is in itself a bug (thank you for that), but now that I've written it I've taken the opportunity to explain this in the nice error itself.

(The fix for this would need to be done in ghcrawler upstream, as you've seen in https://github.com/Microsoft/ghcrawler/issues/146 -- if that happens, then we could alter Measure to be able to track contributions for a user and not within specific repositories, but we'd need to do work to make that happen; it wouldn't work automatically if ghcrawler started supporting it.)
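For readers following along, the kind of check discussed above might look roughly like the sketch below. This is not Measure's actual code: `NiceError`, `checkRepositories`, the message wording, and the assumption that the config has already been parsed into a plain object are all hypothetical. Only the `NO_REPOS_CONFIGURED` code and the `github_repositories` key come from this thread.

```javascript
// Hypothetical sketch; names and message text are illustrative, not Measure's real code.
class NiceError extends Error {
  constructor(code, message) {
    super(message);
    this.name = "NiceError";
    this.code = code; // e.g. "NO_REPOS_CONFIGURED"
  }
}

// Validate an already-parsed config.yaml object (assumed to be a plain object).
function checkRepositories(config) {
  const repos = config && config.github_repositories;
  if (!Array.isArray(repos) || repos.length === 0) {
    throw new NiceError(
      "NO_REPOS_CONFIGURED",
      "No repositories are listed under github_repositories in config.yaml. " +
        "Contributions are fetched per repository, so at least one is required."
    );
  }
  return repos;
}
```

Throwing a coded error at config-load time lets the caller print an explanation instead of falling through to the generic internal-error message.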

danisyellis commented 6 years ago

@stuartlangridge Thanks so much for the quick response and getting that bug fixed!

I've been working on the ghcrawler code and I am currently able to queue up a user and retrieve all of their events (on my local machine; it's a WIP, so I haven't pushed the code up to ghcrawler yet). The API supports this (check out https://api.github.com/users/stuartlangridge/events). Unfortunately, it doesn't support webhooks for users/events like it does for repos and orgs, but I can just run the crawler manually at a regular interval. So I am able to get contributions across GitHub for a given user, unrelated to a repo. (Also, even without my new code, you can queue up a user via ghcrawler's browser dashboard; it just won't crawl that user's events without my code change.)
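As a rough illustration of the endpoint linked above (not the ghcrawler change itself), the sketch below builds the per-user events URL and counts events per repository. The helper names are hypothetical, and the sketch assumes each event object carries a `repo.name` field and that a `fetch` implementation is available (global in Node 18+, or passed in).

```javascript
// Build the public events URL for a user, following the URL pattern linked above.
function userEventsUrl(username) {
  return `https://api.github.com/users/${encodeURIComponent(username)}/events`;
}

// Count events per repository from an array of GitHub event objects.
// Events without a repo name are skipped.
function countEventsByRepo(events) {
  const counts = {};
  for (const event of events) {
    const name = event.repo && event.repo.name;
    if (!name) continue;
    counts[name] = (counts[name] || 0) + 1;
  }
  return counts;
}

// Fetch one page of events. fetchImpl is injectable so the function can be
// exercised without network access; it defaults to the global fetch.
async function fetchUserEvents(username, fetchImpl = fetch) {
  const response = await fetchImpl(userEventsUrl(username), {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!response.ok) {
    throw new Error(`GitHub returned HTTP ${response.status} for ${username}`);
  }
  return response.json();
}
```

Since there are no webhooks for user events, something like `fetchUserEvents` would have to be called on a schedule, which matches the manual interval described above.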

Once I have that user info, if I run Measure's makedash.js, that new user does get populated into dashboard/contributor/newUser.html

So my plan is to now write some new widgets that grab and filter the user event information in the mongodb to give us the user contribution info we want. Is that what you meant when you said 'if that happens, then we could alter Measure to be able to track contributions for a user and not within specific repositories, but we'd need to do work to make that happen; it wouldn't work automatically if ghcrawler started supporting it.' or is there also other work that needs to happen? Thanks for your help!

stuartlangridge commented 6 years ago

To explain: Measure will make user pages for all the users it finds in the database, howsoever they got there. So if you have some other way of getting user data into MongoDB (for example, as you note, by poking ghcrawler to fetch from the GitHub user data API) then Measure will happily create contributor pages for all those users. However, this won't be terribly well integrated into the rest of Measure, because there's no way to add a user list to config.yaml. To provide one, we'd need to write code that reads the new user list and prompts for how to run ghcrawler with it, and we'd need to depend on a version of ghcrawler which understands it.

There's also the concern that Measure assumes that its data is consistent; that is, that user data got there because it was fetched as a consequence of fetching repository data. So, for example, the "Repositories contributed to" widget on a contributor page is populated by reading repository data filtered by user, not user data itself; this means that this widget will not contain data for repositories which the user did contribute to but which we don't have data for because they aren't listed in github_repositories in config.yaml.
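The consistency concern above can be shown with a small sketch. The record shapes here are hypothetical (Measure's real MongoDB documents may well differ): repository-scoped activity is modelled as `{ repo, user }` pairs and user-scoped events as objects with `repo.name`. A widget that reads only repository data cannot see repositories that were never crawled.

```javascript
// Hypothetical record shapes; Measure's real MongoDB documents may differ.

// What a repository-driven widget sees: repository data filtered by user.
// repoActivity: [{ repo: "owner/name", user: "login" }, ...]
function reposFromRepositoryData(repoActivity, login) {
  const repos = new Set();
  for (const item of repoActivity) {
    if (item.user === login) repos.add(item.repo);
  }
  return [...repos].sort();
}

// What user-scoped data would show: every repo named in the user's own events.
// userEvents: [{ repo: { name: "owner/name" } }, ...]
function reposFromUserEvents(userEvents) {
  const repos = new Set(userEvents.map((event) => event.repo.name));
  return [...repos].sort();
}

// Repos the user touched that the repository-driven widget would not list.
function missingFromWidget(repoActivity, userEvents, login) {
  const seen = new Set(reposFromRepositoryData(repoActivity, login));
  return reposFromUserEvents(userEvents).filter((repo) => !seen.has(repo));
}
```

Anything returned by `missingFromWidget` is activity in a repository that is not in github_repositories, which is exactly the gap described above.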

In summary, therefore, it's perfectly reasonable to get extra user data into MongoDB, and to write your own widgets to read that data and display it as part of a Measure dashboard; the whole system is designed exactly so that you can write your own custom widgets to display the data you've got in the ways that are most useful to you! But those widgets are not likely to get into the upstream distribution without a bunch of work in the dependencies and thinking through the consequences. Of course, they don't have to; widgets that you find useful are useful to you and that's sufficient, and there's no need to contribute them back upstream if they won't work for others. I hope that explains our thoughts in more detail?

danisyellis commented 6 years ago

That does explain your thoughts well, thank you!

Ideally, we would love to contribute our widgets to Measure so that other Open Source programs can use them to track their employee contributions (with the larger goal of getting more companies involved in giving back to the OS projects they use). And I see your hesitations based on the current limitations of the code.

I agree that it would take a lot of work for users who were not crawled through repos to work with any of the widgets that currently exist. But I'd already been thinking about where these users would live in the database and where their widgets would be in the Measure codebase. Based on what you said, I'm thinking that a possible solution would be for any user crawled as a User (not through a repo or org) to go into its own MongoDB collection; let's call it repolessEmployee for the moment. Then, when the Measure dashboard is made, instead of being in contributor, they could go into a new folder called repolessEmployee, which would hold HTML files containing the widgets we're interested in for these users. (And an index page that would contain totals for all employees added together, and links to the individuals.)
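A minimal sketch of the index-page idea, to make the proposal concrete. Everything here is an assumption for illustration: the `{ login, counts }` summary shape, the `buildEmployeeIndex` name, and the `repolessEmployee/<login>.html` path layout are hypothetical and not part of Measure.

```javascript
// Hypothetical sketch: one summary object per employee, e.g.
// { login: "alice", counts: { PushEvent: 2, IssuesEvent: 1 } }
function buildEmployeeIndex(employees) {
  const totals = {};
  const links = [];
  for (const employee of employees) {
    // Add this employee's per-type counts into the combined totals.
    for (const [type, count] of Object.entries(employee.counts)) {
      totals[type] = (totals[type] || 0) + count;
    }
    // Assumed path layout for the per-employee pages.
    links.push({
      login: employee.login,
      href: `repolessEmployee/${encodeURIComponent(employee.login)}.html`,
    });
  }
  return { totals, links };
}
```

The returned object would then be handed to whatever template renders the index page.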

Assuming that the changes are made to ghcrawler to allow this setup (including, potentially, queueing an array of GitHub IDs), is that a contribution that you would be happy to merge in? The main issue I see is that some people would be in both contributor and repolessEmployee, which could be confusing. It also creates a sort of divide, as we're dealing with two almost entirely different datasets, but with good documentation and maybe a small change to Measure's home page, I think we could explain it and make Measure a tool for OS programs that care about more types of contributions.

danisyellis commented 5 years ago

Hi @stuartlangridge I just wanted to check in because I haven't gotten a response to my last question and thought you might have missed it. Let me know if there's anything you'd like more clarity on.

stuartlangridge commented 5 years ago

OK, we've had a pretty long discussion about this. What you want is a good thing to want, but it's not what Measure does, and we think that making Measure do this will make Measure as a whole more confusing. It's very much oriented around tracking changes that happen to repos you care about, by whichever people do them; what you're looking for is tracking things done by people you care about, in whichever repos they do it. A system which did what you want would look quite a lot like Measure, and would share a lot of code with it, but we can't see a good way to make Measure do both types of thing without it being very confusing; everything would need to be phrased as "if you care about the community around your project, do it this way, but if you care about your people's work everywhere, do this other thing", which blunts Measure's overall strategy of "have awareness of what's happening to your projects".

So, we don't think we'd want to implement this.

However, as noted, a system which did implement this would look quite a lot like Measure, and if you were to want to build such a thing, we'd be happy to give some thoughts on how a fork of Measure might work to do it, if that's helpful. If you think we're missing something and there is a way to add this functionality without complicating what Measure is, we're happy to discuss; what you proposed around repolessEmployee is fine technically, but what concerns us is the complication and confusion it would add to describing what Measure is.