Make API output civic.json files for projects

jpvelez commented 10 years ago

The first stab at answering the "how do we document civic tech projects" question was to define a civic.json file. Repo developers could fill out all the fields manually, and drop the file into their projects. We could then amass a list of projects and their details just by searching Github.

This approach made too much work for developers. Instead, we want robots to do the work. Civic.json is still a useful, attractive, and simple metadata standard to describe projects, though.

So civic-json-worker's job is to automate the creation - as much as possible - of these files from Github. It should dump out civic.json, and hopefully this meta-data standard proves useful in other projects.

Civic.json encompasses all the fields the API is currently getting from Github, plus a proposed set of extended fields. Over in the civic.json repo, we're figuring out what small set of extended fields we should support, and how to make it as easy as possible to collect these fields. (Discussion going on in the issue tracker.)

Once that's clarified, the API would be modified to support the full spec. That's the idea, anyway.

migurski commented 10 years ago

Thanks JP, this is great information.

I’m a little uncomfortable with the bit about getting all of Github’s fields. Since we want civic.json to be a spec (do we?), I think we should be a bit more judicious in how we connect it to Github’s own product features. There are a lot of concepts in the Github API that feel exclusive to the Git/Github ecosystem, and potentially meaningless when civic.json is used with non-Github sources. For example, watchers_count, open_issues, language, html_url, forks_count, and contributors.login all describe current characteristics of Github that may not be applicable to other products like Google Code, Bitbucket, or even Github +2 years from now.

I’d like to suggest that we prune civic.json to only those things we feel comfortable supporting, decide whether we want to name them differently, and then specify the mapping back to Github’s fields explicitly.
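A minimal sketch of what an explicit mapping back to Github's fields could look like. The civic.json-side names on the left (`code_url`, `primary_language`) are hypothetical and not part of any agreed spec; the Github-side names on the right are fields mentioned in this thread. The sample repository values are invented for illustration.

```python
# Hypothetical mapping: civic.json field name -> Github repository field name.
# Only the right-hand names come from the Github API; the left-hand names
# are placeholders until the spec settles.
GITHUB_FIELD_MAP = {
    "name": "name",
    "description": "description",
    "code_url": "html_url",
    "primary_language": "language",
}


def civic_from_github(repo):
    """Build a civic.json-style dict from a Github repository dict.

    Fields that the mapping does not name explicitly (for example
    watchers_count) are dropped rather than passed through.
    """
    return {
        civic_key: repo[github_key]
        for civic_key, github_key in GITHUB_FIELD_MAP.items()
        if github_key in repo
    }


# Invented sample data, shaped like a Github repository object.
example_repo = {
    "name": "example-project",
    "description": "Sample description",
    "html_url": "https://example.org/example-project",
    "language": "Python",
    "watchers_count": 3,
}
civic = civic_from_github(example_repo)
```

Supporting another host such as Bitbucket would then mean adding a second mapping table rather than changing the civic.json field names.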

noneck commented 10 years ago

+1.

I'd like to see the spec conform to Schema.org Software Application [1] fields that are already implemented in NuApps and potentially in other hackathon/app store related sites.

If you want this to be the standard, y'all should be thinking about:

Hacker League
CodeMontage < https://www.codemontage.com >
NuApps < http://nucivic.com/products/nuapps/ >

[1] http://schema.org/SoftwareApplication

In general, the civic.json file should be as simple and uniform as possible.

jpvelez commented 10 years ago

Excellent point!

So here's an interesting puzzle. Right now, we're using a lot of these Github-specific fields on the open gov hack night project page (http://opengovhacknight.org/projects.html). They've proved super useful for this use case - they make grokking and interacting with projects really simple.

And I'm sure the fields will be pretty useful for a lot of other projects - how active/inactive do projects tend to be? what's the network structure of civic tech contributors on Github (who commits with whom)? how are projects connected? and so on.

So even if we don't bake all of the Github fields into the civic.json files that worker will serve, it would still be very useful to have worker capture these.

What's an elegant way of doing that? Here's one way: perhaps you have a core set of Github fields (project name, contributor) and then a hash of Github / Bitbucket / Google Code specific fields that are optional and free to change over time. This hash might be an official part of the civic.json definition, or just a nice freebie that civic-json-worker gives you. I'm sure there are other approaches.
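A rough sketch of the core-plus-hash idea, assuming a core set of just `name` and `contributors` and a hypothetical `source_details` key that holds the host-specific fields. None of these names are settled, and the sample values are invented.

```python
# Hypothetical core fields that every civic.json record would carry.
CORE_FIELDS = ("name", "contributors")


def split_project(source, raw):
    """Separate core fields from host-specific ones.

    `source` names the host ("github", "bitbucket", ...). Everything
    outside CORE_FIELDS lands in an optional, free-form hash keyed by
    that host, so it can change without touching the core schema.
    """
    core = {key: raw[key] for key in CORE_FIELDS if key in raw}
    extras = {key: value for key, value in raw.items() if key not in CORE_FIELDS}
    return {**core, "source_details": {source: extras}}


# Invented sample data using Github field names mentioned in this thread.
raw = {
    "name": "example-project",
    "contributors": ["alice", "bob"],
    "watchers_count": 3,
    "forks_count": 1,
}
record = split_project("github", raw)
```

Consumers that only care about the core fields can ignore `source_details` entirely, while the projects page can keep reading the Github numbers from it.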

jpvelez commented 10 years ago

+1 to @noneck. Noel, can you start an issue about making civic.json fields conform to schema.org in the civic.json issue tracker (https://github.com/BetaNYC/civic.json/issues)? Spec-specific discussion is happening there.

migurski commented 10 years ago

There are schema.org nods to this, like interactionCount. I’m not an expert on schema.org stuff, so here’s a dumb question: would we have to use it all, or can we cherry-pick just the fields and data we feel are applicable to civic projects?

migurski commented 10 years ago

Other relevant schema.org things: installUrl for repo URL and discussionUrl for issues. Reading schema.org already makes me feel like I accidentally got off at the wrong floor; perhaps we simply ignore it.
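For comparison, a sketch of cherry-picking a few schema.org SoftwareApplication properties, using `installUrl` for the repo URL and `discussionUrl` for issues as suggested above. The helper name is hypothetical, and deriving the issues URL by appending `/issues` to the repo URL is an assumption that only holds for Github-style hosts.

```python
def to_schema_org(name, description, code_url):
    """Describe a project with a small subset of schema.org properties.

    Assumption: the issue tracker lives at <code_url>/issues, which is a
    Github convention and not something schema.org specifies.
    """
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "description": description,
        "installUrl": code_url,
        "discussionUrl": code_url.rstrip("/") + "/issues",
    }


doc = to_schema_org(
    "South Bend Voices",
    "A redeployment of CityVoice in South Bend, Indiana.",
    "https://github.com/codeforamerica/cityvoice",
)
```

Only the properties that are set appear in the output, which is one way of using schema.org without adopting all of it.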

migurski commented 10 years ago

+1 to hash idea @jpvelez, by the way. To veer into implementation, Postgres hstore gives us an easy place to drop a hash of externally-defined data while sticking to a core schema of internal fields.

migurski commented 10 years ago

Stupid “close & comment” button.

ondrae commented 10 years ago

Currently the API schema is:

{
    num_results: 493,
    objects: [
        {
            categories: "community engagement, housing",
            code_url: "https://github.com/codeforamerica/cityvoice",
            description: "A redeployment of CityVoice in South Bend, Indiana.",
            github_details: "{...}",
            link_url: "http://www.southbendvoices.com/",
            name: "South Bend Voices",
            type: "service"
        },
        ...
    ],
    page: 1,
    total_pages: 1
}

The github_details attribute contains all the important parts that make the Chicago Projects page work. We could add another attribute called civic_json that contains a hash of the project data in whatever form the civic.json spec takes.
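A small sketch of that change, assuming the civic.json data is attached as a nested hash next to the existing attributes. The keys inside `civic_json` are placeholders, since the spec is still being discussed; the project values are the ones from the example above.

```python
def with_civic_json(project, civic):
    """Return a copy of an API project object with a civic_json attribute.

    Existing attributes, including github_details, are left untouched so
    current consumers of the API keep working.
    """
    updated = dict(project)
    updated["civic_json"] = civic
    return updated


project = {
    "categories": "community engagement, housing",
    "code_url": "https://github.com/codeforamerica/cityvoice",
    "description": "A redeployment of CityVoice in South Bend, Indiana.",
    "github_details": "{...}",
    "link_url": "http://www.southbendvoices.com/",
    "name": "South Bend Voices",
    "type": "service",
}

# Placeholder civic.json content; the field names here are hypothetical.
civic = {
    "name": project["name"],
    "description": project["description"],
    "homepage": project["link_url"],
}

result = with_civic_json(project, civic)
```

Because the function copies the project before adding the attribute, the original object is not modified.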

ondrae commented 10 years ago

We're going to keep the above attributes for now as we keep building out the other endpoints for the API. There is a great discussion happening at https://github.com/BetaNYC/civic.json/issues/6 about what the community wants for this /projects endpoint.

We're trying to push this API live by the end of March, so whatever schema is agreed upon by then will be the first version.

pmackay commented 10 years ago

Would it be worth picking this up again along with #118, perhaps as a v2 definition of the API?

codeforamerica / cfapi

Make API output civic.json files for projects #13