18F / analytics.usa.gov

The US federal government's web traffic.
https://analytics.usa.gov

incorporating API hits #400

Closed: danhammer closed this issue 6 years ago

danhammer commented 8 years ago

There were 2.2 million hits to the Astronomy Picture of the Day (APOD) web pages in August 2014. There were fewer than 700K hits to the APOD web pages in August 2016. The APOD sites no longer appear in the top ten web pages, even though the content is arguably more popular than ever. The reason is an increase in API usage and in derivative third-party applications. Last month there were over 4.5 million hits to api.nasa.gov.

This example illustrates that web views may be an incomplete metric for the consumption of online, public content. Page views are clearly the best single metric, but still not a complete one; they could be supplemented with top API usage statistics.

In previous conversations, @konklone raised a few good concerns about the manipulability and usefulness of API hits alone. Based on those conversations, also with @rypan and @GUI, it may be worth including a Top API panel, ranked first by number of distinct users and then by hits. Raising this as a possible enhancement.

danhammer commented 8 years ago

I started to scope this out. If you're signed in, you can see this request to get users for each service.

{
    "draw": 1,
    "recordsTotal": 943,
    "recordsFiltered": 943,
    "data": ["..."]
}

Or, even without cookies, you can access history with the DEMO_KEY.

{
    "c": [
        {
            "v": 1473652800000,
            "f": "Mon, Sep 12, 2016"
        },
        {
            "v": 62727,
            "f": "62,727"
        },
        {
            "v": 60300,
            "f": "60,300"
        },
        {
            "v": 415,
            "f": "415"
        }
    ]
}
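For reference, both payloads above can be unpacked with a few lines. A minimal sketch (the count columns in the history row aren't labeled in the response, so no meaning is assigned to them here):

```python
import json
from datetime import datetime, timezone

# The two responses above, trimmed to the fields shown.
users_payload = json.loads(
    '{"draw": 1, "recordsTotal": 943, "recordsFiltered": 943, "data": ["..."]}'
)
history_row = json.loads("""
{"c": [
    {"v": 1473652800000, "f": "Mon, Sep 12, 2016"},
    {"v": 62727, "f": "62,727"},
    {"v": 60300, "f": "60,300"},
    {"v": 415, "f": "415"}
]}
""")

# Registered users for the service, per the DataTables-style listing.
total_users = users_payload["recordsTotal"]

# Each history row is a list of {v: raw value, f: formatted} cells;
# the first cell is a millisecond timestamp, the rest are daily counts.
cells = [cell["v"] for cell in history_row["c"]]
day = datetime.fromtimestamp(cells[0] / 1000, tz=timezone.utc).date()
counts = cells[1:]

print(total_users, day.isoformat(), counts)  # 943 2016-09-12 [62727, 60300, 415]
```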

The idea here is that tracking the number of users and the number of hits is already built into api.data.gov -- which is awesome. How can we move this forward? Who else needs to approve?

tdlowden commented 8 years ago

Hey @danhammer, this is a super interesting idea. That said, what makes the web stats info easy to use for a project like analytics.usa.gov is the fact that it is all stored in one convenient place for us to pull from. api.data.gov could certainly, with adoption, become a place with as much centralized participation in the API space, and then I think we could totally work on something of this sort. But as of right now, API data is scattered in many places. Implementation of API stats would be roughshod and require manual work for each API. Until we have a central, canonical source for federal API hit data, I think this kind of enhancement is beyond our grasp.

rypan commented 8 years ago

@danhammer Is there a way to push the number of users and number of hits data into Google Analytics? That's the "convenient place" @tdlowden is referring to.

tdlowden commented 8 years ago

@rypan this is probably possible (not 100% sure) using custom events. That said, a custom event counts as a "hit," and hits = $$. Based on the sheer number of API calls to api.nasa.gov alone, I'd anticipate this kind of solution (with many APIs pushing data into DAP's GA) would push us into a tier costing more than DAP can afford.

tdlowden commented 8 years ago

Currently, we are in the 1 to 5 billion hit tier with GA. Above 5 billion, we encounter next-tier pricing, plus more data lag.

But regardless, I feel there would need to be a lot of time and resources devoted to buy-in on this from across government before we could do it. I don't think we'd want to do something like this without a significant swath of gov APIs participating, and for what it's worth, that took DAP 3 years with a "mandate" in the DGS.

konklone commented 8 years ago

FWIW, I think @rypan was referring to regularly sending the number of hits/users per-API into GA, not sending each hit or user registration into GA as a custom hit or event.
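A minimal sketch of what that could look like: a scheduled job sending one Measurement Protocol event per API per day, carrying the aggregate counts, rather than one hit per API call. The tracking ID, client ID, and category/label names below are placeholders, and this assumes the documented Universal Analytics Measurement Protocol endpoint:

```python
import urllib.parse

GA_ENDPOINT = "https://www.google-analytics.com/collect"
TRACKING_ID = "UA-XXXXXXX-Y"  # placeholder: the DAP property ID would go here

def daily_api_event(api_name, hits, users):
    """Build one Measurement Protocol 'event' hit carrying a day's
    aggregate numbers for a single API."""
    return urllib.parse.urlencode({
        "v": "1",                # protocol version
        "tid": TRACKING_ID,      # GA property to send to
        "cid": "api-stats-job",  # a fixed client id for the batch job
        "t": "event",            # hit type
        "ec": "api-usage",       # event category (made up for this sketch)
        "ea": api_name,          # event action: which API
        "el": f"users:{users}",  # event label: distinct users that day
        "ev": str(hits),         # event value: hits that day
    })

payload = daily_api_event("api.nasa.gov", hits=62727, users=943)
# urllib.request.urlopen(GA_ENDPOINT, data=payload.encode()) would POST it.
print(payload)
```

One such hit per API per day stays far below any GA hit-volume tier, which is the point of aggregating before sending.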

In any case, to sum up some of my concerns from the thread @danhammer mentioned, I don't believe it's useful or healthy to rank APIs by number of hits or number of users. Number of users is better than hits, but both potentially significantly understate impact, and distort the very concept of API value.

While this can be true with website visits, the disconnect is not as severe as it is for APIs. For example, if Congress.gov hypothetically offered an API, and the only users were GovTrack.us and the New York Times, that API would have 2 users. But those 2 users would be using the API to power applications with many millions of users.

This was my experience at the Sunlight Foundation, operating APIs that were used by a small number of large-scale campaigns and services. And those kinds of relationships are ones we want to encourage in the federal government. We don't want to incentivize agencies to spend their resources focusing on driving up hits or users. We want them to develop relationships that maximize the overall utility and impact of their data.

Again, while this dynamic is also true to some extent for websites -- a page that's only visited by 30 people is potentially of great impact if one of those 30 is a reporter who writes up something about it -- it's just not, IMO, nearly as severe.

Given how relatively few APIs there are in the government, it would be useful to aggregate them somewhere, and perhaps to show their key users along with some stats. Maybe DAP is a good place for that, maybe it isn't. But wherever it is, I would caution against ranking them, and instead push agencies to show where the value in their API lies, regardless of how qualitatively it needs to be captured.

tdlowden commented 8 years ago

ah, apologies for my misunderstanding, @rypan.

rypan commented 8 years ago

@konklone is giving me more credit than I deserve 😄

@tdlowden - I didn't know the extra events would push us to a new tier. I assumed Google Analytics does their aggregating/sampling magic and didn't care about how many events you pushed. (Their pricing isn't too transparent!)

On the theme of usefulness: it would be helpful to go from Page Views to Transactions, so there is more visibility into actions like Number of Visa Applications Submitted or Number of Appeals Filed.

Then, within those transactions, we could get a better understanding of Completion Rates and Satisfaction.

danhammer commented 8 years ago

I understand and agree with most of @konklone's points. What is the best way to track use of federal APIs? What are the appropriate dimensions and metrics of meaningful use? I have no good answer. Right now, the available dimensions are number of users and number of hits. Given the importance of web service APIs in the Administration's digital strategy, it seems like some attempt to monitor and evaluate usage -- however imperfect -- is a worthwhile exercise, if only to start the (inevitable) conversation.

It is hard for me to believe that developers would try to game the API numbers, especially since it is already so easy to boost numbers on federal websites -- it takes literally 5 lines of code, even for sites that require a perfunctory login. This does not seem like a valid argument against posting API usage publicly. If anything, it is an argument for making this information available for public review. There is no material incentive for boosting numbers, like there was at the Sunlight Foundation.

We already have access to API usage metrics through api.data.gov, so no further resources would have to be expended. The coverage is clearly not comprehensive, but it may offer an incentive to register APIs with api.data.gov, which is valuable. Federal APIs need not be built centrally, but there is value in coordination and registration -- especially if distributed developers see value in submitting their APIs centrally (probably for SEO or discoverability). This is not part of Google Analytics; no additional tiers would have to be purchased.

I wonder if it would be worth mocking up a front-end or widget for review. I would need to get more access to the api.data.gov numbers, which Gray or Nick could grant. Just like analytics.usa.gov, this would just be an assembly of existing web services.

laurenancona commented 8 years ago

Adding this for reference only, from a tracking standpoint -- there are certainly more discussions to be had around tracking this type of data, and whether or not it ought to be stored with client-side measurement data.

But for 'offline' data or other server side metrics, the Measurement Protocol can be leveraged, both at the hit level and for bulk uploads. There are details to consider, such as associating hits at the session level, which is why it isn't a great fit for API tracking (unless you wanted to associate with client-side tracking, etc). It's most often used in tandem with a userId view, but figured it's useful to be aware of.
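On the bulk-upload side, the Measurement Protocol's batch endpoint takes newline-separated hits, up to 20 per request. A rough sketch of the chunking (the hit payloads and tid/cid values are placeholders):

```python
import urllib.parse

BATCH_ENDPOINT = "https://www.google-analytics.com/batch"

def batch_bodies(hit_payloads, per_request=20):
    """Group URL-encoded hits into newline-separated request bodies,
    respecting the documented 20-hits-per-request limit."""
    for i in range(0, len(hit_payloads), per_request):
        yield "\n".join(hit_payloads[i:i + per_request])

# 45 placeholder event hits.
hits = [
    urllib.parse.urlencode({
        "v": "1", "tid": "UA-XXXXXXX-Y", "cid": f"offline-{n}",
        "t": "event", "ec": "api-usage", "ea": "demo", "ev": str(n),
    })
    for n in range(45)
]

bodies = list(batch_bodies(hits))
print(len(bodies))  # 3: two full requests of 20 hits, plus one of 5
# Each body would then be POSTed to BATCH_ENDPOINT.
```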

@konklone's example from Sunlight raises a good point, and I also see why @danhammer would want to be able to account for additional consumption, especially of that sort of content. I'm interested in this conversation, and also what @rypan's referencing in terms of using GA goals to calculate task completion (conversions) - though I've no idea what goal limits are like on the current DAP pricing tier or how that might be divided among agencies.