Ensure stable sort for all views

jmcarp commented 8 years ago

As pointed out in https://github.com/18F/FEC/issues/142, some views (specifically, views paginated using offset and limit) can return duplicate results under some sort rules. This happens when we sort on a column that has ties, which, combined with offsets and limits, causes postgres to return results in non-deterministic order. That's bad, since users may see the same result more than once, and may never see other results.

Options:

Paginate all tables using seek / cursor pagination, which always sorts on a unique key (as well as other user-specified columns). This has the disadvantage that it doesn't let users select a page of results--the only pagination operations are previous and next page. We're currently only using seek pagination for larger tables.
Retain offset pagination for smaller tables, but always include a unique key in the sort rules. This has the disadvantage of presenting two different pagination interfaces to users, and of maintaining code and tests on the backend and frontend for both interfaces. It also leaves the ambiguity of which tables should use offset pagination and which should use seek pagination.

For simplicity, I would lean toward using seek pagination for everything, but I'm curious to hear what others think: @arowla @noahmanger @LindsayYoung. Both options would be quick to implement, and I think we should get to this soon if possible.

noahmanger commented 8 years ago

My gut says that we should be ok with the first option. I think when you're dealing with large lists of results, the best way to find what you're looking for is to use the filters, not to click to random pages.

But I'm doing a little investigation to see if there's any usability research that addresses this.

LindsayYoung commented 8 years ago

I would lean toward adding a unique key to sort. We already have 2 kinds of pagination, I don't think anyone has complained about that. I think that using page= is much easier to work with.

LindsayYoung commented 8 years ago

Also, if we change all the pagination for most of our endpoints, I think that change should be versioned, because it would break most people's things.

jmcarp commented 8 years ago

In general, paginating by offset is a little easier to understand, but performs (potentially much) worse than paginating by last index. There are currently a few tables using offset pagination that really shouldn't be--some of the aggregates (occupation, employer) have millions of rows. But I'm not (strongly) opposed to keeping offset pagination for the smaller tables.

We should probably talk about versioning soon, since we intentionally haven't made strong promises around it and haven't started on implementation. I heard a rumor that @konklone might not think versioning is necessary--would be interested to hear more about that. That aside, I don't think this bug is the place to implement versioning.

konklone commented 8 years ago

I heard a rumor that @konklone might not think versioning is necessary--would be interested to hear more about that.

=) The relevant discussion is here: https://github.com/18F/api-standards/issues/5

However, I wasn't opposed to versioning in and of itself. I was arguing that versioning is premature optimization for v1 of an API, and that it's a decision better made at v2.

jmcarp commented 8 years ago

Definitely agree that it's possible to version prematurely. My main uncertainty is how we know when we get to v2--when we need to commit to version any breaking changes. If we change pagination parameters, does that push us to v2?

LindsayYoung commented 8 years ago

If I had built something, this would annoy me-- maybe I am too sensitive. I have been asking people to try the data and I think this API will be in good shape for the 2016 election.

Having API users is helpful to find bugs like these. I don't think we will have API users if we are not careful about introducing breaking changes. And again, maybe they are more tolerant than I think.

I would be ok, not versioning if we could give a warning. Could we use unique key in the sort rules for now, and then post something on our documentation about the coming change and email the google group? I know no one is on that group yet, but that would make a good precedent for people to join.

arowla commented 8 years ago

I tend to agree with @jmcarp that it is preferable to have the same style of pagination (seek / cursor) everywhere throughout the API for consistency, and those who wish to find more specific things should just be more specific about their filters, like @noahmanger said. But I do think, with @LindsayYoung, that it is important at this point to not suddenly break things for API users. It would be good to give them some warning, at least.

That said, I find myself wondering whether there's any way to achieve a hybrid model for the pagination parameters, or to simply start using cursor pagination, while warning the user that they are not getting the pagination results they may have expected if they are entering random page= numbers.

What would the new pagination parameters look like?

noahmanger commented 8 years ago

Just discussed this in the meeting: let's try to get this done in this sprint if possible.

LindsayYoung commented 8 years ago

I am going to break this into two issues the symptom described in 18F/FEC#142 and unifying pagination.

fecgov / openFEC

Ensure stable sort for all views #1361