appdotnet / api-spec

App.net API Documentation is on the web at https://developers.app.net. Source for these docs is in the new-docs branch here. Please use the issue tracker and submit pull requests! Help us build the real-time social service where users and developers come first, not advertisers.

Double-post protection #86

Open cgiffard opened 12 years ago

cgiffard commented 12 years ago

I've accidentally double-posted a few times, when no visual indication was given that a prior post had succeeded.

Now, you could argue that preventing double posts is the app implementor's responsibility. But the way I see it, it's a data-integrity issue, and as a result it should be handled at the API layer.

(Forgive me if this is already in the API spec. I had a fairly rudimentary look up and down, as well as a scan through the open issues, but couldn't find anything.)

Twitter does this, but it does it in a really ugly way, returning a 403 response that my client can't use to determine whether the request was rejected due to a double-post - or something else entirely.

I suggest returning a dedicated response in the event that a post with the same text is made twice within, say, five minutes. Alternatively, if no valid use is found (other than spamming) for posting the same message repeatedly, regardless of the time interval, the API could simply ensure that the post currently being made doesn't have the same text as the previous one.

Something to think about. :)
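A minimal sketch of the time-window check proposed above, in Python. The five-minute window, the `DuplicateGuard` class and its in-memory store are illustrative assumptions, not part of the App.net API; a real service would turn a `False` result into the dedicated duplicate response.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed policy: identical text from the same user within 5 minutes is a duplicate.
DUPLICATE_WINDOW_SECONDS = 5 * 60


@dataclass
class DuplicateGuard:
    """Hypothetical in-memory duplicate check, keyed by user."""
    window: float = DUPLICATE_WINDOW_SECONDS
    # user_id -> [(timestamp, text), ...] for posts accepted inside the window
    recent: Dict[str, List[Tuple[float, str]]] = field(default_factory=dict)

    def accept(self, user_id: str, text: str, now: Optional[float] = None) -> bool:
        """Return True and record the post, or False if it repeats a recent post."""
        now = time.time() if now is None else now
        # Forget anything that has aged out of the window.
        history = [(ts, t) for ts, t in self.recent.get(user_id, []) if now - ts < self.window]
        if any(t == text for _, t in history):
            self.recent[user_id] = history
            return False
        history.append((now, text))
        self.recent[user_id] = history
        return True
```

With `now` passed explicitly, `accept("u1", "Yes", now=0)` is accepted, a repeat at `now=60` is rejected, and the same text at `now=400` is accepted again because the first post has aged out of the window.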

abitgone commented 12 years ago

Twitter uses 403: "Status is a duplicate", which I'd say is pretty neat. Duplicate statuses within a set period of time are forbidden. The HTTP 403 response seems quite useful and appropriate in this case.

cgiffard commented 12 years ago

Ah, I didn't realise it gave a nice message these days. :)

Scratch my vitriolic comment about their API then. App.net should totally plagiarise it! :)

cgiffard commented 12 years ago

But what sort of time period is reasonable? 1 hour?

abitgone commented 12 years ago

Much less than that, I'd say. An accidental duplicate post would probably only happen a few seconds apart – a minute or two max. A 5 minute period would mean you probably meant to post a duplicate update – and if you want to do that in a reasonable manner, who are we to stop you? :o)

tgvashworth commented 12 years ago

I think this issue is more complicated than that. Let's say I'm using a chat client for app.net, and I send the same message twice in a row, for whatever reason. In 5 minutes that's quite possible. "Yes" for example. What would the policy there be?

I think it's up to the client that creates the post to prevent this.

abitgone commented 12 years ago

The exact same thing happens on Twitter now. Jokingly, I wanted to reply "no comment" to a friend twice within about 3 minutes, and I was prevented from doing so. In the end I had to change my reply ever so slightly to get around it. I don't think it's a bad thing.

The problem with leaving it to the client to prevent this is that the client may not know the message was posted in the first place. In fact, it could be a buggy situation in which the client needs the server to prevent the duplicate.

Here's what I mean. You send a status update of "Good Morning". The API receives your request and posts the update, but some network condition means you don't get the reply from the API saying that the update was successfully sent. Instead, your code times out and your client tells you "Sorry, I couldn't post your update. There might be a network problem or ADN might be down". (This happens on Twitter today.)

Perhaps it's a lost packet, perhaps it's really bad latency, perhaps it's an incorrectly configured firewall or proxy server. Whatever it is, it's not the client's fault and it's not the server's fault.

So, your client might offer to try and repost the same status for you shortly thereafter. Without the service-side logic in place to prevent a duplicate status being sent, you could potentially have the same problem happen again and again.

Having double-post protection in place on the service-side goes some way towards mitigating the risk of that scenario playing out on ADN.
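The lost-reply scenario can be simulated with a toy server. Everything here (`FlakyServer`, `post_with_retries`, and the rule of comparing against the user's last post) is a hypothetical illustration: without server-side protection each retry stores another copy, while the deduplicating server keeps exactly one.

```python
from typing import List, Tuple


class FlakyServer:
    """Toy server whose first `lost_replies` responses never reach the client."""

    def __init__(self, lost_replies: int, dedupe: bool) -> None:
        self.posts: List[Tuple[str, str]] = []  # (user, text) in creation order
        self.lost_replies = lost_replies
        self.dedupe = dedupe

    def create_post(self, user: str, text: str) -> bool:
        last = next((p for p in reversed(self.posts) if p[0] == user), None)
        # With dedupe on, a repeat of this user's last post is not stored again.
        if not (self.dedupe and last == (user, text)):
            self.posts.append((user, text))
        if self.lost_replies > 0:
            self.lost_replies -= 1
            # The post may already be stored, but the client never learns that.
            raise TimeoutError("reply lost in transit")
        return True


def post_with_retries(server: FlakyServer, user: str, text: str, attempts: int = 3) -> bool:
    """Client that treats a timeout as failure and simply retries."""
    for _ in range(attempts):
        try:
            return server.create_post(user, text)
        except TimeoutError:
            continue
    return False
```

With two lost replies, both clients eventually see success, but the unprotected server ends up holding three copies of "Good Morning" and the protected one holds a single post.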

ghost commented 12 years ago

Maybe the API blocks duplicates within 2-5 minutes, and after that it's up to the app and the user?

shawnhooper commented 12 years ago

@phuu You have a valid point, although I was thinking of it from a different angle. In general, this is a great idea for human-human interaction. What about other types of applications that may use the app.net platform to send communications that are not for human consumption? There could be a potential there for "duplicate post" in a short timeframe.

Also, how do we suggest handling this where the duplicate post is simply meta-data, with no actual post text?

abitgone commented 12 years ago

Presumably, meta-data posts would be exempt from this – applications that are using ADN to log something on a second-by-second or minute-by-minute basis would potentially be logging the same data every time.

Which, I suppose, will get some of you wondering why, if meta-posts were made exempt, real posts shouldn't also be.

cgiffard commented 12 years ago

5 minutes sounds reasonable to me.

Additionally, is there a reason why metadata posts should be exempt? With polling and status services, even with binary data, shouldn't the assumption for anything consuming that data be that the information in the most recent post is valid until replaced by a newer set of data?

Can anybody describe why one would need to post exactly the same (meta)data twice in a row, within five minutes?

abraham commented 12 years ago

There could potentially be a flag sent with the API request that would disable duplicate restrictions.

dch commented 12 years ago

The real constraint here is when you have intermittent or unreliable network connectivity, and the client is not receiving replies consistently. It also occurs when you introduce bugs into your client code & repeatedly try to post something that was already successful :-)

Either accept it (it's the same post, networks fail, let's get over it), or maybe return something like a 304 (not modified: thanks, already got it)?

I'd guess that's something like an LRU cache of post content & username.
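A rough sketch of the LRU cache idea, keyed on (username, post content). The capacity and the class and function names are assumptions for illustration. It answers a cache hit with 304 as suggested here; note that HTTP defines 304 Not Modified for conditional GET/HEAD requests, so a real API might prefer a different status.

```python
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class RecentPostCache:
    """Bounded LRU map from (username, content) to the id of an already-created post."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Tuple[str, str], int]" = OrderedDict()

    def get(self, username: str, content: str) -> Optional[int]:
        key = (username, content)
        post_id = self._entries.get(key)
        if post_id is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return post_id

    def put(self, username: str, content: str, post_id: int) -> None:
        key = (username, content)
        self._entries[key] = post_id
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry


def handle_create(cache: RecentPostCache, username: str, content: str,
                  create: Callable[[str, str], int]) -> Tuple[int, int]:
    """Return (status, post_id); `create` stands in for the real storage call."""
    existing = cache.get(username, content)
    if existing is not None:
        return 304, existing  # "not changed, thanks, already got it"
    post_id = create(username, content)
    cache.put(username, content, post_id)
    return 201, post_id
```

Because entries leave this cache only through eviction, a user could not repeat a message while it is still cached; in practice this would probably be combined with a time window.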

abitgone commented 12 years ago

@cgiffard Accurate average calculation, for one. When you're calculating the average of a particular data set, if you're keen on making it as accurate as possible, you need to have an entire data set. What happens when there's a three minute gap in the data? Do you assume that nothing changed and so you fill in the three missing minutes with the same data you had four minutes ago? Or do you assume that the polling for data failed and something is wrong?

JoshBlake commented 12 years ago

Is there any reason why the posting API cannot be designed to be idempotent?

Figure out a field or combination of fields plus the text to be posted that would make each authored post unique and let the client continuously retry until it receives confirmation. (For example, the id of the previous post the user made plus the auth token.) If two attempts go through but a reply is missing, no problem. The API would not double post from the second attempt but return the success code again. No timeouts necessary and allows multiple posting of the same content if the user decides to write the same thing again the next time.
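A minimal sketch of the idempotent design described here, assuming the key is derived from the previous post's id, the auth token and the text. Hashing with SHA-256 (so the token itself is never stored), the `\x1f` separator and the `IdempotentPostStore` class are illustrative choices, not part of the App.net API.

```python
import hashlib
from typing import Dict, List, Tuple


def idempotency_key(previous_post_id: str, auth_token: str, text: str) -> str:
    """Derive a per-post key; \\x1f (unit separator) keeps the fields unambiguous."""
    material = "\x1f".join([previous_post_id, auth_token, text]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class IdempotentPostStore:
    """Hypothetical store: replaying a request with the same key returns the same post."""

    def __init__(self) -> None:
        self.posts: List[Tuple[str, str]] = []  # (post_id, text)
        self._by_key: Dict[str, str] = {}       # idempotency key -> post_id

    def create(self, key: str, text: str) -> str:
        if key in self._by_key:
            return self._by_key[key]  # retry: same success, no second post
        post_id = str(len(self.posts) + 1)
        self.posts.append((post_id, text))
        self._by_key[key] = post_id
        return post_id
```

Retrying with the same key returns the same post id as often as the client asks, while writing "Yes" again later produces a new key (the previous post id has changed) and therefore a new post.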

abitgone commented 12 years ago

@JoshBlake Doesn't that rather turn it into a silent failure under certain circumstances?

cgiffard commented 12 years ago

The API flag could address the edge-case presented by certain kinds of metadata. And IMO it's really the client app's responsibility to deal with network issues - of course, the API can make it a lot easier.

For a double post, it could potentially receive a 403 (or maybe 3XX redirect?) with a header specifying the URI of the existing post - which would adequately instruct the client on whether the original post was received properly or not.

A 3XX response seems appropriate if returning a post URI. On the other hand, a 403 response would be in line with what you'd expect from a flood-control related error, which is kind of similar in vein.
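Both options can be sketched side by side: a 303 See Other carrying the existing post's URI in `Location`, or a 403 whose body names it. The `/posts/<id>` URI layout, the `duplicate_post` error string and both function names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)


def duplicate_response(existing_post_id: str, use_redirect: bool = True) -> Response:
    uri = f"/posts/{existing_post_id}"  # hypothetical URI layout
    if use_redirect:
        # 303 See Other: "what you tried to create already exists over here"
        return 303, {"Location": uri}, ""
    body = json.dumps({"error": "duplicate_post", "existing_post": uri})
    return 403, {"Content-Type": "application/json"}, body


def existing_post_uri(status: int, headers: Dict[str, str], body: str) -> Optional[str]:
    """Client side: tell a duplicate rejection apart from any other response."""
    if status == 303:
        return headers.get("Location")
    if status == 403:
        try:
            payload = json.loads(body)
        except ValueError:  # JSONDecodeError subclasses ValueError
            return None
        if isinstance(payload, dict) and payload.get("error") == "duplicate_post":
            return payload.get("existing_post")
    return None
```

Either way the client can confirm that its original post was received, which a bare 403 with no detail does not allow.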

JoshBlake commented 12 years ago

@abitgone Under which circumstances? Can you give an example? If a client is free to retry the same post as many times as desired until it receives the successful response, the only failure would be an improperly coded client user experience. (For example, the developer decides five retries is enough, so it silently fails versus notifying the user "cannot post right now, will retry automatically in 15 minutes" or "cannot post right now, try again later" or something.)

abitgone commented 12 years ago

@JoshBlake Forgive me if I've misunderstood you, but let's suggest that I am logging the temperature of a room using ADN. My device sends out an API call with a room ID and the current temperature. Nothing is returned, so it tries again. Once again, nothing is returned so it tries again. On the third attempt, it gets a successful response.

A minute later, the temperature has remained stable, and so my device sends out another API call with the same room ID and the same current temperature. Presumably, the API will return a successful response, but won't insert the same data again?

Or have I misunderstood idempotent operations?

tonymillion commented 12 years ago

This can easily be fixed by having the poster (in this case, the device) generate a UUID and (optionally) send it along with the post details. Assuming SHA1(text) && UUID are a match, the API could just return the previously generated post.

In the event the UUID DOES NOT match then it would create a new post even if the text was the same.
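A minimal sketch of this scheme, assuming the server indexes posts by the pair (client UUID, SHA-1 of the text) and treats a missing UUID as "always create". `UuidDedupStore` and its dict-based posts are hypothetical; the thread does not say what should happen when the UUID matches but the text differs, and this sketch simply creates a new post in that case.

```python
import hashlib
import uuid
from typing import Dict, List, Optional, Tuple


class UuidDedupStore:
    """Hypothetical server-side store deduplicating on (client UUID, SHA1(text))."""

    def __init__(self) -> None:
        self.posts: List[dict] = []
        self._index: Dict[Tuple[str, str], dict] = {}

    def create(self, text: str, client_uuid: Optional[str] = None) -> dict:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if client_uuid is not None and (client_uuid, digest) in self._index:
            return self._index[(client_uuid, digest)]  # resend: return the earlier post
        post = {"id": len(self.posts) + 1, "text": text, "client_uuid": client_uuid}
        self.posts.append(post)
        if client_uuid is not None:
            self._index[(client_uuid, digest)] = post
        return post


# Usage: the device generates one UUID per post and reuses it on every resend.
store = UuidDedupStore()
post_uuid = str(uuid.uuid4())
first = store.create("Yes", post_uuid)
again = store.create("Yes", post_uuid)            # same post returned
fresh = store.create("Yes", str(uuid.uuid4()))    # new UUID: new post, same text
```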

dch commented 12 years ago

I fail to see the point in logging room temperature without a timestamp or monotonically/sequentially increasing UUID present somewhere in the data. If you are just interested in a "current temperature" post, then status quo is sufficient, no?

JoshBlake commented 12 years ago

@abitgone Concrete examples are always good. In this case, your temp logger on the second reading (the one taken "a minute later") would be making a new post, not reattempting the first post, even if the post content is the same. As @tonymillion and @dch pointed out, each unique post (regardless of duplication of content) would be assigned a unique identifier by the client. The identifier could be several things: a UUID/GUID, or a hash of the post, the timestamp of authorship and the client id, etc.

In your example, each temperature measurement would become a unique post and would have its own UUID. The device could cache up a whole hour's worth of identical temperature readings, each with a different UUID, then blast them out all at once (burst wireless communication is good for battery). The next hour (ignoring the network architecture or server-side caching mechanism) the device or its server host could resend the whole batch, or part of the batch, regardless of the previous success of individual posts. No duplicates would be posted, and if the ADN API receives and processes the requests, it will return success as many times as the client sends the same UUID. Once success is confirmed, the device or server can stop resending the temp readings.

Another analogy, more human: Suppose you are in the kitchen and your significant other calls out to you from another room "Can you get me a drink?" You reply "Yes," but she does not hear you. A moment later she yells "Hey bozo can you get me a drink??" You reply louder "Yes," confirming you are getting one, and she hears. In this exchange of communication, you did not think that her second request meant she wants two drinks (as her request is idempotent! Sort of.) It's simply overcoming an asynchronous communication barrier.

abitgone commented 12 years ago

@JoshBlake It was a crummy example, I'll grant you that. The minute I hit 'comment on this issue', I'd thought of adding timestamps or GUIDs, which would mean the update itself wouldn't be a duplicate.

This puts the onus upon the developer who, let's be honest, should be thinking of much better ways to use a service to log room temperatures rather than just sending the current temperature and a room ID. The developer should probably also try and get some more sleep.

tonymillion commented 12 years ago

This kind of thing would be a good idea (and I have previously implemented it) for mobile devices on spotty cellular connections.

Real-world use case: I set up a "post queue" on the device, and when a signal is detected I start the queue and send each queued post sequentially. I don't necessarily know whether a post has hit the server until I get a response, and given the spotty connectivity, it's possible the API has accepted the post but I haven't seen the response; hence when I try again, the post gets duplicated until I see the success/failure response.

If the api would simply respond with the previously created post, my app is happy and we can move on to posting the next post.

Which brings me to a possible new issue I need to start: client_created_at timestamps as well as the server's created_at. I'll open that one if it's not already there.
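The device-side post queue described above can be sketched as follows, assuming each queued post carries its own client UUID and the server returns the previously created post when it sees that UUID again. `QueuedPost`, `drain_queue` and the `send` callable (raising `ConnectionError` when no reply arrives) are hypothetical names.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class QueuedPost:
    text: str
    # Generated once when queued, then reused on every resend of this post.
    client_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


def drain_queue(queue: Deque[QueuedPost], send: Callable[[QueuedPost], dict],
                max_attempts: int = 5) -> List[dict]:
    """Send queued posts in order; stop, keeping the rest queued, if one can't be confirmed."""
    confirmed: List[dict] = []
    while queue:
        post = queue[0]
        for _ in range(max_attempts):
            try:
                confirmed.append(send(post))
                break
            except ConnectionError:
                continue  # no reply seen; resending is safe because of client_uuid
        else:
            return confirmed  # signal lost again: leave this post at the head of the queue
        queue.popleft()  # server answered (with a new or previously created post)
    return confirmed
```

Because the server answers a resend with the post it already created, the queue can advance as soon as any reply arrives, without risking a duplicate.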

abitgone commented 12 years ago

@tonymillion Perfect for things like posts which couldn't be sent at their original time – like, for example, where you're on the underground with no cellular service etc. I'll stop now to prevent hijacking this issue.