UWNetworksLab / uProxy-p2p

Internet without borders
https://www.uproxy.org/
Apache License 2.0
865 stars, 182 forks

Make reporting calls idempotent #2679

Open fortuna opened 8 years ago

fortuna commented 8 years ago

In case of network errors, the client may issue a report request twice, but we don't want to record it twice.

In any case, the client will need to generate a unique identifier for the request and pass that as a parameter. We can use a UUID library such as uuid.
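A minimal sketch of the client side, assuming Node's built-in `crypto.randomUUID()` in place of the `uuid` package. The `Report` shape, the `SendFn` transport, and `sendReport` are hypothetical names for illustration, and the transport is synchronous only to keep the sketch short. The id is generated once and reused on every retry:

```typescript
import { randomUUID } from "crypto";

// Hypothetical report shape; the real fields are not specified in this issue.
interface Report {
  requestId: string;
  payload: string;
}

// Hypothetical transport: returns true if the server acknowledged the report,
// false on a network error. A real client would be asynchronous.
type SendFn = (report: Report) => boolean;

// Generates the id once, then sends the same report (same id) on every attempt.
function sendReport(payload: string, send: SendFn, maxAttempts = 3): string {
  const report: Report = { requestId: randomUUID(), payload };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (send(report)) {
      return report.requestId;
    }
  }
  throw new Error(`report ${report.requestId} not acknowledged after ${maxAttempts} attempts`);
}
```

Because the id is attached before the first attempt, the server sees the same id whether the first request was lost or only its response was.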

On the server side, there are a few possibilities:

  1. Use the unique id as the key for the insertion. A retried request would be written twice, but that's OK: the second write lands on the same key, so the record isn't duplicated.
  2. Keep a set of "processed ids" in memory and check against that. This is simple, but breaks on server restarts.
  3. Keep a set of "processed ids" in Cloud Datastore. However, the Datastore is eventually consistent by default, so we would need to check which parameters give us stronger consistency. This also requires a read before the write.
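To illustrate option 1, here is a rough sketch in which a `Map` stands in for the keyed store. `ReportStore` and `StoredReport` are hypothetical names, and nothing here is the actual Cloud Datastore API. The point is only that a write keyed by the client-generated id is an upsert, so a retry cannot create a second record:

```typescript
// Hypothetical stored record; the real schema is not given in this thread.
interface StoredReport {
  payload: string;
}

// In-memory stand-in for a keyed store (the Map plays the role of the datastore).
class ReportStore {
  private readonly records = new Map<string, StoredReport>();

  // Upsert keyed by the client-generated id: a retry overwrites the same entry.
  put(requestId: string, payload: string): void {
    this.records.set(requestId, { payload });
  }

  // Number of distinct reports recorded so far.
  count(): number {
    return this.records.size;
  }
}
```

No read is needed before the write, which is what distinguishes this from option 3.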

Any other ideas? #1 seems to have the best robustness/simplicity trade-off.

jab commented 8 years ago

Vector clocks? https://en.wikipedia.org/wiki/Vector_clock http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/

fortuna commented 8 years ago

Josh, we don't actually care about the ordering of the events. They can be recorded in any order, as long as each is recorded only once. We also don't need a strongly consistent data store.

jab commented 8 years ago

Gotcha. +1 to #1, I see no downside to that. (Were you hoping to use something else for the primary key?)

Not sure how frequently reports are sent, and whether new data would accumulate in between a failed request and a retry, but just in case, would it be worth allowing the client to include multiple reports in the same request, to save unnecessary roundtrips? (Note I'm not proposing that the client have logic to merge multiple reports into a single report, since I take it we probably want to keep all report processing logic on the server.)

fortuna commented 8 years ago

I'd keep it to a single report per request for now. However, maybe we should move to a design where we send a JSON object in the body of the request instead of encoding the report in the URL. That would allow us to use more complex structures and more easily switch to batch support, besides keeping the parameters out of any URL logging along the way.
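Something like the following could work as a starting point. The field names (`requestId`, `metrics`, `reports`) and both helper functions are made up for illustration, since the actual report contents aren't specified here. Wrapping the report in an array from the start means batch support later would not change the body's shape:

```typescript
// Hypothetical report body; field names are assumptions, not the real schema.
interface ReportBody {
  requestId: string;
  metrics: Record<string, number>;
}

// Envelope with an array, so one report today and a batch later look the same.
interface BatchBody {
  reports: ReportBody[];
}

// Serializes reports into the JSON string sent as the request body.
function encodeBody(reports: ReportBody[]): string {
  const body: BatchBody = { reports };
  return JSON.stringify(body);
}

// Parses a request body and checks only the outer envelope shape.
function decodeBody(body: string): ReportBody[] {
  const parsed: unknown = JSON.parse(body);
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    !Array.isArray((parsed as { reports?: unknown }).reports)
  ) {
    throw new Error("expected an object with a 'reports' array");
  }
  return (parsed as BatchBody).reports;
}
```

A real server would also validate each report's fields before writing it.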

trevj commented 8 years ago

+1 to #1.