OriginProtocol / origin-devops

We've moved to a monorepo: https://github.com/OriginProtocol/origin/tree/master/devops

Conditions upon which there could be data loss in IPFS #15

Open ambertch opened 6 years ago

ambertch commented 6 years ago

(this issue doesn't call for immediate action; it documents an architectural discussion)

Context:

Currently, creating a listing is a transaction in the sense that two separate operations must both complete:

  1. Uploading the listing data to IPFS
  2. Submitting the listing creation transaction

It's most important not to lose data during these operations: it would be preferable, for example, to roll back the transaction and have the client retry than to create a listing and lose the listing data stored in IPFS.
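To make the failure window concrete, here is a minimal sketch of that flow; `upload_to_ipfs` and `submit_listing_tx` are hypothetical placeholders, not actual Origin functions.

```python
def upload_to_ipfs(listing_json: bytes) -> str:
    """Placeholder: add the listing data to IPFS, return its content hash."""
    raise NotImplementedError

def submit_listing_tx(content_hash: str) -> str:
    """Placeholder: submit the listing-creation transaction, return the tx hash."""
    raise NotImplementedError

def create_listing(listing_json: bytes) -> str:
    content_hash = upload_to_ipfs(listing_json)  # operation 1
    # Window of risk: the content is in IPFS but not yet pinned or referenced
    # on-chain. If GC runs now, the data is lost (condition 1 below).
    tx_hash = submit_listing_tx(content_hash)    # operation 2
    # If operation 2 fails instead, nothing is lost: the client retries and
    # the orphaned IPFS blocks are eventually garbage-collected.
    return tx_hash
```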

Conditions upon which there could be data loss in IPFS:

  1. (current design) Uploads are not pinned by default; Origin content hashes are retrieved out of band and pinned. Here, if a listing is uploaded to IPFS and GC runs on that machine before any pinner in the system has run, the machine (having not yet pinned the listing) could garbage-collect it, resulting in data loss.

One way to prevent this would be to disable automatic GC. A combination of profiling and monitoring would tell us when GC should occur. At that point, all nodes would be put into maintenance mode to prevent new uploads; after 1.5-3 minutes (6-12 confirmations, guaranteeing that all Origin content hashes have appeared in blockchain events), GC would be invoked (via `ipfs repo gc`), and all nodes would then be taken out of maintenance mode. This is a reasonable near-term solution: transaction volume will be manageable at first, so GC maintenance windows can be anticipated and scheduled for times when volume is low.
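A rough sketch of that GC-maintenance procedure, assuming a hypothetical `set_maintenance_mode` switch on the Origin infrastructure side (`ipfs repo gc` is the only real command used):

```python
import subprocess
import time

# ~1.5-3 minutes, i.e. 6-12 confirmations at ~15s Ethereum block times.
CONFIRMATION_WAIT_SECS = 180

def set_maintenance_mode(enabled: bool) -> None:
    """Placeholder: block (or unblock) new uploads on all Origin IPFS nodes."""
    raise NotImplementedError

def run_scheduled_gc() -> None:
    set_maintenance_mode(True)
    try:
        # Wait out the confirmation window so every hash uploaded before
        # maintenance began has appeared in a blockchain event and been pinned.
        time.sleep(CONFIRMATION_WAIT_SECS)
        subprocess.run(["ipfs", "repo", "gc"], check=True)
    finally:
        set_maintenance_mode(False)
```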

  2. (an alternative design) Pinning by default; Origin content hashes are retrieved out of band and non-Origin content is unpinned. Here, if a pinner runs after a listing has been uploaded to IPFS but before the transaction has been confirmed (included in a block and propagated across the network, thus generating the log event), it could unpin Origin content.

One way to prevent this would be to keep a mapping from content hashes to upload times, implementing a grace period during which content cannot be unpinned. Upload times are not stored in the IPFS DAG, so they would have to be recorded manually at upload time.
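A minimal sketch of that bookkeeping, assuming the upload path can call a recording hook (both function names below are made up for illustration):

```python
import time

GRACE_PERIOD_SECS = 180  # long enough for the listing transaction to confirm

# Content hash -> upload timestamp, recorded manually since the IPFS DAG
# stores no upload times. A dict stands in for a persistent store here.
upload_times: dict = {}

def record_upload(content_hash: str) -> None:
    """Called by the upload path whenever content is added to IPFS."""
    upload_times[content_hash] = time.time()

def may_unpin(content_hash: str) -> bool:
    """The pinner checks this before unpinning non-Origin content."""
    uploaded_at = upload_times.get(content_hash)
    if uploaded_at is None:
        return True  # unknown content, not covered by the grace period
    return time.time() - uploaded_at > GRACE_PERIOD_SECS
```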

cuongdo commented 6 years ago

cc @franckc

Regarding condition 2, it's also possible to avoid adding more decentralized state by having the pinner keep a little local state (via SQLite or some simple local k/v data store). Here's how this could work (a sketch follows the list):

  1. During each run, the pinner stores which pinned IPFS hashes have no corresponding Origin listing, along with a timestamp.
  2. If the pinner later encounters a hash for which enough time has elapsed since that timestamp that an Origin transaction should have been confirmed, it unpins the content.
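A sketch of this approach using the stdlib `sqlite3` module; `ipfs_pinned_hashes` and `origin_listing_hashes` are hypothetical placeholders for "what this node has pinned" and "what confirmed Origin events reference", and `ipfs pin rm` is the real CLI command:

```python
import sqlite3
import subprocess
import time

# Long enough that any Origin transaction should have been confirmed.
UNPIN_AFTER_SECS = 600

def ipfs_pinned_hashes() -> set:
    """Placeholder: content hashes currently pinned on this node."""
    raise NotImplementedError

def origin_listing_hashes() -> set:
    """Placeholder: hashes referenced by confirmed Origin listing events."""
    raise NotImplementedError

def pinner_run(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE IF NOT EXISTS orphans (hash TEXT PRIMARY KEY, first_seen REAL)"
    )
    now = time.time()
    orphans = ipfs_pinned_hashes() - origin_listing_hashes()

    # Step 1: record when each hash without a listing was first seen.
    for h in orphans:
        db.execute(
            "INSERT OR IGNORE INTO orphans (hash, first_seen) VALUES (?, ?)",
            (h, now),
        )

    # A hash that has since gained a listing is no longer an orphan.
    known = [h for (h,) in db.execute("SELECT hash FROM orphans").fetchall()]
    db.executemany(
        "DELETE FROM orphans WHERE hash = ?",
        [(h,) for h in known if h not in orphans],
    )

    # Step 2: unpin orphans whose grace period has elapsed.
    expired = db.execute(
        "SELECT hash FROM orphans WHERE ? - first_seen > ?",
        (now, UNPIN_AFTER_SECS),
    ).fetchall()
    for (h,) in expired:
        subprocess.run(["ipfs", "pin", "rm", h], check=True)
        db.execute("DELETE FROM orphans WHERE hash = ?", (h,))

    db.commit()
```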
ambertch commented 6 years ago

Condition / design 2 (with upload timestamps stored) also has some cool properties during failure states.

franckc commented 6 years ago

@ambertch @cuongdo I really like this idea of having the pinner store hashes + timestamps in a local store, especially because the logic would be simple and that approach shouldn't cause too much operational overhead.

+1 for going in that direction