GiraffaFS / giraffa

Giraffa FileSystem (Slack: giraffa-fs.slack.com)
https://giraffa.ci.cloudbees.com
Apache License 2.0

Replace updateINode() with pinpoint row updates. #129

Open shvachko opened 9 years ago

shvachko commented 9 years ago

We routinely call updateINode() when we need to change fields of the corresponding row, and it rewrites the entire row. In many cases a whole-row update is not needed, and it can actually lead to a race condition similar to the one observed in #31, as described by Plamen: one thread gets the INode, then writes an updated version of it back into the row, while another thread updates the row in between, so the second thread's change is overwritten. The alternative is to update only the fields that are actually changing rather than the entire row. This issue is to investigate where pinpoint updates are applicable and to find potential race conditions.
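To make the race concrete, here is a minimal, deterministic sketch of the interleaving described above. It does not use the Giraffa or HBase APIs: `RowStore`, `putRow`, `putField`, and the column names `length` and `mtime` are hypothetical stand-ins for a single row and its columns, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one row: column qualifier -> value.
final class RowStore {
    private final Map<String, Long> row = new HashMap<>();

    // Full-row read: returns a snapshot of every column.
    synchronized Map<String, Long> getRow() { return new HashMap<>(row); }

    // Whole-row write: overwrites every column with the caller's snapshot.
    synchronized void putRow(Map<String, Long> snapshot) { row.putAll(snapshot); }

    // Pinpoint write: touches only the named column.
    synchronized void putField(String column, long value) { row.put(column, value); }

    synchronized long get(String column) { return row.get(column); }
}

final class PinpointUpdateDemo {
    static RowStore freshRow() {
        RowStore store = new RowStore();
        store.putField("length", 0L);
        store.putField("mtime", 100L);
        return store;
    }

    // Writer A reads the row, writer B changes length, A writes the whole row back.
    static long lengthAfterWholeRowUpdate() {
        RowStore store = freshRow();
        Map<String, Long> stale = store.getRow(); // A reads the full row
        store.putField("length", 4096L);          // B updates length in between
        stale.put("mtime", 200L);                 // A only meant to change mtime...
        store.putRow(stale);                      // ...but also writes back the stale length
        return store.get("length");
    }

    // Same interleaving, but A writes only the column it changed.
    static long lengthAfterPinpointUpdate() {
        RowStore store = freshRow();
        store.putField("length", 4096L);          // B updates length
        store.putField("mtime", 200L);            // A updates mtime only
        return store.get("length");
    }

    public static void main(String[] args) {
        System.out.println("whole-row update: length=" + lengthAfterWholeRowUpdate()); // length=0
        System.out.println("pinpoint update:  length=" + lengthAfterPinpointUpdate()); // length=4096
    }
}
```

In the whole-row case B's length change is lost, while the pinpoint write leaves it intact. Note that pinpoint writes only help when the two writers touch different columns; concurrent writes to the same column still need some form of coordination.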

octo47 commented 9 years ago

To make updates atomic and consistent, I'd suggest storing file records as protobuf objects and adding a version column that is checked by checkAndPut. Pros:

  1. Size - protobuf objects will consume far less space than HBase fields (every field is prefixed with the full key/qualifier/timestamp, so we consume much more memory than a regular namenode)
  2. Atomicity - we can update data atomically using checkAndPut, so there would be no races
  3. Consistency - we will not suffer from timestamp races, where some puts can be hidden because of a back-in-time timestamp
  4. Versioning - protobuf will take care of version compatibility

Cons:

  1. Need to read the object before updating it, so we can't update fields blindly (but we can factor out such fields, like leases or blocks, and make them regular fields)
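The version-checked update proposed here could look roughly like the sketch below. It is an in-memory model only: `FileRecord`, `VersionedRow`, and `OptimisticUpdateDemo` are hypothetical names, `FileRecord` stands in for what would be a serialized protobuf message, and `VersionedRow.checkAndPut` merely imitates the compare-then-write behavior of HBase's checkAndPut rather than calling the HBase client.

```java
import java.util.function.UnaryOperator;

// Hypothetical file record; in the proposal this would be one protobuf value plus a version.
final class FileRecord {
    final long version, length, mtime;
    FileRecord(long version, long length, long mtime) {
        this.version = version; this.length = length; this.mtime = mtime;
    }
}

// Stand-in for a single row that offers a check-and-put primitive.
final class VersionedRow {
    private FileRecord cell;
    VersionedRow(FileRecord initial) { this.cell = initial; }

    synchronized FileRecord get() { return cell; }

    // Stores the new record only if the stored version still equals expectedVersion.
    synchronized boolean checkAndPut(long expectedVersion, FileRecord updated) {
        if (cell.version != expectedVersion) return false;
        cell = updated;
        return true;
    }
}

final class OptimisticUpdateDemo {
    // Read, modify, bump the version, and retry until the version check succeeds.
    static FileRecord update(VersionedRow row, UnaryOperator<FileRecord> change) {
        while (true) {
            FileRecord current = row.get();
            FileRecord edited = change.apply(current);
            FileRecord next = new FileRecord(current.version + 1, edited.length, edited.mtime);
            if (row.checkAndPut(current.version, next)) return next;
        }
    }

    // Writer A reads, writer B commits, then A tries to write from its stale snapshot.
    static boolean staleWriteAccepted(VersionedRow row) {
        FileRecord stale = row.get();
        update(row, r -> new FileRecord(r.version, 4096L, r.mtime)); // B commits first
        FileRecord fromStale = new FileRecord(stale.version + 1, stale.length, 200L);
        return row.checkAndPut(stale.version, fromStale);
    }

    public static void main(String[] args) {
        VersionedRow row = new VersionedRow(new FileRecord(1L, 0L, 100L));
        System.out.println("stale write accepted: " + staleWriteAccepted(row)); // false
        // A retries through the loop, which re-reads the record before writing.
        FileRecord r = update(row, cur -> new FileRecord(cur.version, cur.length, 200L));
        System.out.println("v=" + r.version + " length=" + r.length + " mtime=" + r.mtime);
        // prints: v=3 length=4096 mtime=200
    }
}
```

The retry loop also shows the cost listed under Cons: every update starts with a read, which is why frequently changing data such as leases or blocks might be better kept as regular columns.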