We routinely use updateINode() when we need to update individual fields of the corresponding row, but it rewrites the entire row. In many cases a whole-row update is unnecessary and can actually lead to a race condition similar to the one observed in #31, as described by Plamen. In that race, one thread gets the INode and then writes an updated version of it back into the row, while another thread updates the row in between. The second thread's change is then silently overwritten.
The alternative is to update only the fields that are actually changing rather than the entire row.
This issue is to investigate where such pinpoint updates are applicable and to find potential race conditions.
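The lost update described above can be sketched deterministically. This is a minimal sketch that assumes a plain Java map as a stand-in for one HBase row. The column names `permission` and `mtime` are illustrative only and are not the actual INode schema. The interleaving is replayed step by step rather than with real threads, so the outcome is reproducible.

```java
import java.util.HashMap;
import java.util.Map;

public class PinpointUpdateDemo {
    /** Initial contents of a hypothetical INode row: column qualifier -> value. */
    static Map<String, String> freshRow() {
        Map<String, String> row = new HashMap<>();
        row.put("permission", "rw-r--r--");
        row.put("mtime", "100");
        return row;
    }

    /** Thread A writes back every column from its own (stale) copy, as a whole-row update does. */
    static Map<String, String> wholeRowInterleaving() {
        Map<String, String> row = freshRow();
        Map<String, String> copyA = new HashMap<>(row); // A reads the INode
        row.put("permission", "rwx------");             // B updates the row in between
        copyA.put("mtime", "200");                       // A modifies its copy
        row.putAll(copyA);                               // A writes all columns back, clobbering B
        return row;
    }

    /** Same interleaving, but thread A writes only the column it actually changed. */
    static Map<String, String> pinpointInterleaving() {
        Map<String, String> row = freshRow();
        row.put("permission", "rwx------");             // B updates the row
        row.put("mtime", "200");                         // A touches only mtime
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> w = wholeRowInterleaving();
        Map<String, String> p = pinpointInterleaving();
        System.out.println("whole-row: permission=" + w.get("permission") + ", mtime=" + w.get("mtime"));
        System.out.println("pinpoint:  permission=" + p.get("permission") + ", mtime=" + p.get("mtime"));
    }
}
```

Running it prints `permission=rw-r--r--` for the whole-row case, which means B's change is lost. It prints `permission=rwx------` for the pinpoint case. Both cases end with `mtime=200`. Pinpoint writes only stop writers from clobbering columns they did not change. Two read-modify-write updates of the *same* column can still race.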
To make updates atomic and consistent, I'd suggest storing file records as protobuf objects and adding a version column that is checked with checkAndPut.
Pros:
Size - protobuf objects consume far less space than HBase fields (every field is prefixed with the full key/qualifier/timestamp, so we consume much more memory than a regular NameNode).
Atomicity - we can update data atomically using checkAndPut, so there are no races.
Consistency - we will not suffer from timestamp races, where some puts can be hidden because of a back-in-time timestamp.
Versioning - protobuf takes care of version compatibility.
Cons:
We need to read the object before updating it, so we can't update fields blindly (but we can factor out such fields, like leases or blocks, and keep them as regular fields).
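The checkAndPut proposal amounts to optimistic concurrency control on a version column. The sketch below is minimal and rests on two assumptions. First, `FakeTable` is a hypothetical in-memory table whose `checkAndPut` only imitates the compare-then-write idea. It is not the HBase client API. Second, a `String` payload stands in for the serialized protobuf record. The `update` loop also shows the cost listed under Cons: every change has to read the current object before it can write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class CheckAndPutDemo {
    /** Stored value: a serialized record (stand-in for a protobuf blob) plus its version column. */
    static final class Versioned {
        final long version;
        final String payload;
        Versioned(long version, String payload) { this.version = version; this.payload = payload; }
    }

    /** Hypothetical in-memory table; checkAndPut mimics compare-then-write, not the HBase API. */
    static final class FakeTable {
        private final Map<String, Versioned> rows = new HashMap<>();

        synchronized Versioned get(String key) { return rows.get(key); }

        synchronized void put(String key, Versioned value) { rows.put(key, value); }

        /** Atomically writes value only if the stored version still equals expectedVersion. */
        synchronized boolean checkAndPut(String key, long expectedVersion, Versioned value) {
            Versioned current = rows.get(key);
            if (current == null || current.version != expectedVersion) return false;
            rows.put(key, value);
            return true;
        }
    }

    /** Optimistic read-modify-write: retry until no other writer slipped in between. */
    static void update(FakeTable table, String key, UnaryOperator<String> change) {
        while (true) {
            Versioned current = table.get(key);
            Versioned next = new Versioned(current.version + 1, change.apply(current.payload));
            if (table.checkAndPut(key, current.version, next)) return;
            // Lost the race: another writer bumped the version, so re-read and re-apply.
        }
    }

    /** Runs `threads` writers, each applying `perThread` increments to the same row. */
    static Versioned runConcurrent(int threads, int perThread) {
        FakeTable table = new FakeTable();
        table.put("inode", new Versioned(0, "0"));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    update(table, "inode", s -> String.valueOf(Integer.parseInt(s) + 1));
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return table.get("inode");
    }

    public static void main(String[] args) {
        Versioned result = runConcurrent(4, 1000);
        System.out.println("version=" + result.version + " payload=" + result.payload);
    }
}
```

With 4 writers doing 1000 increments each, no increment is lost, so both the version and the payload end at 4000. A stale writer's `checkAndPut` simply fails, and that writer retries from a fresh read. It never overwrites a newer record.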