Closed milandesai closed 7 years ago
Pull request created. On of the issues here is that multiple paths can have the same key bytes if we do an in-place rename. But the RowKey is associated with a single path, so two RowKey objects with the same key bytes may be different objects. This is why in the attached pull request I had to be careful to compare the row key bytes for equality and instantiate a new RowKey object for the destination path. Perhaps with the introduction of INode ID row keys, we should consider separating the path from the RowKey and simply making RowKey a wrapper for the byte array?
Okay spent some time thinking on this one; I think we should handle all of it as part of this change too.
Here is my idea: we should have the RowKeyFactory deal with converting byte -> RowKey.
FullPathRowKeyFactory -> takes in a byte[] which is a full path -> converts to RowKey. INodeRowKeyFactory -> takes in a byte[] which is obtained from Curator's Distributed Long -> converts to RowKey.
Then perhaps we should re-use the Factory(s) or come up with another class that is capable of doing the reverse operations.
FullPathUtil -> converts RowKey to full path. INodeUtil -> takes in a RowKey -> makes possibly several HBase calls to traverse tree -> converts to full path.
If we have those operations then we can shift RowKey to just being a byte[] wrapper and move the path to the suited classes.
Milan I took a look further and have some questions.
This is what I tested with:
// Stage 0: in-place rename if src and dst keys are equal
if (rootSrcNode != null) {
Path rootSrcPath = new Path(rootSrcNode.getRowKey().getPath());
RowKey rootDstKey = keyFactory.newInstance(dst, rootSrcNode.getId());
Path rootDstPath = new Path(rootDstKey.getPath());
if (rootSrcPath.getParent().equals(rootDstPath.getParent())) {
LOG.debug("In-place rename of " + src + " to " + dst);
rootDstNode = rootSrcNode.cloneWithNewRowKey(rootDstKey);
nodeManager.updateINode(rootDstNode);
return;
}
}
I saw 6 failing tests with similar failures: testFileRename, testDirRenameOverwrite, testFileRenameOverwrite, testDirRenameNoChildren, testDirRenameRecoveryStage1PartlyComplete, and testDirRename.
I think the problem is that I called it in "in-place" rename, it's a little misleading. What I mean is that when doing a rename, if the destination row key doesn't change, we don't need to remove the INode and can just update the path. This comes up in the in-place rename scenario for ID-based row keys, where the destination key doesn't change so we just update the path.
I suppose it's an in-place rename in the database sense. Will fix tests.
There is no need to update any tests (I think); I ran your branch in Jenkins and locally and all tests passed; my screenshot is with my own code changes in place.
I understand better now; thanks.
I think I am comfortable with a +1 pending a nit that you update the comment to say something like Stage 0: If src and dst rowKeys are the same; update the path.
; something along those lines; just make it clear that this is for ID-based row keys.
Also I don't think we have to worry too much about '2 RowKeys with the same bytes pointing to 2 different objects' because this will only be the case for ID-based row keys and you handle it accordingly.
Thanks @zero45, added a clarifying comment regarding the in-place rename.
+1 this looks good. Don't forget to rebase. But let's comment in the issue, not pull request. Remember?
Pushed to trunk as 04fc4472b569e8aae255ac91909f7e2fd7cba053
.
INode IDs shouldn't be changing on rename. This also means we should include special logic for in-place renames, which we can define as one in which the destination row key is the same as the source row key. This will fix the rename issue for #34.