GiraffaFS / giraffa

Giraffa FileSystem (Slack: giraffa-fs.slack.com)
https://giraffa.ci.cloudbees.com
Apache License 2.0
17 stars 6 forks source link

Don't change INode ID on rename #221

Closed milandesai closed 7 years ago

milandesai commented 8 years ago

INode IDs shouldn't be changing on rename. This also means we should include special logic for in-place renames, which we can define as one in which the destination row key is the same as the source row key. This will fix the rename issue for #34.

milandesai commented 8 years ago

Pull request created. On of the issues here is that multiple paths can have the same key bytes if we do an in-place rename. But the RowKey is associated with a single path, so two RowKey objects with the same key bytes may be different objects. This is why in the attached pull request I had to be careful to compare the row key bytes for equality and instantiate a new RowKey object for the destination path. Perhaps with the introduction of INode ID row keys, we should consider separating the path from the RowKey and simply making RowKey a wrapper for the byte array?

pjeli commented 8 years ago

Okay spent some time thinking on this one; I think we should handle all of it as part of this change too.

Here is my idea: we should have the RowKeyFactory deal with converting byte -> RowKey.

FullPathRowKeyFactory -> takes in a byte[] which is a full path -> converts to RowKey. INodeRowKeyFactory -> takes in a byte[] which is obtained from Curator's Distributed Long -> converts to RowKey.

Then perhaps we should re-use the Factory(s) or come up with another class that is capable of doing the reverse operations.

FullPathUtil -> converts RowKey to full path. INodeUtil -> takes in a RowKey -> makes possibly several HBase calls to traverse tree -> converts to full path.

If we have those operations then we can shift RowKey to just being a byte[] wrapper and move the path to the suited classes.

pjeli commented 8 years ago

Milan I took a look further and have some questions.

  1. Why is it considered an in-place rename if src and dst RowKeys are equal? That does not mean an in-place rename to me; that means overwriting. In-place rename is if both src and dst are contained in the same directory.
  2. I don't understand why you return after the in-place rename. Don't you want to delete the src INode afterwards?

This is what I tested with:

    // Stage 0: in-place rename if src and dst keys are equal
    if (rootSrcNode != null) {
      Path rootSrcPath = new Path(rootSrcNode.getRowKey().getPath());
      RowKey rootDstKey = keyFactory.newInstance(dst, rootSrcNode.getId());
      Path rootDstPath = new Path(rootDstKey.getPath());
      if (rootSrcPath.getParent().equals(rootDstPath.getParent())) {
        LOG.debug("In-place rename of " + src + " to " + dst);
        rootDstNode = rootSrcNode.cloneWithNewRowKey(rootDstKey);
        nodeManager.updateINode(rootDstNode);
        return;
      }
    }
screen shot 2016-05-12 at 2 13 01 am

I saw 6 failing tests with similar failures: testFileRename, testDirRenameOverwrite, testFileRenameOverwrite, testDirRenameNoChildren, testDirRenameRecoveryStage1PartlyComplete, and testDirRename.

milandesai commented 8 years ago

I think the problem is that I called it in "in-place" rename, it's a little misleading. What I mean is that when doing a rename, if the destination row key doesn't change, we don't need to remove the INode and can just update the path. This comes up in the in-place rename scenario for ID-based row keys, where the destination key doesn't change so we just update the path.

milandesai commented 8 years ago

I suppose it's an in-place rename in the database sense. Will fix tests.

pjeli commented 8 years ago

There is no need to update any tests (I think); I ran your branch in Jenkins and locally and all tests passed; my screenshot is with my own code changes in place.

I understand better now; thanks.

I think I am comfortable with a +1 pending a nit that you update the comment to say something like Stage 0: If src and dst rowKeys are the same; update the path.; something along those lines; just make it clear that this is for ID-based row keys.

Also I don't think we have to worry too much about '2 RowKeys with the same bytes pointing to 2 different objects' because this will only be the case for ID-based row keys and you handle it accordingly.

milandesai commented 7 years ago

Thanks @zero45, added a clarifying comment regarding the in-place rename.

shvachko commented 7 years ago

+1 this looks good. Don't forget to rebase. But let's comment in the issue, not pull request. Remember?

milandesai commented 7 years ago

Pushed to trunk as 04fc4472b569e8aae255ac91909f7e2fd7cba053.