GiraffaFS / giraffa

Giraffa FileSystem (Slack: giraffa-fs.slack.com)
https://giraffa.ci.cloudbees.com
Apache License 2.0
17 stars 6 forks source link

Auto migration of file leases. #132

Closed shvachko closed 9 years ago

shvachko commented 9 years ago

Lease information in Giraffa is persisted in the file row. That way when the region moves to another region server as the result of RS failure or a split of the region, the new RS will recreate the lease from the lease info stored in the row. We should implement this logic.

shvachko commented 9 years ago

It will require LeaseManager to be shared between NamespaceProcessor and BlockManagementAgent. In #31 we tried to place it in coprocessors sharedData, but sharedData is only shared between the same instances of the same coprocessor, and therefore it didn't work. If there is no better way (let's try to find it first though), we can make LeaseManager a static singleton, which is one way to share the reference between classes.

pjeli commented 9 years ago

We should investigate into this after the HBase upgrade is complete. If we can turn NamespaceProcessor into an endpoint coprocessor and get access to the "openRegion" methods then we can listen in on the lease columns and add to the LeaseManager as needed.

shvachko commented 9 years ago

The logic seems to be solid. Few comments:

  1. FileLease.NO_LAST_UPDATE is not used. Pls remove.
  2. Why do we commit generated protobuf files? Should we file an issue for that?
  3. Let's not use @VisibleForTesting. Just public - that's it. Useless decorations...
  4. Not very good idea to use whole RegionServerServices as the key in leaseManagerMap. Would it be better to use getRpcServer().getListenerAddress().toString()?
  5. Reverse if-statement in LeaseManager.getLeaseManager().
    1. Can we call it originateSharedLeaseManager()? Seems like this is what it used to be until now.
  6. isNamespaceTable() should be static, and not private then.
  7. BlockManagementAgent.getLease() should be getFileLease()
  8. JavaDoc comment to byteArrayToLease() @return sounds odd. I understand it was pasta-copied from the previous method. But let's not multiply oddness in this project.
  9. Also why do we not throw IOExcpetion if serialization failed, but just log it, should we? And then we can just use GiraffaPBHelper.bytesToHdfsLease() and get rid of BlockManagementAgent.byteArrayToLease().
  10. An informative JavaDoc for testLeaseMigration() would be quite helpful.
pjeli commented 9 years ago
  1. Done.
  2. We commit generated protobuf files so that others do not have to generate the protos themselves everytime they check out the project. This is how its done in Hadoop as well. It would be nice to build protobuf generation into our build but that seems like a separate issue.
  3. Done.
  4. Done.
  5. Done and done.
  6. Done.
  7. Done.
  8. Done.
  9. I think we should do some refactoring here and put all serialization / deserialization into a standard class. I think GiraffaPBHelper can be evolved into GiraffaSerializationHelper where we can put all byte[], Result, Proto, and other serialization pieces into and go from there. Sounds like separate issue though.
  10. Done.

Changes can be seen here: https://github.com/GiraffaFS/giraffa/commit/0d904dbe10300fad33e243cd1b2d5a8084da1488

shvachko commented 9 years ago

This is how its done in Hadoop as well. It would be nice to build protobuf generation into our build but that seems like a separate issue.

I thought in Hadoop protobuf is generated in target\generated-sources. Let's file a new issue to do that.

I changed leaseManagerMap to map String to LeaseManager. This is what I meant in (4). Also added a JavaDoc explaining it is only needed in unit tests. Got rid of byteArrayToLease() as described in (9). Yes, let's file another issue to refactor serialization into a separate class.

The rest looks good. +1. Will commit if no objections.

shvachko commented 9 years ago

Just committed. Thank you Plamen.

shvachko commented 9 years ago

Need to reopen this. The upgrade branch is colliding with this commit. I fixed compilation, but then TestLeaseManagement fails, and it looks like it requires a lot of changes to fix it. So I propose to revert this for now. Commit the upgrade and then rework this issue with new HBase APIs. Particularly, BlockManagementAgent should mostly work with Cell instead of KeyValue. I simply did a contextual replace of KeyValue to Cell. Restored branch issue-132 as current trunk (before reverting).

pjeli commented 9 years ago

Found out the cause of the issues. Stopping the only remaining RegionServer while maintaining that ZK assigns the META storage was triggering some ShutdownProcessHook that was causing the migration test to fail.

I've updated my branch "issue-132" with some configuration parameter changes to the unit test that allow the MiniHBaseCluster to not get stuck when we stop a RegionServer.

Now the problem comes down to the GiraffaClient getting ConnectionRefused exceptions when it tries to close the file or renew the lease AFTER the RegionServer dies.

shvachko commented 9 years ago

Plamen, you probably want to reveal a bit more info about what didn't work. Or you will forget it by next Wednesday :-).

shvachko commented 9 years ago

I merged this with current trunk (post #153 commit). Pushed the branch. Will commit as tests are passing.

shvachko commented 9 years ago

Committed!! Thank you Plamen.