GiraffaFS / giraffa

Giraffa FileSystem (Slack: giraffa-fs.slack.com)
https://giraffa.ci.cloudbees.com
Apache License 2.0
17 stars 6 forks

Upgrade to Hadoop 2.5.1 #94

Closed by shvachko 9 years ago

shvachko commented 9 years ago

Original issue 94 created by shvachko on 2015-04-09T23:05:23.000Z:

Currently investigating what this work will entail.

shvachko commented 9 years ago

Comment #1 originally posted by shvachko on 2015-04-13T18:19:04.000Z:

The biggest change between Hadoop 2.0.5 and Hadoop 2.6.0 is the addition of dozens of methods to the ClientProtocol interface. They deal with the following features:

- Rolling Upgrade
- ACLs
- Extended Attributes
- Snapshots
- Caching
- File Encryption
- Heterogeneous Storages

Currently, I am focused on making the code compile.

shvachko commented 9 years ago

Comment #5 originally posted by shvachko on 2015-04-15T18:56:49.000Z:

I created three issues that we should resolve first in order to make the upgrade smoother. They will separate HBase logic from INode processing, make NamespaceProcessor and NamespaceAgent more compact, and fix up warnings and style issues.

shvachko commented 9 years ago

It looks like the latest release is HBase 1.0.1, and it depends on Hadoop 2.5.1. Let's stick to that for now. We don't want HBase talking to Hadoop 2.6.0 using its 2.5.1 HDFS client.

shvachko commented 9 years ago

I renamed the title to the right versions. Please feel free to split this issue into separate Hadoop and HBase upgrades if needed. I think we should start with Hadoop anyway.

milandesai commented 9 years ago

I linked a pull request that has one commit for the Hadoop 2.5.1 upgrade and another for the HBase 1.0.1 upgrade. For the HBase upgrade, I got rid of the RpcRetryingCaller/Factory extensions by instead creating an interceptor called FileSystemExceptionInterceptor. Interceptors can be used to detect and process RPC exceptions, so instead of changing the retry logic, I can use the interceptor to decide whether an exception should be thrown (i.e., a filesystem exception) or ignored and retried (e.g., NotServingRegionException). The BlockManagementAgent change in the HBase commit came from @octo47.
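The classification an interceptor of this kind applies can be sketched as follows. This is a minimal standalone illustration of the throw-vs-retry decision only: the class name comes from the patch, but the `shouldRethrow` helper and the stdlib exception types standing in for HBase/HDFS exceptions are assumptions, not the actual HBase RetryingCallerInterceptor API.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;

// Simplified sketch of the decision an interceptor could make:
// filesystem-level exceptions surface to the caller immediately,
// while transient region errors (e.g. NotServingRegionException,
// represented here by a plain IOException) are left to be retried.
public class FileSystemExceptionSketch {

  /** Returns true if the exception should be rethrown rather than retried. */
  static boolean shouldRethrow(Throwable t) {
    // Hypothetical classification: real code would test HBase/HDFS types;
    // we stand in with JDK exception types here.
    if (t instanceof FileNotFoundException) return true;  // filesystem error
    if (t instanceof AccessDeniedException) return true;  // filesystem error
    return false;                                         // transient: retry
  }

  public static void main(String[] args) {
    System.out.println(shouldRethrow(new FileNotFoundException("/missing"))); // true
    System.out.println(shouldRethrow(new IOException("region moving")));      // false
  }
}
```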

milandesai commented 9 years ago

Though it compiles, the tests aren't working for me: I'm getting connection exceptions for all of them, so I need to investigate why.

octo47 commented 9 years ago

@milandesai this build doesn't work for me on Mac OS X. It needs #114, and we need to remove the exclusions of "jasper-runtime". Would you mind adding this to your patch? To be more specific, it fails with:

main:
    [mkdir] Created dir: /Users/octo/Projects/wandisco/giraffa/giraffa-core/target/hbase-webapps/giraffa
     [copy] Copying 9 files to /Users/octo/Projects/wandisco/giraffa/giraffa-core/target/hbase-webapps/giraffa
[INFO] Logging to org.slf4j.impl.SimpleLogger(org.mortbay.log) via org.mortbay.log.Slf4jLog
java.util.MissingResourceException: Can't find bundle for base name org.apache.jasper.resources.LocalStrings, locale en_US
    at java.util.ResourceBundle.throwMissingResourceException(ResourceBundle.java:1499)
    at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1322)
    at java.util.ResourceBundle.getBundle(ResourceBundle.java:721)
    at org.apache.jasper.compiler.Localizer.<clinit>(Localizer.java:36)
    at org.apache.jasper.JspC.initWebXml(JspC.java:1203)
    at org.apache.jasper.JspC.execute(JspC.java:1117)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
    at org.apache.tools.ant.TaskAdapter.execute(TaskAdapter.java:154)
    at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)

... maven stacktrace skipped ....
octo47 commented 9 years ago

Here are a couple of problems. We set defaultFs to grfa:// in core-site.xml. During cluster startup, org.apache.hadoop.hbase.util.FSUtils#setVersion() tries to create files, and since it doesn't use fully qualified paths, defaultFs is used; in turn, we try to initialize Giraffa before HBase has started. Even worse, it uses the config from the classpath without the correct ZooKeeper clientPort set, which is why it complains that it can't connect. (By the way, why is defaultFs needed in core-site.xml at all? Maybe it would be better to have a grfa-site.xml and add it to the defaultResources of Configuration.) After removing defaultFs from core-site.xml, we get another error, which is triggered because we should use the correct Configuration class, i.e. HBaseConfiguration.create():

public static HBaseTestingUtility getHBaseTestingUtility() {
  Configuration conf = HBaseConfiguration.create();
  conf.set(DFSConfigKeys.FS_DEFAULT_NAME_KEY,
      DFSConfigKeys.FS_DEFAULT_NAME_DEFAULT);
  return new HBaseTestingUtility(conf);
}

That will fix an issue with cluster initialisation.

octo47 commented 9 years ago

And the cluster should be shut down using UTIL.shutdownMiniCluster() (it needs to stop the ZK cluster too).

milandesai commented 9 years ago

Thanks @octo47. It seems that we should split this issue into two patches then, one for HBase and another for Hadoop. You can create another pull request with your HBase changes, and when testing we can test them together. I'll modify the current pull request to only include the Hadoop changes.

milandesai commented 9 years ago

OK, I reverted the HBase commit and renamed this issue to one just for upgrading Hadoop.

octo47 commented 9 years ago

It seems that we use a not-fully-initialized structure, org.apache.giraffa.GiraffaPBHelper#LOCATED_BLOCK_PROTO. Some applications will probably expect to see a nonzero datanode list for LocatedBlock, and the poolId needs to be set too. So it could look like:

 public class GiraffaPBHelper {
-  private final static LocatedBlockProto LOCATED_BLOCK_PROTO = PBHelper.convert(
-      new LocatedBlock(new ExtendedBlock(), new DatanodeInfo[0]));
+  private final static LocatedBlockProto LOCATED_BLOCK_PROTO;
+
+  static {
+    ExtendedBlock eb = new ExtendedBlock("E177563F-7DA2-4977-BCE6-40B67653F980", new Block(0));
+    LocatedBlock b = new LocatedBlock(eb, new DatanodeInfo[0]);
+    LOCATED_BLOCK_PROTO = PBHelper.convert(b);
+  }

org.apache.hadoop.hdfs.protocolPB.PBHelper expects to see a Block with the poolId set in it. Otherwise it fails with:

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.protocol.proto.HdfsProtos$ExtendedBlockProto$Builder.setPoolId(HdfsProtos.java:1095)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:512)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:663)
    at org.apache.giraffa.GiraffaPBHelper.<clinit>(GiraffaPBHelper.java:76)
    ... 19 more
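The failure mode here is the usual generated-protobuf contract: builder setters reject null arguments, so converting an ExtendedBlock with no pool ID blows up inside setPoolId. A stdlib-only sketch of that pattern (the ExtendedBlockProtoBuilder class below is a hypothetical stand-in, not the generated HdfsProtos code):

```java
import java.util.Objects;

// Hypothetical builder mimicking generated protobuf code: the setter
// rejects null, so building the proto from a block with no pool ID
// throws NullPointerException, matching the stack trace above.
public class PoolIdSketch {
  static class ExtendedBlockProtoBuilder {
    private String poolId;
    ExtendedBlockProtoBuilder setPoolId(String id) {
      // protobuf builder setters throw NPE on null values
      this.poolId = Objects.requireNonNull(id, "poolId");
      return this;
    }
  }

  public static void main(String[] args) {
    ExtendedBlockProtoBuilder b = new ExtendedBlockProtoBuilder();
    try {
      b.setPoolId(null); // mirrors new ExtendedBlock() with no pool ID set
      System.out.println("set ok");
    } catch (NullPointerException e) {
      System.out.println("NPE: " + e.getMessage());
    }
    // the proposed fix: supply a pool ID when constructing the block
    b.setPoolId("E177563F-7DA2-4977-BCE6-40B67653F980");
    System.out.println("set ok");
  }
}
```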
octo47 commented 9 years ago

And the proto files need to be regenerated. hdfs.proto and Security.proto should be taken from hadoop-2.5.1.

milandesai commented 9 years ago

Pushed new changes:

shvachko commented 9 years ago

Checked Milan's latest changes. Looks good, except the changes to GiraffaPBHelper.java and Security.proto are not needed. I'll merge new trunk and commit this to your branch Issue94.

milandesai commented 9 years ago

@shvachko, are you sure we don't need Security.proto? If we're going to keep this file, it makes sense to have the latest version of it for our Hadoop version. Also, the branch has already been merged with the most recent trunk.

shvachko commented 9 years ago

The change in Security.proto only adds a comment. Is it needed? Or did we copy it from Hadoop? I merged today's two commits to your branch.

milandesai commented 9 years ago

I copied Security.proto and hdfs.proto from Hadoop. I think we should keep the change to be consistent.

shvachko commented 9 years ago

OK, if it is a copy let's make it consistent. I'll add it back.

shvachko commented 9 years ago

I reverted branch Issue94 locally to the point before the issue-132 merge, and merged the latter into trunk.

shvachko commented 9 years ago

Committed along with #122. Thank you Milan.