ceph / cephfs-hadoop


Update the Ceph Hadoop plugin to Apache Hadoop/HDFS 2.7.x #25

Open · opened by wwang-pivotal 8 years ago

wwang-pivotal commented 8 years ago

Hi guys, Apache Hadoop and HDFS have been updated to 2.7.x. The new releases change a lot of the configuration, which breaks the Ceph Hadoop plugin. Could you rebase the Ceph Hadoop plugin onto Apache Hadoop 2.7.x?

Thanks.

dotnwat commented 8 years ago

Hi @wwang-pivotal, I'll take a look at this this week. If the changes aren't major then it shouldn't take more than a day or two. Patches welcome too :)

wormwang commented 8 years ago

Have you had a chance to look at the issue?

m0zes commented 8 years ago

This is certainly one of the changes needed, and it only gets things partially working with Hadoop 2.6.0. I still can't get it to run YARN jobs.

diff --git a/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java b/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java
index a27384f..6f0df53 100644
--- a/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java
+++ b/src/main/java/org/apache/hadoop/fs/ceph/CephFileSystem.java
@@ -78,6 +78,10 @@ public class CephFileSystem extends FileSystem {
   public CephFileSystem() {
   }

+  protected int getDefaultPort() {
+    return 6789;
+  }
+
   /**
    * Create an absolute path using the working directory.
    */
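
For context on why that override matters (my understanding, not something stated in the plugin docs): Hadoop 2.x canonicalizes filesystem URIs using getDefaultPort() before comparing them, so a fs.defaultFS of ceph://host and a path of ceph://host:6789 only resolve to the same filesystem when the plugin reports 6789 as its default port. A self-contained sketch of that comparison logic; the sameFileSystem helper is mine, not Hadoop's:

import java.net.URI;

public class DefaultPortDemo {
    // Mirrors the effect of Hadoop's URI canonicalization: an authority
    // with no explicit port is treated as using the default port.
    static boolean sameFileSystem(URI a, URI b, int defaultPort) {
        int aPort = a.getPort() == -1 ? defaultPort : a.getPort();
        int bPort = b.getPort() == -1 ? defaultPort : b.getPort();
        return a.getScheme().equalsIgnoreCase(b.getScheme())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && aPort == bPort;
    }

    public static void main(String[] args) {
        URI conf = URI.create("ceph://hobbit01/");
        URI path = URI.create("ceph://hobbit01:6789/staging/file");
        System.out.println(sameFileSystem(conf, path, -1));    // false: ports differ
        System.out.println(sameFileSystem(conf, path, 6789));  // true: 6789 is filled in
    }
}
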
dotnwat commented 8 years ago

Thanks @m0zes. I've dropped the ball on 2.7, but I have some updates pending for it. I've only heard of a few problems with 2.6, and in those cases some of the reports weren't reproducible. It would be helpful to know what other problems you're seeing with 2.6.

m0zes commented 8 years ago

Just trying one of the examples here, although even "debug" logging doesn't seem to give me any idea of what is actually wrong. I believe this is at the filesystem level, though.

# hadoop  jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 10 100
16/04/26 12:40:40 DEBUG util.Shell: setsid exited with exit code 0
Number of Maps  = 10
Samples per Map = 100
16/04/26 12:40:40 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/26 12:40:40 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/26 12:40:40 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, always=false, type=DEFAULT, sampleName=Ops)
16/04/26 12:40:40 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
16/04/26 12:40:40 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
16/04/26 12:40:40 DEBUG security.Groups:  Creating new Groups object
16/04/26 12:40:40 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
16/04/26 12:40:40 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
16/04/26 12:40:40 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
16/04/26 12:40:40 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
16/04/26 12:40:40 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
16/04/26 12:40:40 DEBUG security.UserGroupInformation: hadoop login
16/04/26 12:40:40 DEBUG security.UserGroupInformation: hadoop login commit
16/04/26 12:40:40 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: mozes
16/04/26 12:40:40 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: mozes" with name mozes
16/04/26 12:40:40 DEBUG security.UserGroupInformation: User entry: "mozes"
16/04/26 12:40:40 DEBUG security.UserGroupInformation: UGI loginUser:mozes (auth:SIMPLE)
16/04/26 12:40:40 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
16/04/26 12:40:40 TRACE core.TracerId: ProcessID(fmt=%{tname}/%{ip}): computed process ID of "FSClient/10.5.3.30"
16/04/26 12:40:40 TRACE core.TracerPool: TracerPool(Global): adding tracer Tracer(FSClient/10.5.3.30)
16/04/26 12:40:40 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
16/04/26 12:40:40 TRACE core.Tracer: Created Tracer(FSClient/10.5.3.30) for FSClient
Loading libcephfs-jni from default path: /usr/lib/hadoop/lib/native
Loading libcephfs-jni: /usr/lib64/libcephfs_jni.so
Loading libcephfs-jni: /usr/lib/jni/libcephfs_jni.so
Loading libcephfs-jni: Success!
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.connect(Job.java:1272)
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
16/04/26 12:40:42 DEBUG service.AbstractService: Service: org.apache.hadoop.mapred.ResourceMgrDelegate entered state INITED
16/04/26 12:40:42 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
16/04/26 12:40:42 DEBUG azure.NativeAzureFileSystem: finalize() called.
16/04/26 12:40:42 DEBUG azure.NativeAzureFileSystem: finalize() called.
16/04/26 12:40:42 INFO client.RMProxy: Connecting to ResourceManager at gremlin00.beocat.ksu.edu/10.5.3.30:8032
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136)
16/04/26 12:40:42 DEBUG ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
16/04/26 12:40:42 DEBUG ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
16/04/26 12:40:42 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@3c86c285
16/04/26 12:40:42 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@74107a99
16/04/26 12:40:42 DEBUG service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
16/04/26 12:40:42 DEBUG service.AbstractService: Service org.apache.hadoop.mapred.ResourceMgrDelegate is started
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
16/04/26 12:40:42 DEBUG mapreduce.Cluster: Picked org.apache.hadoop.mapred.YarnClientProtocolProvider as the ClientProtocolProvider
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Cluster.getFileSystem(Cluster.java:161)
16/04/26 12:40:42 DEBUG security.UserGroupInformation: PrivilegedAction as:mozes (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
16/04/26 12:40:42 DEBUG mapred.ResourceMgrDelegate: getStagingAreaDir: dir=/staging/mozes/.staging
16/04/26 12:40:42 TRACE ipc.ProtobufRpcEngine: 1: Call -> gremlin00.beocat.ksu.edu/10.5.3.30:8032: getNewApplication {}
16/04/26 12:40:42 DEBUG ipc.Client: The ping interval is 60000 ms.
16/04/26 12:40:42 DEBUG ipc.Client: Connecting to gremlin00.beocat.ksu.edu/10.5.3.30:8032
16/04/26 12:40:42 DEBUG ipc.Client: IPC Client (1597504843) connection to gremlin00.beocat.ksu.edu/10.5.3.30:8032 from mozes: starting, having connections 1
16/04/26 12:40:42 DEBUG ipc.Client: IPC Client (1597504843) connection to gremlin00.beocat.ksu.edu/10.5.3.30:8032 from mozes sending #0
16/04/26 12:40:42 DEBUG ipc.Client: IPC Client (1597504843) connection to gremlin00.beocat.ksu.edu/10.5.3.30:8032 from mozes got value #0
16/04/26 12:40:42 DEBUG ipc.ProtobufRpcEngine: Call: getNewApplication took 161ms
16/04/26 12:40:42 TRACE ipc.ProtobufRpcEngine: 1: Response <- gremlin00.beocat.ksu.edu/10.5.3.30:8032: getNewApplication {application_id { id: 12 cluster_timestamp: 1461615899163 } maximumCapability { memory: 8192 virtual_cores: 4 }}
16/04/26 12:40:42 DEBUG mapreduce.JobSubmitter: Configuring job job_1461615899163_0012 with /staging/mozes/.staging/job_1461615899163_0012 as the submit dir
16/04/26 12:40:42 DEBUG mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:[ceph://hobbit01:6789/]
16/04/26 12:40:42 DEBUG mapreduce.JobResourceUploader: default FileSystem: ceph://hobbit01:6789
16/04/26 12:40:42 DEBUG mapreduce.JobSubmitter: Creating splits at ceph://hobbit01:6789/staging/mozes/.staging/job_1461615899163_0012
16/04/26 12:40:42 DEBUG input.FileInputFormat: Time taken to get FileStatuses: 32
16/04/26 12:40:42 INFO input.FileInputFormat: Total input paths to process : 10
16/04/26 12:40:42 DEBUG input.FileInputFormat: Total # of splits generated by getSplits: 10, TimeTaken: 35
16/04/26 12:40:43 INFO mapreduce.JobSubmitter: Cleaning up the staging area /staging/mozes/.staging/job_1461615899163_0012
java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:450)
        at org.apache.hadoop.io.Text.encode(Text.java:431)
        at org.apache.hadoop.io.Text.writeString(Text.java:480)
        at org.apache.hadoop.mapreduce.split.JobSplit$SplitMetaInfo.write(JobSplit.java:125)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeJobSplitMetaInfo(JobSplitWriter.java:193)
        at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:81)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:311)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
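
One way to read that trace (an educated guess, not a confirmed diagnosis): SplitMetaInfo.write() serializes each split location hostname with Text.writeString(), and Text.encode() throws NullPointerException on a null string, so a null hostname in a split's location list would fail at exactly the frames shown above. Null hostnames would come from the filesystem's getFileBlockLocations(). A tiny self-contained reproduction of the failing call, with made-up hostnames:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Text;

public class SplitLocationNpeDemo {
    public static void main(String[] args) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        // JobSplit$SplitMetaInfo.write() loops over the split's location
        // hostnames and writes each one with Text.writeString().
        String[] locations = { "node1", null }; // a null hostname sneaks in
        for (String location : locations) {
            Text.writeString(out, location); // NPE here on the null entry
        }
    }
}
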
dotnwat commented 8 years ago

Wow, nothing there looks suspicious at first glance. The usual suspect is a mismatch between our bindings and what Hadoop expects, which seems to diverge occasionally. What version of Ceph are you running?

m0zes commented 8 years ago

I built cephfs-hadoop with the 9.2.1 libcephfs jar, the 9.2.1 libcephfs_jni, and Hadoop 2.6.0-cdh5.7.0, on Ubuntu Trusty.

The cluster I'm connecting to is also 9.2.1.

m0zes commented 8 years ago

For the life of me I can't see anything wrong with my configuration, but perhaps there is something else wrong. I know I can list, add, delete, and move files with the hdfs dfs suite of tools. Here is my configuration for reference. https://gist.github.com/m0zes/e6eb5ca39153989f7a37947a469e0b98

dbseraf commented 7 years ago

Has there been any progress on this lately? Does anyone know whether Ceph 10.2 works any better?

wormwang commented 7 years ago

Has there been any progress on this lately, in 2017? Does anyone know whether Ceph 10.2 or 11.2 works any better?

dotnwat commented 7 years ago

There hasn't been much work on this. I don't have a lot of time to work on this in the short term, but would be happy to offer basic support. Have you tried deploying the bindings?

zphj1987 commented 7 years ago

@m0zes I'm seeing the same error as the one you pasted. Did you ever resolve it?

data:2 wanted=3
17/02/28 14:26:17 DEBUG mapreduce.JobSubmitter: Creating splits at ceph://10.168.10.1:6789/tmp/hadoop-yarn/staging/root/.staging/job_1488254605886_0020
17/02/28 14:26:17 DEBUG input.FileInputFormat: Time taken to get FileStatuses: 5
17/02/28 14:26:17 INFO input.FileInputFormat: Total input paths to process : 1
17/02/28 14:26:17 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/root/.staging/job_1488254605886_0020
java.lang.NullPointerException
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:444)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:405)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at org.apache.hadoop.examples.Grep.run(Grep.java:78)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.examples.Grep.main(Grep.java:103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
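
This trace fails one step earlier than the first one: FileInputFormat.getBlockIndex() iterates the BlockLocation array returned by getFileBlockLocations() without a null check, so it is consistent with the filesystem returning null block locations for the input file. A hypothetical guard, sketched as a subclass; GuardedCephFileSystem is my name and the fallback behavior is an assumption, not the plugin's actual fix:

import java.io.IOException;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.ceph.CephFileSystem;

public class GuardedCephFileSystem extends CephFileSystem {
    @Override
    public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
            throws IOException {
        BlockLocation[] locations = super.getFileBlockLocations(file, start, len);
        if (locations == null) {
            // FileInputFormat.getBlockIndex() dereferences this array without
            // a null check, so never return null: fall back to a single block
            // with unknown locality covering the requested range.
            locations = new BlockLocation[] {
                new BlockLocation(new String[0], new String[0], start, len)
            };
        }
        return locations;
    }
}
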
m0zes commented 7 years ago

No. I ended up creating individual RBD pools for each Hadoop node, with no replication. Then I created six RBDs per Hadoop node for parallelism, and put HDFS on top of those RBDs with forced 3x replication. Not an ideal setup, but I couldn't waste any more time going down the cephfs-hadoop route.

zphj1987 commented 7 years ago

I found that it was a configuration error. Please change your core-site.xml to this:

<property>
  <name>fs.defaultFS</name>
  <value>ceph://10.168.10.1:6789</value>
</property>
<property>
  <name>fs.ceph.impl</name>
  <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.ceph.impl</name>
  <value>org.apache.hadoop.fs.ceph.CephFs</value>
</property>

Use only these properties, nothing else, and try again; YARN runs fine for me with this.

Hope it helps.
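
(A plausible reason this helps, though it isn't confirmed in the thread: the first trace above shows job submission resolving the filesystem through FileContext.getAbstractFileSystem(), and FileContext looks up fs.AbstractFileSystem.<scheme>.impl rather than fs.<scheme>.impl, so YARN submission needs the CephFs binding even when plain hdfs dfs commands already work.)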

zphj1987 commented 7 years ago

@m0zes And I downgraded my Hadoop version to 2.7.1.