Factual / drake

Data workflow tool, like a "Make for data"

HDFS compatibility failing #162

Closed: chip-factual closed this issue 9 years ago

chip-factual commented 9 years ago

Drake is failing to connect to Factual's Hadoop cluster, with an error similar to the one shown on the Drake wiki. @ahadrana traced the issue back to an out-of-date hadoop-core version in project.clj.

Here's the stack trace I'm seeing:

java.net.ConnectException: Call to dev/10.199.0.213:8020 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1134)
    at org.apache.hadoop.ipc.Client.call(Client.java:1110)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at com.sun.proxy.$Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:255)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:217)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1563)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1597)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1579)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:212)
    at drake.fs$hdfs_filesystem.invoke(fs.clj:163)
    at drake.fs.HDFS.exists_QMARK_(fs.clj:172)
...
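
A quick way to tell a plain network problem apart from a client-library mismatch is to check whether the NameNode address in the trace accepts TCP connections at all. The following is a minimal sketch using only the Java standard library; it is not part of Drake, and the class name `NameNodeProbe` and method `canConnect` are hypothetical names chosen for illustration. The default host and port are simply the ones that appear in the stack trace.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Drake or Hadoop): checks raw TCP reachability only.
final class NameNodeProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the address from the stack trace; override via arguments.
        String host = args.length > 0 ? args[0] : "10.199.0.213";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        boolean reachable = canConnect(host, port, 3000);
        System.out.println(host + ":" + port
                + (reachable ? " accepts TCP connections" : " is not reachable"));
    }
}
```

If the port turns out to be reachable while the Hadoop client still fails, a version or protocol mismatch in the client libraries becomes the more likely explanation. This probe says nothing about Kerberos or the Hadoop RPC protocol itself.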
dirtyvagabond commented 9 years ago

@chip-factual if you and/or @ahadrana can let me know exactly which hadoop-core version we want, I'll get you an updated Drake build.

ahadrana commented 9 years ago

Hi Aaron,

I don't believe it is as simple as that. Drake uses clj-hdfs, which needs to be updated as well. And there are some issues related to Kerberos that we have to think about.

Ahad.

On Mon, Mar 30, 2015 at 7:13 PM, Aaron Crow notifications@github.com wrote:

@chip-factual if you and/or @ahadrana can let me know exactly which hadoop-core version we want, I'll get you an updated Drake build.

— Reply to this email directly or view it on GitHub https://github.com/Factual/drake/issues/162#issuecomment-87900108.

amalloy commented 9 years ago

@ahadrana I can work on this, but I don't know a lot about the hadoop ecosystem: how its libraries work, or how the pieces fit together. So I'd need some guidance for like...what it is I actually need to do. I can update version numbers in project.clj files as well as the next guy, but if it's more complicated than that I'll need some help.

dirtyvagabond commented 9 years ago

Hi Alan, assigning to you per recent conversations with you and @ahadrana. Please let me know if I can be of help.

ahadrana commented 9 years ago

Sounds good to me. Given that this has not been an issue until recently, and we have been using Kerberos for more than a year, I am not sure it is high enough priority to preempt any neutronic work that might be on your plate. Feel free to ping me if you have any questions.

On Thu, Apr 2, 2015 at 2:04 PM, Aaron Crow notifications@github.com wrote:

Hi Alan, assigning to you per recent conversations with you and @ahadrana. Please let me know if I can be of help.


jinwen commented 9 years ago

@chip-factual, do you build your own drake.jar to run your workflow, possibly from the master branch? Could you please try building it from the iflow-demo branch instead? I made some fixes for this on that branch earlier. However, it does not use the latest Hadoop version, which is what we now run on our cluster.

chip-factual commented 9 years ago

Thanks @jinwen, I got it working using @dongshu-factual's fix from https://github.com/Factual/data-projects/issues/650.

mavericklou commented 9 years ago

Fixed in the develop branch, commit 92ffa2eb9b8344577b9b240c1750d50da84efead.