fonsoim opened this issue on May 13, 2013
On Mon, May 13, 2013 at 9:34 AM, fonsoim notifications@github.com wrote:
I cannot read the output of a mapreduce job.
The code:
data = to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
print(res())
This is not documented; you are not supposed to do it. It could break in the next bugfix release, any code using it should be considered incorrect, and you are doing a disservice to the project by posting it. Just so you know.
[1] "/tmp/Rtmpr5Xv1g/file34916a6426bf"
And then....
from.dfs(res)
Can you post the output of traceback() called immediately after this call? What versions of rmr2 and hadoop are you using?
Antonio
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs ... ...
Finally,
hdfs.ls("/tmp/Rtmpr5Xv1g/file34916a6426bf")
  permission  owner   group       size  modtime           file
1 -rw-------  daniel  supergroup     0  2013-05-13 18:24  /tmp/Rtmpr5Xv1g/file34916a6426bf/_SUCCESS
2 drwxrwxrwt  daniel  supergroup     0  2013-05-13 18:23  /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs
3 -rw-------  daniel  supergroup   448  2013-05-13 18:24  /tmp/Rtmpr5Xv1g/file34916a6426bf/part-00000
4 -rw-------  daniel  supergroup   122  2013-05-13 18:23  /tmp/Rtmpr5Xv1g/file34916a6426bf/part-00001
I note that /tmp/Rtmpr5Xv1g/file34916a6426bf/_logs is a directory
Why does the program try to open "_logs" as a file when it is a directory?
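A possible workaround sketch (illustrative only; it assumes rhdfs is loaded and hdfs.init() has been called) is to list the output directory and keep only the data files, skipping _logs and _SUCCESS:

listing <- hdfs.ls("/tmp/Rtmpr5Xv1g/file34916a6426bf")
# keep only part-00000, part-00001, ... and drop _logs and _SUCCESS
parts <- listing$file[grepl("part-[0-9]+$", listing$file)]
parts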
Thanks in advance
Alfonso
Sorry for submitting the same problem in different places.
I do not understand why I am not supposed to run this code. It is a simple example like the ones in https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md
The versions of rmr2 and hadoop are 2.1.0 and 2.0.0, respectively.
The code:
data = to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
from.dfs(res)
The error:
from.dfs(res)
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
/usr/lib/hadoop-hdfs/bin/hdfs: line 24: /usr/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/usr/lib/hadoop-hdfs/bin/hdfs: line 130: cygpath: command not found
/usr/lib/hadoop-hdfs/bin/hdfs: line 162: exec: : not found
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/RtmpzXyC7B/file34c6342d57ed/_logs
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1312)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1258)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1231)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1213)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:392)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:170)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44064)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:972)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:960)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:171)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:138)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:131)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
at org.apache.hadoop.streaming.AutoInputFormat.getRecordReader(AutoInputFormat.java:56)
at org.apache.hadoop.streaming.DumpTypedBytes.dumpTypedBytes(DumpTypedBytes.java:102)
at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:83)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/RtmpzXyC7B/file34c6342d57ed/_logs
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1312)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1258)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1231)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1213)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:392)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:170)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44064)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
at org.apache.hadoop.ipc.Client.call(Client.java:1225)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy9.getBlockLocations(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy9.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:154)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:970)
... 19 more
$key
list()

$val
list()
On Tue, May 21, 2013 at 1:21 AM, fonsoim notifications@github.com wrote:
Sorry for submitting the same problem in different places.
I do not understand why I am not supposed to run this code. It is a simple example like the ones in https://github.com/RevolutionAnalytics/rmr2/blob/master/docs/tutorial.md
Where did you get the res() call, which exposes the internal representation of a big data object? Not from me.
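For reference, the documented pattern is to treat the value returned by mapreduce as an opaque handle and read the results back with from.dfs. A minimal sketch, assuming a working rmr2 and hadoop setup:

data <- to.dfs(1:10)
res <- mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
out <- from.dfs(res)
out$key   # keys, if any
out$val   # the computed values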
The versions of rmr2 and hadoop are 2.1.0 and 2.0.0, respectively.
What about the OS? Are you running Windows? If so, unfortunately it is not supported yet. If you are on Linux, let's do this experiment. In R, call
to.dfs(1:10, output = "/tmp/ls-test")
At the shell prompt try
hadoop dfs -ls /tmp/ls-test
The first two errors that you get point to Hadoop problems independent of R, and this little experiment will help confirm that.
Antonio
The code:
data = to.dfs(1:10)
res = mapreduce(input = data, map = function(k, v) cbind(v, 2*v))
from.dfs(res)
The error:
from.dfs(res)
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
/usr/lib/hadoop-hdfs/bin/hdfs: line 24: /usr/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/usr/lib/hadoop-hdfs/bin/hdfs: line 130: cygpath: command not found
This is where I suspect you are running the Windows version.
The OS is Ubuntu 12.04
I did your experiment:
to.dfs(1:10, output = "/tmp/ls-test")
hadoop dfs -ls /tmp/ls-test
It works. HDFS contains the file at "/tmp/ls-test", and I can list it at the shell prompt.
Maybe we have two problems here. One is that you have a configuration error. It doesn't seem to be very common from googling around; nonetheless, I suspect you won't be up and running until you fix it. Take a look at this report http://hortonworks.com/community/forums/topic/unable-to-start-the-datanode/ and see if you can get some insight as to what is wrong with your configuration. The other is from.dfs trying to read the _logs directory. This is puzzling: there is an explicit filter that discards anything starting with "_". Could you try this in R:
rmr2:::part.list("/tmp/ls-test")
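For reference, here is a sketch of the kind of filtering part.list is supposed to apply; this is illustrative, not the actual rmr2 source, and it assumes rhdfs is loaded and initialized:

part.list.sketch <- function(path) {
  files <- hdfs.ls(path)$file
  # drop _logs, _SUCCESS and hidden files; keep the part-* data files
  files[!grepl("^[_.]", basename(files))]
}
part.list.sketch("/tmp/ls-test")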
I am not sure what the connection between the two problems could be, but related or not we need to solve both to make progress. Thanks
Antonio
Hi, is this resolved? I have the same problem. Thanks