RevolutionAnalytics / rmr2

A package that allows R developers to use Hadoop MapReduce

The system cannot find the path specified. #169

Open jornfranke opened 9 years ago

jornfranke commented 9 years ago

Hello,

I am currently using RMR on Windows 2012 64-Bit (HDP 2.2 / Hadoop 2.6.0).

I had already used it before on Linux, with only minor issues.

The problem is the following:

Warning message:

running command 'c:\hdp\hadoop-2.6.0.2.2.4.2-0002\bin\hadoop jar c:\hdp\hadoop-2.6.0.2.2.4.2-0002\share\hadoop\tools\lib\hadoop-streaming-2.6.0.2.2.4.2-0002.jar loadtb /user/hadoop/tmprmr2/file229011aa57ef < /Users/hadoop/AppData/Local/Temp/2/RtmpI9kKEA/file2290620f56bb' had status 1

Apparently, RMR converts the user's local path incorrectly and omits the drive letter: /Users/hadoop/AppData/Local/Temp/2/RtmpI9kKEA/file2290620f56bb

Is this a bug under Windows?

Thank you.

Best regards

piccolbo commented 9 years ago

I don't have a Windows system to try it out right now. If you have an R console open, can you enter tempfile() and paste the result here? If you are good at debugging, you can enter debug(rmr2:::rmr.normalize.path); to.dfs(1) and see what's going on. Could you also provide the version you are using? Please keep in mind that it must be a little more complicated than you laid out, because we test under Windows for every release, and to.dfs is called hundreds of times in the test suite, almost always with a default output path. But I am not saying it's not a bug. Aren't the slashes also slanted the wrong way?
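
For reference, the whole check can be typed at an R prompt along these lines (just a sketch; the to.dfs(1) call assumes rmr2 is loaded and your Hadoop environment variables are set):

library(rmr2)
tempfile()                            # shows how R builds local temp paths on your system
debug(rmr2:::rmr.normalize.path)      # step through the path normalization
to.dfs(1)                             # exercises rmr.normalize.path on local and HDFS paths
undebug(rmr2:::rmr.normalize.path)    # switch the debugger off again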

piccolbo commented 9 years ago

Possibly a case of overreach in 9bc70dcc412a6d52cf8ffb5354f95dc3783a3fa7, but then it should have impacted testing.

jornfranke commented 9 years ago

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"

Copyright (C) 2015 The R Foundation for Statistical Computing

Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

library(rmr2)

Please review your hadoop settings. See help(hadoop.settings)

Sys.setenv("HADOOP_STREAMING"="c:\hdp\hadoop-2.6.0.2.2.4.2-0002\share\ha$

Sys.setenv("HADOOP_PREFIX"=" c:\hdp\hadoop-2.6.0.2.2.4.2-0002")

Sys.setenv("HADOOP_CMD"="c:\hdp\hadoop-2.6.0.2.2.4.2-0002\bin\hadoop")

rmr.options(backend="hadoop",hdfs.tempdir = file.path("/user/hadoop/tmprmr2"$

NULL

debug(rmr2:::rmr.normalize.path); to.dfs(1)

debugging in: rmr.normalize.path(tempfile(pattern, tmpdir))

debug: {

if (.Platform$OS.type == "windows")

    url.or.path = gsub("\\\\", "/", url.or.path)

gsub("/$", "", gsub("/+", "/", paste("/", parse_url(url.or.path)$path,

    sep = "")))

}

Browse[2]>

debug: if (.Platform$OS.type == "windows") url.or.path = gsub("\\\\",

"/", url.or.path)

Browse[2]>

debug: url.or.path = gsub("\\\\", "/", url.or.path)

Browse[2]>

debug: gsub("/$", "", gsub("/+", "/", paste("/", parse_url(url.or.path)$path,

sep = "")))

Browse[2]>

exiting from: rmr.normalize.path(tempfile(pattern, tmpdir))

debugging in: rmr.normalize.path(outf)

debug: {

if (.Platform$OS.type == "windows")

    url.or.path = gsub("\\\\", "/", url.or.path)

gsub("/$", "", gsub("/+", "/", paste("/", parse_url(url.or.path)$path,

    sep = "")))

}

Browse[2]>

debug: if (.Platform$OS.type == "windows") url.or.path = gsub("\\\\",

"/", url.or.path)

Browse[2]>

debug: url.or.path = gsub("\\\\", "/", url.or.path)

Browse[2]>

debug: gsub("/$", "", gsub("/+", "/", paste("/", parse_url(url.or.path)$path,

sep = "")))

Browse[2]>

exiting from: rmr.normalize.path(outf)

debugging in: rmr.normalize.path(inf)

debug: {

if (.Platform$OS.type == "windows")

    url.or.path = gsub("\\\\", "/", url.or.path)

gsub("/$", "", gsub("/+", "/", paste("/", parse_url(url.or.path)$path,

    sep = "")))

}

Browse[2]>

debug: if (.Platform$OS.type == "windows") url.or.path = gsub("\\\\",

"/", url.or.path)

Browse[2]>

debug: url.or.path = gsub("\\\\", "/", url.or.path)

Browse[2]>

debug: gsub("/$", "", gsub("/+", "/", paste("/", parse_url(url.or.path)$path,

sep = "")))

Browse[2]>

exiting from: rmr.normalize.path(inf)

The system cannot find the path specified.

function ()

{

fname

}

<bytecode: 0x000000000c4e0890>

<environment: 0x000000000c4e0f68>

Warning message:

running command 'c:\hdp\hadoop-2.6.0.2.2.4.2-0002\bin\hadoop jar c:\hdp\hadoop-2.6.0.2.2.4.2-0002\share\hadoop\tools\lib\hadoop-streaming-2.6.0.2.2.4.2-0002.jar loadtb /user/hadoop/tmprmr2/file295c58cb5999 < /Users/HADOOP~1.002/AppData/Local/Temp/Rtmp4a14rq/file295c595678b5' had status 1

jornfranke commented 9 years ago

It is a toy environment, so I am happy to test any patch.

jornfranke commented 9 years ago

Hello,

I use the newest release of rmr2 (3.3.1). The output of tempfile() is c:\Users\HADOOP~1.002\AppData\Local\Temp\Rtmp4a14rq/file295c595678b5. I also checked with much older releases, and the problem persists there. I think the problem is that Windows uses drive letters. You can see this in the function parse_url, which takes the drive letter to be the URL scheme, and in rmr.normalize.path, which uses only parse_url(...)$path and therefore returns /Users/HADOOP~1.002/AppData/Local/Temp/Rtmp4a14rq/file295c595678b5 without the drive letter.
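
To make the failure concrete, here is a minimal sketch of what happens to such a path, assuming parse_url here is httr::parse_url, which is what the function body above appears to call (exact output may vary between httr versions):

library(httr)
p <- "c:/Users/HADOOP~1.002/AppData/Local/Temp/Rtmp4a14rq/file295c595678b5"
parse_url(p)$scheme            # the drive letter "c" is taken for a URL scheme
parse_url(p)$path              # the rest of the path, with the drive letter gone
rmr2:::rmr.normalize.path(p)   # "/Users/HADOOP~1.002/..." is what ends up on the command line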

Once that is fixed, you run into another problem:

running command 'c:\hdp\hadoop-2.6.0.2.2.4.2-0002\bin\hadoop jar c:\hdp\hadoop-2.6.0.2.2.4.2-0002\share\hadoop\tools\lib\hadoop-streaming-2.6.0.2.2.4.2-0002.jar -D "stream.map.input=typedbytes" -D "stream.map.output=typedbytes" -D "stream.reduce.input=typedbytes" -D "stream.reduce.output=typedbytes" -D "mapred.reduce.tasks=1" -files "C:/Users/hadoop/AppData/Local/Temp/Rtmpg9UC3s/rmr-local-env2d4754a7f04,C:/Users/hadoop/AppData/Local/Temp/Rtmpg9UC3s/rmr-global-env2d45f742af4,C:/Users/hadoop/AppData/Local/Temp/Rtmpg9UC3s/rmr-streaming-map2d4436624a7" -input "/user/hadoop/tmprmr2/file2d4faa6666" -output "/user/hadoop/tmprmr2/file2d4316b3905" -mapper "Rscript --vanilla ./rmr-streaming-map2d4436624a7" -inputformat "org.apache.hadoop.streaming.AutoInputFormat" -outputformat "org.apache.hadoop.mapred.SequenceFileOutputFormat" 2>&1' had status 1

The -files argument must instead read: -files "file://C:/Users/hadoop/AppData/Local/Temp/Rtmpg9UC3s/rmr-local-env2d4754a7f04,file://C:/Users/hadoop/AppData/Local/Temp/Rtmpg9UC3s/rmr-global-env2d45f742af4,file://C:/Users/hadoop/AppData/Local/Temp/Rtmpg9UC3s/rmr-streaming-map2d4436624a7"
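
Just to illustrate the kind of adjustment I mean (the helper name is made up; the real change would have to happen wherever rmr2 assembles the streaming command line):

# hypothetical helper: turn a local Windows path into a file:// URI
# before it is passed to the hadoop streaming -files option
to.file.uri <- function(path) {
  path <- gsub("\\\\", "/", path)      # backslashes -> forward slashes
  if (grepl("^[A-Za-z]:/", path))      # local path starting with a drive letter
    path <- paste("file://", path, sep = "")
  path
}

to.file.uri("C:\\Users\\hadoop\\AppData\\Local\\Temp\\Rtmpg9UC3s\\rmr-local-env2d4754a7f04")
# "file://C:/Users/hadoop/AppData/Local/Temp/Rtmpg9UC3s/rmr-local-env2d4754a7f04"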

Unless there is some important configuration you need to do for Windows, I have some doubt that it ever worked there. I do not know how they do it on Azure, but either they use Linux or they do not have drive letters.

piccolbo commented 9 years ago

Thanks Jörn for the additional analysis. You need to understand that, as compelling as your analysis looks, it is hard to square with the fact that to.dfs is exercised in our Windows test runs for every release.

So we are having someone repeat the tests, to see if anything has changed. If they pass, then we need to look at the differences between the setups. Nothing obvious stands out from what you have reported so far.

jornfranke commented 9 years ago

Hi, sorry, I think you misunderstood me. This was not meant as blame, but merely to point out that it cannot work in my given setting. As I pointed out above in the very first post, it is a special configuration.

It builds correctly; this has been verified by you.

I did not ask you to undo any changes, because I already wrote that even older versions do not work in my configuration. They work perfectly fine in other, non-Windows configurations.

I think we need to figure out what is wrong here and fix it.

I am happy to assist you with further investigation.

Can you tell me what kind of Windows system you are using for testing?

YanglabWCH commented 6 years ago

Hello jornfranke,

I am running into the same problem; have you resolved it?

Thanks a lot!

Bests, Shisheng

jornfranke commented 6 years ago

Hi,

I did some local fixes, but in general this library does not work well under Windows. Even Microsoft runs it only under Linux (in the Azure cloud).
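
For what it is worth, the kind of local fix I mean looks roughly like this; it is only a sketch against rmr2 3.3.1, keeps the drive letter when the input is a local Windows path rather than an HDFS path or URL, and assumes parse_url is httr::parse_url as in the packaged function:

library(httr)

win.normalize.path <- function(url.or.path) {
  if (.Platform$OS.type == "windows")
    url.or.path <- gsub("\\\\", "/", url.or.path)   # backslashes -> forward slashes
  if (grepl("^[A-Za-z]:/", url.or.path))
    return(gsub("/+", "/", url.or.path))            # local path: keep the drive letter
  gsub("/$", "", gsub("/+", "/", paste("/", parse_url(url.or.path)$path, sep = "")))
}

# replace the packaged version, for experimentation only
assignInNamespace("rmr.normalize.path", win.normalize.path, ns = "rmr2")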

Best regards
