NICTA / scoobi

A Scala productivity framework for Hadoop.
http://nicta.github.com/scoobi/

scoobi hadoop2 - wildcard pattern doesn't match existing files on hdfs #305

Closed deeptibhatia closed 10 years ago

deeptibhatia commented 10 years ago

We are using the 0.8-hadoop2-snapshot along with Alex Coozi's patch to fix the getJobClient error. (https://groups.google.com/forum/#!msg/scoobi-dev/0EHoYw4Cl64/rAI3pdAoaDUJ).

If I specify the full path of the input file or directory, the job works fine. If the input path contains a wildcard meant to match existing files on HDFS, Scoobi fails with an error saying the input path does not exist. Stack trace below.

Wildcard matching works with Hadoop 1.0 but not with Hadoop 2.0 (I checked both the 0.7 and 0.8 versions of Scoobi).

Any ideas on how to address this?


java.io.IOException: Input path /Users/dbhatia/github/scoobi-playground/src/data/test/romeo-and-julie*.txt does not exist.
    at com.nicta.scoobi.core.Source$$anonfun$3$$anonfun$apply$2.apply(DataSource.scala:108)
    at com.nicta.scoobi.core.Source$$anonfun$3$$anonfun$apply$2.apply(DataSource.scala:106)
    at com.nicta.scoobi.core.Source$$anonfun$3.apply(DataSource.scala:106)
    at com.nicta.scoobi.core.Source$$anonfun$3.apply(DataSource.scala:105)
    at com.nicta.scoobi.io.text.TextSource.inputCheck(TextInput.scala:133)
    at com.nicta.scoobi.impl.exec.ExecutionMode$class.checkNode$1(ExecutionMode.scala:59)
    at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1.apply(ExecutionMode.scala:64)
    at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1.apply(ExecutionMode.scala:64)
    at com.nicta.scoobi.impl.exec.ExecutionMode$class.checkSourceAndSinks(ExecutionMode.scala:64)
    at com.nicta.scoobi.impl.exec.HadoopMode.checkSourceAndSinks(HadoopMode.scala:44)
    at com.nicta.scoobi.impl.exec.ExecutionMode$class.prepare(ExecutionMode.scala:43)
    at com.nicta.scoobi.impl.exec.HadoopMode.com$nicta$scoobi$impl$exec$HadoopMode$$super$prepare(HadoopMode.scala:58)
    at com.nicta.scoobi.impl.exec.HadoopMode$$anonfun$prepare$1.apply(HadoopMode.scala:58)
    at com.nicta.scoobi.impl.exec.HadoopMode$$anonfun$prepare$1.apply(HadoopMode.scala:58)
    at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.evaluated$lzycompute(Loggable.scala:38)
    at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.evaluated(Loggable.scala:38)
    at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.debug(Loggable.scala:49)
    at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.debug(Loggable.scala:48)
    at com.nicta.scoobi.impl.exec.HadoopMode.prepare(HadoopMode.scala:58)
    at com.nicta.scoobi.impl.exec.HadoopMode.execute(HadoopMode.scala:52)
    at com.nicta.scoobi.impl.exec.HadoopMode.execute(HadoopMode.scala:48)
    at com.nicta.scoobi.impl.Persister.persist(Persister.scala:44)
    at com.nicta.scoobi.impl.ScoobiConfigurationImpl.persist(ScoobiConfigurationImpl.scala:355)
    at com.nicta.scoobi.application.Persist$class.persist(Persist.scala:33)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.persist(TestPipeline.scala:5)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.run(TestPipeline.scala:32)
    at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply$mcV$sp(ScoobiApp.scala:80)
    at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply(ScoobiApp.scala:75)
    at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply(ScoobiApp.scala:75)
    at com.nicta.scoobi.application.Hadoop$class.runOnCluster(Hadoop.scala:112)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.runOnCluster(TestPipeline.scala:5)
    at com.nicta.scoobi.application.Hadoop$class.executeOnCluster(Hadoop.scala:69)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.executeOnCluster(TestPipeline.scala:5)
    at com.nicta.scoobi.application.Hadoop$$anonfun$onCluster$1.apply(Hadoop.scala:55)
    at com.nicta.scoobi.application.InMemoryHadoop$class.withTimer(InMemory.scala:72)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.withTimer(TestPipeline.scala:5)
    at com.nicta.scoobi.application.InMemoryHadoop$class.showTime(InMemory.scala:80)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.showTime(TestPipeline.scala:5)
    at com.nicta.scoobi.application.Hadoop$class.onCluster(Hadoop.scala:55)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.onCluster(TestPipeline.scala:5)
    at com.nicta.scoobi.application.Hadoop$class.onHadoop(Hadoop.scala:61)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.onHadoop(TestPipeline.scala:5)
    at com.nicta.scoobi.application.ScoobiApp$class.main(ScoobiApp.scala:75)
    at com.paypal.scoobiplayground.pipeline.TestPipeline$.main(TestPipeline.scala:5)
    at com.paypal.scoobiplayground.pipeline.TestPipelineSpec.testSample(TestPipelineSpec.scala:18)
    at com.paypal.scoobiplayground.pipeline.TestPipelineSpec$$anonfun$1.apply(TestPipelineSpec.scala:10)
    at com.paypal.scoobiplayground.pipeline.TestPipelineSpec$$anonfun$1.apply(TestPipelineSpec.scala:10)

Hadoop dependency versions:

org.apache.hadoop:hadoop-common:2.2.0.2.0.6.0-76
org.apache.hadoop:hadoop-annotations:2.2.0.2.0.6.0-76
org.apache.hadoop:hadoop-mapreduce-client-app:2.2.0.2.0.6.0-76
org.apache.hadoop:hadoop-yarn-api:2.2.0.2.0.6.0-76
org.apache.hadoop:hadoop-mapreduce-client-core:2.2.0.2.0.6.0-76
org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.2.0.2.0.6.0-76
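The trace shows the exception is raised by `TextSource.inputCheck`, i.e. while Scoobi validates its sources before the job starts, and the message echoes the pattern `romeo-and-julie*.txt` verbatim. That is consistent with the check testing the pattern as a literal path instead of expanding it first. The thread does not show what commit f607da8 changed, so the following is only a minimal, self-contained sketch of the difference. It uses the JDK's `PathMatcher` glob syntax as a stand-in for HDFS; the object `GlobInputCheck`, its methods, and the file names are hypothetical. On HDFS, the comparable expansion is provided by Hadoop's `FileSystem.globStatus(Path)`.

```scala
import java.nio.file.{FileSystems, Paths}

// Hypothetical sketch, NOT Scoobi's actual inputCheck: it contrasts a literal
// existence test with a glob-aware one over an in-memory list of file names.
object GlobInputCheck {

  // Literal test: the pattern is compared as a plain string,
  // so the '*' never matches anything.
  def literalExists(path: String, files: Seq[String]): Boolean =
    files.contains(path)

  // Glob expansion via the JDK PathMatcher ("glob:" syntax), where '*'
  // matches any run of characters within a single name component.
  def expand(pattern: String, files: Seq[String]): Seq[String] = {
    val matcher = FileSystems.getDefault.getPathMatcher("glob:" + pattern)
    files.filter(name => matcher.matches(Paths.get(name)))
  }

  // Glob-aware test: the input "exists" if at least one file matches.
  def globExists(pattern: String, files: Seq[String]): Boolean =
    expand(pattern, files).nonEmpty

  def main(args: Array[String]): Unit = {
    val files   = Seq("romeo-and-juliet.txt", "hamlet.txt") // hypothetical listing
    val pattern = "romeo-and-julie*.txt"
    println(literalExists(pattern, files)) // false
    println(expand(pattern, files))        // List(romeo-and-juliet.txt)
    println(globExists(pattern, files))    // true
  }
}
```

Here `literalExists` rejects the pattern for the same reason as the reported error, while `globExists` accepts it because one file matches. A real check also has to decide what a glob that matches nothing means; treating it as a missing input, as `globExists` does, is one reasonable choice.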

deeptibhatia commented 10 years ago

Thanks for addressing this.

On Tue, Jan 21, 2014 at 10:15 PM, Eric Torreborre notifications@github.com wrote:

Closed #305 (https://github.com/NICTA/scoobi/issues/305) via f607da8 (https://github.com/NICTA/scoobi/commit/f607da8fc390b2c43917a2ea36bde498dc8bbace).
