ebastien closed this issue 11 years ago.
It reminds me of something we had before. Can you please paste the error message that you get?
Sorry, I don't have the Hadoop cluster available right now to reproduce. In the meantime, what I can say is that the error message simply states that the input file does not exist. If I create a file on the local filesystem with the same relative path, the code loads that file instead of looking for it on HDFS. In 0.7.0-RC2, whenever I used a file path without an explicit scheme, it was looked up on the default HDFS filesystem. In 0.7.0-RC3, it is looked up on the local filesystem of the client running the jar.
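To illustrate what I mean, here is a minimal sketch using the plain Hadoop API (not Scoobi; the property values and paths are only examples): an unqualified path gets qualified against whatever the Configuration declares as the default filesystem.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DefaultFsResolution {
  def main(args: Array[String]) {
    val conf = new Configuration()
    // With the default filesystem left at "file:///", a relative path is
    // qualified against the local filesystem; with it set to the namenode
    // URI, it is qualified against HDFS.
    val p  = new Path("myfiles")
    val fs = p.getFileSystem(conf)
    // Prints e.g. file:/home/user/myfiles or hdfs://namenode:8020/user/me/myfiles
    println("resolved to: " + fs.makeQualified(p))
  }
}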
We are about to release 0.7.0 even if this fix doesn't make it in, but I can propose this workaround in the meantime:
val list = fromTextFile("path", check = Source.noInputCheck)
At least you should be able to run your code with that.
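Put differently, the two stopgaps discussed in this thread can be sketched like this (the paths are placeholders):
// 1. Skip the client-side input check entirely:
val unchecked = fromTextFile("relative/path", check = Source.noInputCheck)
// 2. Or pass a fully qualified URI so no default-filesystem resolution is involved:
val qualified = fromTextFile("hdfs://namenode:8020/user/me/relative/path")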
Thanks, I'll try that. BTW, I've noticed this commit: 79fe1ddf0cee33e8864f75ecb7895eed8113c839, which seems to change the default filesystem used with the ClusterConfiguration. Do you think it might explain the behavior I see?
This commit just replaces some constants with their values. Actually, you can help me debug this issue by taking apart the default checking code (pathExists) and finding exactly which condition fails, replacing check progressively from Source.noInputCheck to each of the conditions in pathExists:
/** Determine whether a path exists or not. */
def pathExists(p: Path, pathFilter: PathFilter = hiddenFilePathFilter)(implicit conf: Configuration): Boolean = tryOrElse {
  val fs = FileSystem.get(p.toUri, conf)
  (fs.isFile(p) && fs.exists(p)) || getFileStatus(p, pathFilter).nonEmpty
}(false)

/** Get a Set of FileStatus objects for a given Path. */
def getFileStatus(path: Path, pathFilter: PathFilter = hiddenFilePathFilter)(implicit conf: Configuration): Seq[FileStatus] =
  tryOrElse {
    Option(FileSystem.get(path.toUri, conf).globStatus(new Path(path, "*"), pathFilter)).map(_.toSeq).getOrElse(Seq())
  }(Seq())

private val hiddenFilePathFilter = new PathFilter {
  def accept(p: Path): Boolean = !p.getName.startsWith("_") && !p.getName.startsWith(".")
}
Maybe FileSystem.get(path.toUri, conf).globStatus(new Path(path, "*"), pathFilter) doesn't return anything?
I'm sorry to dump the debugging on you, but we don't have a CDH3 cluster that I could use for this.
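If it helps, here is a hedged sketch of what that instrumentation could look like: evaluate each condition of pathExists separately against the configuration your job actually uses, and print which filesystem the path resolves to (the object and method names below are illustrative, not Scoobi code):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, PathFilter}

object PathExistsDebug {
  val hiddenFilePathFilter = new PathFilter {
    def accept(p: Path): Boolean = !p.getName.startsWith("_") && !p.getName.startsWith(".")
  }

  def debug(p: Path)(implicit conf: Configuration) {
    val fs = FileSystem.get(p.toUri, conf)
    println("filesystem      : " + fs.getUri)        // local or hdfs?
    println("fs.isFile(p)    : " + fs.isFile(p))
    println("fs.exists(p)    : " + fs.exists(p))
    val glob = Option(fs.globStatus(new Path(p, "*"), hiddenFilePathFilter))
    println("globStatus size : " + glob.map(_.length).getOrElse(-1))
  }
}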
I see the error on my HDP 1.2 cluster as well with 0.7.0 final. Here is the stack trace:
Exception in thread "main" java.lang.IllegalArgumentException: Can't instantiate public org.apache.hadoop.io.SequenceFile$Reader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration) throws java.io.IOException : null
at com.nicta.scoobi.impl.util.Compatibility$.newInstance(Compatibility.scala:116)
at com.nicta.scoobi.impl.util.Compatibility$.newSequenceFileReader(Compatibility.scala:84)
at com.nicta.scoobi.io.sequence.CheckedSeqSource$$anonfun$checkInputPathType$1.apply(SequenceInput.scala:171)
at com.nicta.scoobi.io.sequence.CheckedSeqSource$$anonfun$checkInputPathType$1.apply(SequenceInput.scala:170)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at com.nicta.scoobi.io.sequence.CheckedSeqSource.checkInputPathType(SequenceInput.scala:170)
at com.nicta.scoobi.io.sequence.SeqSource$$anonfun$inputCheck$1.apply(SequenceInput.scala:149)
at com.nicta.scoobi.io.sequence.SeqSource$$anonfun$inputCheck$1.apply(SequenceInput.scala:149)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at com.nicta.scoobi.io.sequence.SeqSource.inputCheck(SequenceInput.scala:149)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:52)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:49)
at org.kiama.attribution.AttributionCore$CachedParamAttribute$$anon$1.apply(AttributionCore.scala:111)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1$$anonfun$apply$3.apply(ExecutionMode.scala:55)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1$$anonfun$apply$3.apply(ExecutionMode.scala:55)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:55)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:49)
at org.kiama.attribution.AttributionCore$CachedParamAttribute$$anon$1.apply(AttributionCore.scala:111)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1$$anonfun$apply$3.apply(ExecutionMode.scala:55)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1$$anonfun$apply$3.apply(ExecutionMode.scala:55)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:55)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:49)
at org.kiama.attribution.AttributionCore$CachedParamAttribute$$anon$1.apply(AttributionCore.scala:111)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1$$anonfun$apply$3.apply(ExecutionMode.scala:55)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1$$anonfun$apply$3.apply(ExecutionMode.scala:55)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:55)
at com.nicta.scoobi.impl.exec.ExecutionMode$$anonfun$checkSourceAndSinks$1$$anonfun$apply$1.apply(ExecutionMode.scala:49)
at org.kiama.attribution.AttributionCore$CachedParamAttribute$$anon$1.apply(AttributionCore.scala:111)
at com.nicta.scoobi.impl.exec.ExecutionMode$class.prepare(ExecutionMode.scala:41)
at com.nicta.scoobi.impl.exec.HadoopMode.com$nicta$scoobi$impl$exec$HadoopMode$$super$prepare(HadoopMode.scala:57)
at com.nicta.scoobi.impl.exec.HadoopMode$$anonfun$prepare$1.apply(HadoopMode.scala:57)
at com.nicta.scoobi.impl.exec.HadoopMode$$anonfun$prepare$1.apply(HadoopMode.scala:57)
at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.evaluated$lzycompute(Loggable.scala:38)
at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.evaluated(Loggable.scala:38)
at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.debug(Loggable.scala:49)
at com.nicta.scoobi.impl.monitor.Loggable$LoggableObject.debug(Loggable.scala:48)
at com.nicta.scoobi.impl.exec.HadoopMode.prepare(HadoopMode.scala:57)
at com.nicta.scoobi.impl.exec.HadoopMode.execute(HadoopMode.scala:51)
at com.nicta.scoobi.impl.exec.HadoopMode.execute(HadoopMode.scala:47)
at com.nicta.scoobi.impl.Persister.persist(Persister.scala:44)
at com.nicta.scoobi.impl.ScoobiConfigurationImpl.persist(ScoobiConfigurationImpl.scala:320)
at com.nicta.scoobi.application.Persist$class.persist(Persist.scala:33)
at com.ebay.scoobi.examples.Sojourner$.persist(Sojourner.scala:15)
at com.ebay.scoobi.examples.Sojourner$.run(Sojourner.scala:36)
at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply$mcV$sp(ScoobiApp.scala:80)
at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply(ScoobiApp.scala:75)
at com.nicta.scoobi.application.ScoobiApp$$anonfun$main$1.apply(ScoobiApp.scala:75)
at com.nicta.scoobi.application.Hadoop$class.runOnCluster(Hadoop.scala:108)
at com.ebay.scoobi.examples.Sojourner$.runOnCluster(Sojourner.scala:15)
at com.nicta.scoobi.application.Hadoop$class.executeOnCluster(Hadoop.scala:65)
at com.ebay.scoobi.examples.Sojourner$.executeOnCluster(Sojourner.scala:15)
at com.nicta.scoobi.application.Hadoop$$anonfun$onCluster$1.apply(Hadoop.scala:51)
at com.nicta.scoobi.application.InMemoryHadoop$class.withTimer(InMemory.scala:72)
at com.ebay.scoobi.examples.Sojourner$.withTimer(Sojourner.scala:15)
at com.nicta.scoobi.application.InMemoryHadoop$class.showTime(InMemory.scala:80)
at com.ebay.scoobi.examples.Sojourner$.showTime(Sojourner.scala:15)
at com.nicta.scoobi.application.Hadoop$class.onCluster(Hadoop.scala:51)
at com.ebay.scoobi.examples.Sojourner$.onCluster(Sojourner.scala:15)
at com.nicta.scoobi.application.Hadoop$class.onHadoop(Hadoop.scala:57)
at com.ebay.scoobi.examples.Sojourner$.onHadoop(Sojourner.scala:15)
at com.nicta.scoobi.application.ScoobiApp$class.main(ScoobiApp.scala:75)
at com.ebay.scoobi.examples.Sojourner$.main(Sojourner.scala:15)
at com.ebay.scoobi.examples.Sojourner.main(Sojourner.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.nicta.scoobi.impl.util.Compatibility$.newInstance(Compatibility.scala:115)
... 80 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://xxxx:8020/sys/xx/2013/06/20/00/zzz/part-00000, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:393)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:796)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
It might be related, but in my case I only got an error message saying that the input file does not exist. No stack trace on my side...
This new issue is related to the Compatibility class I recently introduced to simplify the build w.r.t. CDH3/CDH4. My local tests seem to be working, but obviously I missed something. I'll fix that on Monday morning and will publish a 0.7.1.
I take this back. This is indeed the same problem under a different manifestation. Somehow FileSystem.getLength(path) fails because the FileSystem is a local one.
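To make the mismatch concrete, here is a hedged sketch (the path is a placeholder and this is not the Scoobi code itself): a SequenceFile.Reader built on the local FileSystem cannot read an hdfs:// path, whereas taking the filesystem from the path's own URI picks the right one.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.SequenceFile

val conf = new Configuration()
val path = new Path("hdfs://namenode:8020/sys/data/part-00000")

val localFs = FileSystem.getLocal(conf)   // reading `path` through this fails with
                                          // "Wrong FS: hdfs://... expected: file:///"
val pathFs  = path.getFileSystem(conf)    // resolves to the HDFS filesystem instead

// The three-argument constructor that appears in the stack trace above:
val reader = new SequenceFile.Reader(pathFs, path, conf)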
No worries, I reverted back to RC2 at work, so enjoy the weekend.
Alex
I think I found the problem, and my apologies to Emmanuel: you were right to point at commit 79fe1dd. I messed up the constant name change between CDH3 and CDH4. Can you please test 0.8.0-cdh3-SNAPSHOT when you have some time? If that works OK, I'll publish a 0.7.1.
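For context, a hedged illustration of the kind of constant that differs between the two distributions (this is not the exact code from commit 79fe1dd):
import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
// Hadoop 0.20 / CDH3 reads the default filesystem from "fs.default.name",
// while Hadoop 2 / CDH4 prefers "fs.defaultFS". Using the wrong key for the
// running distribution leaves the default at "file:///", which matches the
// local-filesystem behaviour reported in this issue.
conf.set("fs.default.name", "hdfs://namenode:8020")   // CDH3-era key
conf.set("fs.defaultFS", "hdfs://namenode:8020")      // CDH4-era key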
I ran 0.8.0-cdh3-SNAPSHOT and can confirm that it fixed my problem on our cluster. Thanks!
Thanks, Alex, for testing this. I deployed a 0.7.1-cdh4/cdh3 version with the fix.
After upgrading to 0.7.0-RC3-cdh3-SNAPSHOT, all my relative file paths are failing. I have to specify complete URLs, e.g. hdfs://namenode/myfiles. I am not sure this is the expected behavior.