JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

NerDLApproach: unable to use graphFolder for a custom built tensorflow graph on Yarn cluster #979

Closed: dkincaid closed this issue 4 years ago

dkincaid commented 4 years ago

Description

When training an NER model, I am unable to use a custom TensorFlow graph through the graphFolder parameter. The graph file "blstm_45_100_128_99.pb" is in a directory on HDFS (/user/hadoop), and I pass this directory to setGraphFolder() on the NerDLApproach object.

I create a pipeline that includes this NerDLApproach object and then call .fit() on the dataset created from my CoNLL file.

The following error is thrown:

Error: java.lang.IllegalArgumentException: Wrong FS: hdfs://ip-XXX-XX-XXX-XX.emr.datafusion.idexx.com:8020/user/hadoop/blstm_45_100_128_99.pb, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:866)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:630)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:452)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2049)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2017)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1982)
    at com.johnsnowlabs.nlp.util.io.ResourceHelper$SourceStream.copyToLocal(ResourceHelper.scala:61)
    at com.johnsnowlabs.nlp.util.io.ResourceHelper$.copyToLocal(ResourceHelper.scala:83)
    at com.johnsnowlabs.nlp.annotators.ner.dl.WithGraphResolver$$anonfun$5.apply(NerDLApproach.scala:371)
    at com.johnsnowlabs.nlp.annotators.ner.dl.WithGraphResolver$$anonfun$5.apply(NerDLApproach.scala:371)
    at scala.Option.map(Option.scala:146)
    at com.johnsnowlabs.nlp.annotators.ner.dl.WithGraphResolver$class.searchForSuitableGraph(NerDLApproach.scala:371)
    at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLApproach$.searchForSuitableGraph(NerDLApproach.scala:433)
    at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLApproach.train(NerDLApproach.scala:313)
    at com.johnsnowlabs.nlp.annotators.ner.dl.NerDLApproach.train(NerDLApproach.scala:30)
    at com.johnsnowlabs.nlp.AnnotatorApproach._fit(AnnotatorApproach.scala:55)
    at com.johnsnowlabs.nlp.AnnotatorApproach.fit(AnnotatorApproach.scala:61)
    at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:153)
    at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
    at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
    at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
    at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sparklyr.Invoke.invoke(invoke.scala:147)
    at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
    at sparklyr.StreamHandler.read(stream.scala:61)
    at sparklyr.BackendHandler$$anonfun$channelRead0$1.apply$mcV$sp(handler.scala:58)
    at scala.util.control.Breaks.breakable(Breaks.scala:38)
    at sparklyr.BackendHandler.channelRead0(handler.scala:38)
    at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)
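
Reading the top frames of the trace, the failure seems to come from a local filesystem (RawLocalFileSystem) being asked to look up a path whose scheme is hdfs://, during the copyFromLocalFile call made from ResourceHelper.copyToLocal. The following is only a toy Python model of that kind of scheme check, not Hadoop's or Spark NLP's actual code. The function name check_path and both example paths are hypothetical and exist purely for illustration.

```python
from urllib.parse import urlparse


def check_path(path: str, expected_scheme: str = "file") -> str:
    """Toy model of a filesystem scheme check (not Hadoop's actual code).

    Returns the path component when the URI scheme matches the filesystem,
    and raises ValueError otherwise, mirroring the "Wrong FS" message.
    """
    parsed = urlparse(path)
    # A path without a scheme is assumed to belong to this filesystem.
    if parsed.scheme and parsed.scheme != expected_scheme:
        raise ValueError(f"Wrong FS: {path}, expected: {expected_scheme}:///")
    return parsed.path


# Hypothetical paths for illustration only.
local_graph = "file:///tmp/graphs/blstm_45_100_128_99.pb"
hdfs_graph = "hdfs://namenode:8020/user/hadoop/blstm_45_100_128_99.pb"

print(check_path(local_graph))  # accepted: scheme matches the local filesystem

try:
    check_path(hdfs_graph)  # rejected: an hdfs:// URI handed to a local filesystem
except ValueError as err:
    print(err)
```

Under this reading, the graph file itself is fine; the HDFS location is simply being handed to a filesystem object that only accepts file:/// paths.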

Context

Unable to train our own NER model.

Your Environment

maziyarpanahi commented 4 years ago

This has been resolved in the 2.5.5 release.