JohnSnowLabs / johnsnowlabs

Gateway into the John Snow Labs Ecosystem
https://nlp.johnsnowlabs.com
Apache License 2.0

Bad folder reference when installing in Docker? #6

Open josejuanmartinez opened 1 year ago

josejuanmartinez commented 1 year ago

A prospect asked for a Docker installation.

I prepared everything, but `jsl.install()` fails while trying to resolve the folder to download / install everything into. It resolves a weird `.johnsnowlabs` path. Maybe due to Docker volumes?

[screenshots attached]
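For illustration only (this sketch is not from the library's code), one way a bare `.johnsnowlabs` folder can show up inside a container is when the install target is derived from the user's home directory and `HOME` is unset, empty, or `/` in the image:

```python
import os
from pathlib import Path

# Assumption (hypothetical, for illustration): the installer roots its downloads
# in the user's home directory, e.g. ~/.johnsnowlabs. Inside a container where
# HOME is unset, empty, or "/", that can resolve to "/.johnsnowlabs" or a
# relative ".johnsnowlabs" instead of the intended location.
home = os.environ.get("HOME", "")
install_dir = Path(home) / ".johnsnowlabs" if home else Path(".johnsnowlabs")
print(f"HOME={home!r} -> resolved install dir: {install_dir}")
```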
C-K-Loan commented 1 year ago

Thank you @josejuanmartinez, can you share some information on the Docker image that was used? The base image would be especially interesting. I am taking a look and will try to reproduce this on an Ubuntu image today.

C-K-Loan commented 1 year ago

Thanks for the report; there was a bug in handling some paths like the one in your Docker image.

It's fixed with `pip install johnsnowlabs==4.2.3rc1`.

Also see the updated Dockerfile for an install reference: Dockerfile.txt
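After upgrading inside the container, a quick sanity check (a minimal sketch; the distribution name matches the pip command above):

```python
# Confirm the patched release is the one visible to the active virtualenv.
from importlib.metadata import version  # Python 3.8+

print(version("johnsnowlabs"))  # expected: 4.2.3rc1 (or newer)
```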

josejuanmartinez commented 1 year ago

Hey, this is what I'm getting with `jsl.start()` (BUT EVERYTHING WORKS AFTER THAT!)

How to reproduce:

1) docker-compose up -d
2) docker exec -it johnsnowlabs /bin/bash
3) source jslenv/bin/activate
4) python (to open the Python console)
5) >>> from johnsnowlabs import *
6) >>> jsl.start()

docker_example.zip


```
>>> from johnsnowlabs import *
>>> jsl.start()
👌 Detected license file /home/jsl/license.json
22/10/19 11:25:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/10/19 11:26:05 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
        at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
        at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:78)
        at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:589)
        at org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1000)
        at org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:212)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
        at org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:524)
        at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:116)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
22/10/19 11:26:05 ERROR Inbox: Ignoring error
java.lang.NullPointerException
        at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:524)
        at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:116)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
👌 Launched cpu-Optimized JVM SparkSession with Jars for: 🚀Spark-NLP==4.2.1, 💊Spark-Healthcare==4.2.0, 🕶Spark-OCR==4.1.0, running on ⚡ PySpark==3.1.2
<pyspark.sql.session.SparkSession object at 0x7f42aea8ca90>
```
C-K-Loan commented 1 year ago

@josejuanmartinez thanks for the report, and glad to hear it works. This error message pops up randomly when starting a Spark session on some systems.
But it's not a critical one. Let's keep this ticket open to track this message, and maybe we can improve the UX here in the future.
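In the meantime, a minimal sanity check (standard PySpark calls, nothing specific to this library) confirms the session returned by `jsl.start()` is usable despite the startup noise:

```python
from johnsnowlabs import *

spark = jsl.start()
# A trivial job: if executors can run tasks, the NullPointerException above was
# indeed transient and the session is healthy.
print(spark.range(5).count())  # expected: 5
```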