JohnSnowLabs / johnsnowlabs

Gateway into the John Snow Labs Ecosystem
https://nlp.johnsnowlabs.com
Apache License 2.0

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.util.start.registerListenerAndStartRefresh. : java.net.SocketTimeoutException: connect timed out #10

Open · uzairahmadxy opened this issue 1 year ago

uzairahmadxy commented 1 year ago

Hi guys. I'm trying to run Spark NLP for Healthcare locally, and although I seem to have compatible versions of Spark/Java, it still throws an error (screenshots attached). Has anyone faced this?


import json
import os

# Loading license key
with open('key.json') as f:
    license_keys = json.load(f)

# Defining license key-value pairs as local variables
locals().update(license_keys)
os.environ.update(license_keys)

# Installing pyspark and spark-nlp
! pip install --upgrade -q pyspark==3.1.2 spark-nlp==$PUBLIC_VERSION

# Installing Spark NLP Healthcare
! pip install --upgrade -q spark-nlp-jsl==$JSL_VERSION  --extra-index-url https://pypi.johnsnowlabs.com/$SECRET

# Checking the installed versions
!pyspark --version

!pip show spark-nlp-jsl

!pip show spark-nlp

import json
import os

from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import SparkSession

import sparknlp
import sparknlp_jsl

from sparknlp.annotator import *
from sparknlp_jsl.annotator import *
from sparknlp.base import *
from sparknlp.util import *
from sparknlp.pretrained import ResourceDownloader
from pyspark.sql import functions as F

import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.max_colwidth', None)

import string
import numpy as np

# Spark session configuration for the local session
params = {"spark.driver.memory": "16G",
          "spark.kryoserializer.buffer.max": "2000M",
          "spark.driver.maxResultSize": "2000M"}

# Start the licensed Spark NLP for Healthcare session; SECRET comes from key.json
spark = sparknlp_jsl.start(secret=SECRET, params=params)

print ("Spark NLP Version :", sparknlp.version())
print ("Spark NLP_JSL Version :", sparknlp_jsl.version())

spark
uzairahmadxy commented 1 year ago

I forgot to mention I have a trial Healthcare license.

C-K-Loan commented 1 year ago

@uzairahmadxy can you share the full error trace from the notebook and also check your jupyter shell for any errors and share those?

uzairahmadxy commented 1 year ago

Hi @C-K-Loan. Here's the additional information

(screenshots attached)

C-K-Loan commented 1 year ago

Thank you for sharing, @uzairahmadxy. It looks like something is not set up correctly with your Hadoop utils. Make sure to precisely follow every step listed here: https://nlp.johnsnowlabs.com/docs/en/install#windows-support. This should fix all your issues.

uzairahmadxy commented 1 year ago

Hi @C-K-Loan

I re-installed everything following the instructions, but it still throws the error (note: I no longer see the Hadoop utils error in the Jupyter kernel, though). (screenshot attached)

C-K-Loan commented 1 year ago

Nice, that's one less error! @uzairahmadxy, can you test running this open-source notebook and see whether it works?

https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Public/1.SparkNLP_Basics.ipynb (you can skip the cells with pip install)

Also, could you copy and paste the entire error trace you get, either here or on https://pastebin.com/?

uzairahmadxy commented 1 year ago

Hi @C-K-Loan. This is for the healthcare notebook kernel: https://pastebin.com/cV6ymZvR


Also, the training notebook doesn't run. Here are the traces for the open-source notebook:
Python interpreter error: https://pastebin.com/XiXLxnnT
Jupyter kernel: https://pastebin.com/v7jn0EBr

Side note: PySpark on its own works fine (as shown in the screenshot; I thought there was an issue with Spark before).

C-K-Loan commented 1 year ago

Thank you for sharing @uzairahmadxy

Looks like the jar loaded into your Spark session is missing some classes, but you should have downloaded the fat jar, i.e. the one with all the dependencies, when running sparknlp.start().

@uzairahmadxy
Can you try manually downloading the Spark NLP jar and then starting a Spark session by passing the path to it?
I.e. download: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.2.2.jar

Then, instead of sparknlp.start(), run the following and try continuing with the rest of Notebook 1:

# Point spark.jars at the fat jar you downloaded above
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[*]") \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M") \
    .config("spark.jars", "path/to/the/spark-nlp.jar") \
    .getOrCreate()
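
If it is easier, the jar can also be fetched from inside the notebook before building the session; a minimal sketch using only the standard library (the local filename below is just an example):

import urllib.request

# Download the Spark NLP fat jar linked above into the working directory
jar_url = "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-4.2.2.jar"
jar_path = "spark-nlp-assembly-4.2.2.jar"
urllib.request.urlretrieve(jar_url, jar_path)

The resulting jar_path is what you would pass to the spark.jars config above.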

Maybe this is a Windows-specific bug. I think @josejuanmartinez is on Windows; have you maybe seen this?

josejuanmartinez commented 1 year ago

Hey, I am not on Windows anymore, sorry.

uzairahmadxy commented 1 year ago

Thanks @C-K-Loan. Manually loading the jar worked for basic Spark NLP.

I guess the same will have to be done to use the healthcare library as well. Can you please share where I can get that jar?

C-K-Loan commented 1 year ago

Hi @uzairahmadxy, great, good to know that this works, and sorry for the bug.

To get the healthcare jar, replace secret with your healthcare secret and lib_version with the library version in this URL: https://pypi.johnsnowlabs.com/{secret}/spark-nlp-jsl-{lib_version}.jar
I.e. if the secret is 4.2.1.agdfgdgdl, the URL would be https://pypi.johnsnowlabs.com/4.2.1.agdfgdgdl/spark-nlp-jsl-4.2.1.jar
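
As a rough illustration, the download can also be scripted with the license variables already loaded from key.json; this is only a sketch and assumes JSL_VERSION matches the jar version embedded in the URL:

import urllib.request

# SECRET and JSL_VERSION come from key.json, loaded at the top of the notebook
hc_jar_url = f"https://pypi.johnsnowlabs.com/{SECRET}/spark-nlp-jsl-{JSL_VERSION}.jar"
hc_jar_path = f"spark-nlp-jsl-{JSL_VERSION}.jar"
urllib.request.urlretrieve(hc_jar_url, hc_jar_path)

Both jars can then be passed to the session as a comma-separated list, e.g. .config("spark.jars", "spark-nlp-assembly-4.2.2.jar,spark-nlp-jsl-<version>.jar").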

@Meryem1425 can you see if you run into the same issue on Windows?

uzairahmadxy commented 1 year ago

Thank you for sharing @C-K-Loan

While the jars are loaded, the problem still persists when I try to load pretrained healthcare models/pipelines. (screenshot attached)

Error Trace: https://pastebin.com/xtkJKVLk Jupyter Kernel: https://pastebin.com/fznqEBvq

_Side note: In order to manually download a healthcare model from the Models Hub, I'm assuming I have to specify the secret. How do we download that?_

Cabir40 commented 1 year ago

Can you test whether your license is valid by running it in this notebook?

Can you also share the versions you used (Java, PySpark, spark-nlp, spark-nlp-jsl)?

If you want to download a model manually, you can use the script below; there is the same example in this notebook.

from sparknlp.pretrained import ResourceDownloader
ResourceDownloader.downloadModelDirectly("clinical/models/embeddings_clinical_en_2.4.0_2.4_1580237286004.zip", "clinical/models")  
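
Once the zip has been downloaded and extracted, the model can be loaded from the local folder with the annotator's load() method; a minimal sketch (the extracted path below is only an example):

from sparknlp.annotator import WordEmbeddingsModel

# Load the clinical embeddings from wherever the downloaded zip was extracted
embeddings = WordEmbeddingsModel.load("path/to/embeddings_clinical_en_2.4.0_2.4_1580237286004") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")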
uzairahmadxy commented 1 year ago

The license works in the notebook (tried on Colab).

Here are the versions used:

Meryem1425 commented 1 year ago

I followed that page, https://nlp.johnsnowlabs.com/docs/en/install#windows-support, @uzairahmadxy, and set it up correctly; I didn't hit any bugs. Please make sure every step is applied correctly.

(screenshot attached)

You have to create the java, spark, hadoop, and tmp folders under C:\, and then make sure the environment variables are set. Look at steps 4 and 5.

Could you delete everything and then follow the installation steps again? Thank you.
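
As a quick sanity check from Python, you can confirm the variables from steps 4 and 5 are visible to the notebook kernel before starting Spark; a minimal sketch:

import os

# These should point at the java, hadoop, and spark folders created above
for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME"):
    print(var, "=", os.environ.get(var, "<not set>"))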

C-K-Loan commented 1 year ago

@uzairahmadxy I notice you are using OpenJDK, but Adopt OpenJDK is recommended.