clemEssien / spark-nlp-phi-annotator

Apache License 2.0

Set up Spark NLP #8

Open clemEssien opened 3 years ago

clemEssien commented 3 years ago

Steps to install the Spark NLP library

Requirements & Setup

  1. Java 8
  2. ssh server
  3. Apache Spark 3.1.x (or 3.0.x, or 2.4.x, or 2.3.x)
  4. spark-nlp

Run the following commands to install Java

  1. sudo apt-get update
  2. sudo apt-get install openjdk-8-jdk
  3. export JAVA_HOME=path_to_java_home
  4. java -version
     This should return something like:
     openjdk version "1.8.0_242"
     OpenJDK Runtime Environment (build 1.8.0_242-b09)
     OpenJDK 64-Bit Server VM (build 25.242-b09, mixed mode)
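The version check in the last step can also be scripted. The following is a minimal sketch, not part of the original setup: `parse_java_major` and `installed_java_major` are hypothetical helper names, and the parsing assumes the usual `version "..."` line that OpenJDK prints.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Return the Java major version found in `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in java -version output")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report "1.8.0_242"; Java 9+ report e.g. "11.0.2".
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> int:
    """Run `java -version` and return the major version of the java on PATH."""
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

Calling `installed_java_major()` should give 8 when the OpenJDK 8 install above is the active Java.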

To install ssh server

If ssh is already installed and enabled, skip this step; otherwise, install and enable it before continuing.
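No commands are listed for this step. On Ubuntu the server is typically provided by the openssh-server package. To decide whether the step can be skipped, a small check like the one below may help. It is only a sketch: `port_is_open` is a hypothetical helper, and it assumes the ssh server would listen on the default port 22 of the local machine.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 22, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_is_open()` returns True, an ssh server (or at least some service on port 22) is already running and the install can probably be skipped.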

To install Apache Spark

  1. wget https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
  2. tar xvf spark-*
  3. sudo mkdir -p /opt/spark && sudo mv spark-3.0.1-bin-hadoop2.7/* /opt/spark
  4. nano ~/.bashrc (add the lines in step 5, then run source ~/.bashrc to apply them)
  5. export SPARK_HOME=/opt/spark
     export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
     export PYSPARK_PYTHON=/usr/bin/python3
  6. To verify that Spark installed correctly, run the following:
    • start-master.sh
    • start-slave.sh spark://ubuntu1:7077 (replace ubuntu1 with your machine's hostname)
    • open the following link in your browser: http://127.0.0.1:8080/
    • Then stop the processes with stop-slave.sh and stop-master.sh
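On a server without a browser, the web UI check in step 6 can be approximated from Python. This is a sketch under assumptions: `master_ui_reachable` is a hypothetical helper, the URL is the one given in step 6, and an HTTP 200 response is taken to mean the master is up.

```python
import urllib.error
import urllib.request


def master_ui_reachable(url: str = "http://127.0.0.1:8080/", timeout: float = 3.0) -> bool:
    """Return True if the page at `url` answers with HTTP 200."""
    # Bypass any proxy configured in the environment: this is a local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Run `master_ui_reachable()` after start-master.sh; it should return False again once the master has been stopped.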

Register for a Spark NLP trial license

To install Spark NLP

  1. Run conda install -c johnsnowlabs spark-nlp

  2. Register for a Spark NLP JSL license at https://nlp.johnsnowlabs.com/docs/en/licensed_install

  3. Run the following command, replacing the ${version} and ${secret.code} placeholders with the values from your license: pip install -q spark-nlp-jsl==${version} --extra-index-url https://pypi.johnsnowlabs.com/${secret.code} --upgrade

  4. Run pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.1.0 to load Spark NLP with pyspark (spark-shell accepts the same --packages argument and opens the Scala shell instead)
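The same package can be loaded from a Python script instead of the interactive shell. The sketch below is an assumption about how that might look, not the project's actual code: `spark_nlp_package` and `build_session` are hypothetical helpers, the app name and `local[*]` master are illustrative, and only the open-source Spark NLP package from step 4 is loaded (the licensed spark-nlp-jsl jar is not handled here).

```python
SCALA_VERSION = "2.12"       # Scala build used in step 4
SPARK_NLP_VERSION = "3.1.0"  # Spark NLP version used in step 4


def spark_nlp_package(scala: str = SCALA_VERSION, version: str = SPARK_NLP_VERSION) -> str:
    """Build the Maven coordinate that is passed to --packages."""
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala}:{version}"


def build_session(app_name: str = "spark-nlp-phi-annotator"):
    """Create a local SparkSession that pulls in Spark NLP at startup."""
    # Imported lazily so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master("local[*]")
        # Equivalent to `pyspark --packages <coordinate>` on the command line.
        .config("spark.jars.packages", spark_nlp_package())
        .getOrCreate()
    )
```

Setting `spark.jars.packages` in the builder keeps the version pinned in one place, so the script and the command line cannot drift apart.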

Spark NLP Models

If you prefer to load the models online, skip this step. If you prefer to work offline, download the following from https://github.com/JohnSnowLabs/spark-nlp-models.

  1. embeddings_clinical
  2. ner_deid_large
  3. sentence_detector_dl_healthcare

Create a folder called nlp_models in the server directory and place the downloaded models there, for example:

```
aws s3 cp s3://auxdata.johnsnowlabs.com/clinical/models/ner_deid_large_en_2.5.3_2.4_1595427435246.zip
aws s3 cp s3://auxdata.johnsnowlabs.com/clinical/models/sentence_detector_dl_healthcare_en_2.6.0_2.4_1600001082565.zip
aws s3 cp s3://auxdata.johnsnowlabs.com/clinical/models/embeddings_clinical_en_2.4.0_2.4_1580237286004.zip
```

tschaffter commented 3 years ago

The aws commands are missing the target path.
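
A minimal sketch of the commands with a destination added, assuming the nlp_models folder mentioned in the post is the intended target. `build_copy_commands` is a hypothetical helper; it only builds the command strings and does not run them.

```python
import shlex

BUCKET_PREFIX = "s3://auxdata.johnsnowlabs.com/clinical/models/"
MODEL_ARCHIVES = [
    "ner_deid_large_en_2.5.3_2.4_1595427435246.zip",
    "sentence_detector_dl_healthcare_en_2.6.0_2.4_1600001082565.zip",
    "embeddings_clinical_en_2.4.0_2.4_1580237286004.zip",
]


def build_copy_commands(target_dir: str = "nlp_models/") -> list:
    """Return one `aws s3 cp <source> <target>` command per model archive."""
    return [
        # shlex.join quotes each argument so the line is safe to paste into a shell.
        shlex.join(["aws", "s3", "cp", BUCKET_PREFIX + name, target_dir])
        for name in MODEL_ARCHIVES
    ]
```

Printing each entry of `build_copy_commands()` gives the three commands with `nlp_models/` as the second argument; the trailing slash asks aws to copy into the folder and keep the original file name.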