Open shopuz opened 7 years ago
I'm wondering this too!
It would be an additional user-defined function in the function file, or in whatever file you're working in (as long as you have all of the necessary import statements).
def parse = udf { sentence: String =>
  new Sentence(sentence).parse().asScala.map(_.toString).mkString(" ")
}
and you would use it as
val input = Seq(
(1, "Stanford is located in California. There are sometimes mountain lions on campus.")
).toDF("id", "quote")
val output = input.select(col("quote"), explode(ssplit(col("quote"))).as("sent")).select(col("quote"), col("sent"), parse(col("sent")).as("parse"))
output.show()
(Edited this comment to be more correct after I played around with it in spark-shell.)
Thanks @lucy3. I tried running the code below in spark-shell, and the output is a little better:
import java.util.Properties
import scala.collection.JavaConverters._
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, CleanXmlAnnotator, StanfordCoreNLP, TokenizerAnnotator}
import edu.stanford.nlp.pipeline.CoreNLPProtos.Sentiment
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import edu.stanford.nlp.simple.{Document, Sentence}
import edu.stanford.nlp.util.Quadruple
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions._
import com.databricks.spark.corenlp.functions._
def parse = udf { sentence: String =>
  new Sentence(sentence).parse().pennString().replace("\n", "")
}
and, as in @lucy3's example, it can be used as:
val input = Seq(
(1, "Stanford is located in California. There are sometimes mountain lions on campus.")
).toDF("id", "quote")
val output = input.select(col("quote"), explode(ssplit(col("quote"))).as("sent")).select(col("quote"), col("sent"), parse(col("sent")).as("parse"))
output.show()
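One note on the pennString() variant: as far as I can tell, the Penn output is pretty-printed over several indented lines, so removing only the "\n" characters leaves the indentation spaces inside the string. Below is a minimal sketch of collapsing all whitespace instead. The flattenPenn helper and the PennFlatten object are hypothetical names, not part of spark-corenlp or CoreNLP, and the sample string is hand-written to mimic the layout rather than real parser output.

```scala
// Hypothetical helper (not part of spark-corenlp or CoreNLP): collapse a
// pretty-printed Penn Treebank string onto a single line.
object PennFlatten {
  // Replace every run of whitespace (newlines and indentation) with one space.
  def flattenPenn(penn: String): String =
    penn.trim.replaceAll("\\s+", " ")

  def main(args: Array[String]): Unit = {
    // Hand-written sample in the multi-line, indented style; assumed layout.
    val pretty =
      """(ROOT
        |  (S
        |    (NP (NNP Stanford))
        |    (VP (VBZ is))))""".stripMargin

    // Removing only newlines keeps the indentation spaces between brackets.
    println(pretty.replace("\n", ""))
    // Collapsing whitespace leaves single spaces between tokens.
    println(flattenPenn(pretty))
  }
}
```

If this behaves the same way on real output, the UDF body could presumably end with .pennString().replaceAll("\\s+", " ") instead of .replace("\n", ""), but I have not checked that against the actual parser.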
I can see that there is a function defined for dependency parsing (depparse). However, I can't see a constituency parsing function (parse) in the list of functions. Is there any way I can get the constituency parse?