dkpro / dkpro-similarity

Word and text similarity measures
https://dkpro.github.io/dkpro-similarity

Example not working #56

Closed omidb closed 8 years ago

omidb commented 8 years ago

Hi,

I'm trying to use the example from your website:

import dkpro.similarity.algorithms.api.TextSimilarityMeasure
import dkpro.similarity.algorithms.lexical.ngrams.WordNGramJaccardMeasure

object Playground extends App{

  // this similarity measure is defined in the dkpro.similarity.algorithms.lexical-asl package
  // you need to add that to your .pom to make that example work
  // there are some examples that should work out of the box in dkpro.similarity.example-gpl
  val measure:TextSimilarityMeasure = new WordNGramJaccardMeasure(3);    // Use word trigrams

  val tokens1 = "This is a short example text.".split(" ");
  val tokens2 = "A short example text could look like that.".split(" ");

  // only works from 2.1.0-SNAPSHOT onwards, for previous versions you need to convert to Collection<String> first
  val score = measure.getSimilarity(tokens1, tokens2);

  System.out.println("Similarity: " + score);

}

The return value is 0.0. Am I missing something?

reckart commented 8 years ago

After a quick glance at the WordNGramJaccardMeasure, I'd say that it returns zero because there is no common 3-gram between the two sentences you are using as examples.

reckart commented 8 years ago

ah, there is one... sorry. "short example text"

reckart commented 8 years ago

No, I was right ;) There is no common n-gram. If you split the first sentence on spaces, the last token is "text." (with a trailing full stop), but in the second sentence the corresponding token is "text" (without one), so the two trigrams do not match.
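To make the diagnosis above concrete, here is a minimal sketch that does not use DKPro at all. It assumes the measure is the standard Jaccard coefficient over sets of word n-grams; the object and helper names (TrigramJaccardSketch, ngrams, jaccard, naiveTokens, normalizedTokens) are hypothetical, and the normalization shown is only one possible choice, not necessarily what the updated website example does.

```scala
object TrigramJaccardSketch {

  // Hypothetical helper: all word n-grams of a token list, as a set.
  def ngrams(tokens: List[String], n: Int): Set[List[String]] =
    if (tokens.length < n) Set.empty else tokens.sliding(n).toSet

  // Jaccard coefficient |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty.
  def jaccard[A](a: Set[A], b: Set[A]): Double = {
    val union = a | b
    if (union.isEmpty) 0.0 else (a & b).size.toDouble / union.size
  }

  // Same tokenization as in the question: split on single spaces only.
  def naiveTokens(text: String): List[String] = text.split(" ").toList

  // One possible normalization (an assumption): lowercase, then split on non-word characters.
  def normalizedTokens(text: String): List[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    val s1 = "This is a short example text."
    val s2 = "A short example text could look like that."

    // "text." vs "text" and "A" vs "a" keep every trigram distinct.
    println(jaccard(ngrams(naiveTokens(s1), 3), ngrams(naiveTokens(s2), 3)))           // 0.0

    // After normalization two trigrams are shared out of eight distinct ones.
    println(jaccard(ngrams(normalizedTokens(s1), 3), ngrams(normalizedTokens(s2), 3))) // 0.25
  }
}
```

With the naive split, the only near matches are "a short example" vs "A short example" and "short example text." vs "short example text", which differ in case and punctuation respectively, so the intersection is empty.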

reckart commented 8 years ago

@omidb Maybe the example needs to be fixed. At which URL did you find it?

omidb commented 8 years ago

I found it here: https://dkpro.github.io/dkpro-similarity/

Which similarity measures have you found to work better?

reckart commented 8 years ago

I updated the example code.

There is no general way of saying which measure works best. It depends on your context. For questions, it is best to try the users mailing list: https://groups.google.com/forum/#!forum/dkpro-similarity-users

Farbod29 commented 5 years ago

Hey Omid, it's Farbod from Germany,

Hope you are well. The last time we saw each other was Hamkafe Bargh MRL Nao, QIAU 💃 I am using DKPro too, but I don't understand the trigrams parameter in this line:

TextSimilarityMeasure measure = new WordNGramJaccardMeasure(3); // Use word trigrams

Also, I wanted to ask whether you know if this is a semantic similarity measure. I mean, does it use ESA over Wikipedia to extract semantics or not? By the way, I was able to run it with the new 2.3 version from Maven.

thanks, bro
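
On the parameter asked about above: the comment in the example says that 3 selects word trigrams, so the constructor argument is presumably the n-gram size. The sketch below only illustrates what changing n does to a token list; it is not DKPro code, and the names NGramSizeSketch and ngrams are hypothetical. The class also lives in the dkpro.similarity.algorithms.lexical.ngrams package, which suggests that it compares surface tokens rather than using ESA.

```scala
object NGramSizeSketch {

  // Hypothetical helper: consecutive word n-grams of a token list, in order.
  def ngrams(tokens: List[String], n: Int): List[List[String]] = {
    require(n >= 1, "n-gram size must be at least 1")
    if (tokens.length < n) Nil else tokens.sliding(n).toList
  }

  def main(args: Array[String]): Unit = {
    val tokens = List("a", "short", "example", "text")
    for (n <- 1 to 3)
      println(s"n=$n: " + ngrams(tokens, n).map(_.mkString(" ")).mkString(" | "))
    // n=1: a | short | example | text
    // n=2: a short | short example | example text
    // n=3: a short example | short example text
  }
}
```

Larger values of n require longer exact word sequences to be shared, so scores generally get stricter as n grows.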