frankandrobot / rapier

Implements "Relational Learning of Pattern-Match Rules for Information Extraction" by Califf and Mooney
Apache License 2.0

About Rapier implementation #3

Open jfatine opened 7 years ago

jfatine commented 7 years ago

Hi, what is the result of running RapierSpec.kt, and where can I find it? I want to test Rapier, so I created a new Kotlin class and followed the steps described in the README. The problem is that Rapier does not recognize "learnedRules". How can I fix this, please?

`import com.frankandrobot.rapier.meta.*
import com.frankandrobot.rapier.rapier
import com.frankandrobot.rapier.wordTokens
import org.jetbrains.spek.api.Spek
import com.frankandrobot.rapier.parse.findMatches
import com.frankandrobot.rapier.meta.Document
import com.frankandrobot.rapier.meta.SlotName

class Example : Spek({

    val blankTemplate = BlankTemplate(
            name = "test",
            slots = slotNames("speaker")
    )

    val document = Document("""
The Chemical Engineering and Physics departments will host a seminar
entitled "Soft Matter in a Tight Spot: Nanorheology of Polymers &
Complex Fluids," at 4:30 p.m., Monday, Feb. 27, in Wean Hall 7500.
The seminar will be given by Professor Steven Granick from the Materials
Science Department at University of Illinois, Urbana-Champaign.
""")

    val filledTemplate1 = FilledTemplate(slots(
            SlotName("speaker") to slotFillers(
                    wordTokens("Professor", "Steven", "Granick")
            )
    ))

    val filledTemplate2 = FilledTemplate(slots(
            SlotName("speaker") to slotFillers(
                    wordTokens("Steve", "Granick")
            )
    ))

    val exemple1 = Example(
            blankTemplate,
            document,
            filledTemplate1
    )

    val exemple2 = Example(
            blankTemplate,
            speakerdocument,
            filledTemplate2
    )

    val params = RapierParams(
            compressionFails = 7,
            metricMinPositiveMatches = 1,
            compressionPriorityQueueSize = 5
    )

    val examples = Examples(listOf(exemple2, exemple1))

    val learnedRules = rapier(blankTemplate, examples = examples, params = params)

    val result = learnedRules.findMatches(testdocument)

    println(result[SlotName("speaker")])
})

private val speakerdocument = Document("""
Physic Colloquium, Monday, Feb. 27, Steve Granick, University of
Illinois, Urbana, "Soft matter in a tight spot: nano-rheology of
polymers and complex fluids," 4:30 p.m., 7500 Wean Hall, Coffee at 4:15
p.m.
""")

private val testdocument = Document("""
The Center for Cultural Analysis will host a lecture by Richard Maddox
entitled "The Best of Possible Islands: Seville, Expo '92, and the
Politics of Culture in the 'New Spain'", at 3:30 p.m., Friday, March 17,
in Baker Hall 235A.
All are welcome.
""")
`

frankandrobot commented 7 years ago

Ah, my bad. The problem is that the README was wrong, so I've updated it. You will also need to update to the latest version of rapier. You can download it from the release page: https://github.com/frankandrobot/rapier/releases/tag/v0.9.1

Let me know if you still have any issues

jfatine commented 7 years ago

Thank you. I downloaded the latest version, but when I run RapierSpec.kt, I don't get any result. Are the results the rules learned by rapier?

jfatine commented 7 years ago

I modified the code in class Example based on the new README as follows, but when I run it I get nothing (see cap1).

`/**
 * Created by Fatine on 07/02/2017.
 */

import com.frankandrobot.rapier.meta.*
import com.frankandrobot.rapier.rapier
import com.frankandrobot.rapier.wordTokens
import org.jetbrains.spek.api.Spek
import com.frankandrobot.rapier.parse.findMatches
import com.frankandrobot.rapier.meta.Document
import com.frankandrobot.rapier.meta.SlotName

class Example : Spek({

    val blankTemplate = BlankTemplate(
            name = "test",
            slots = slotNames("speaker")
    )

//////////////////////////////////////////////doc1

    val speakerDocument1 = Document("""
 The Chemical Engineering and Physics departments will host a seminar
entitled "Soft Matter in a Tight Spot: Nanorheology of Polymers &
Complex Fluids," at 4:30 p.m., Monday, Feb. 27, in Wean Hall 7500.
The seminar will be given by Professor Steven Granick from the Materials
Science Department at University of Illinois, Urbana-Champaign.
 """)

    val filledTemplate1 = FilledTemplate(slots(
            SlotName("speaker") to slotFillers(
                    wordTokens("Professor", "Steven", "Granick")
            )
    ))

    val exemple1 = Example(
            blankTemplate,
            speakerDocument1,
            filledTemplate1
    )
//////////////////////////////////////////doc2

    val filledTemplate2 = FilledTemplate(slots(
            SlotName("speaker") to slotFillers(
                    wordTokens("Steve", "Granick")
            )
    ))

    val exemple2 = Example(
            blankTemplate,
            speakerDocument2,
            filledTemplate2
    )
/////////////////////////////////////////doc3

    val filledTemplate3 = FilledTemplate(slots(
            SlotName("speaker") to slotFillers(
                    wordTokens("Dr.", "Franklyn", "G.", "Prendergast")
            )
    ))

    val exemple3 = Example(
            blankTemplate,
            speakerDocument3,
            filledTemplate2
    )

    val params = RapierParams(
            compressionFails = 7,
            metricMinPositiveMatches = 1,
            compressionPriorityQueueSize = 5
    )

    val examples = Examples(listOf(exemple1, exemple2, exemple3))
    val allLearnedRules = rapier(blankTemplate, examples = examples, params = params)

    val allResults = allLearnedRules()
            .flatMap { learnedRule -> learnedRule.value }
            .findMatches(testDocument)
    println(allResults[SlotName("speaker")])

})

private val speakerDocument2 = Document("""
Physic Colloquium, Monday, Feb. 27,  Steve Granick, University of
Illinois, Urbana, "Soft matter in a tight spot:  nano-rheology of
polymers and complex fluids," 4:30 p.m., 7500 Wean Hall, Coffee at 4:15
p.m.
""")

private val speakerDocument3 = Document("""
Name: Dr. Franklyn G. Prendergast
 Affiliation: Department of Biochemistry and Molecular Biology
 Mayo Foundation, Clinc and Medical School
 Title: "Picosecond Motion in Proteins: Experiments, Analysis,
 Mathematical Simulation, and Interpretations"
 Host/e-mail: Lans Taylor/taylor@a.cfr.cmu.edu
 Date: Wednesday, March 29, 1995
 Time: 3:30 p.m.
 Place: Mellon Institute Conference Room
 Co-Sponsor:Sscience and Technology Center and W.M. Keck Center for
 Advanced Training in Computational Biology
""")

private val testDocument = Document("""
The Center for Cultural Analysis will host a lecture by Richard Maddox
entitled "The Best of Possible Islands: Seville, Expo '92, and the
Politics of Culture in the 'New Spain'", at 3:30 p.m., Friday, March 17,
in Baker Hall 235A.
All are welcome.
""")
`

capture

frankandrobot commented 7 years ago

Couple of things:

  1. If I'm not mistaken, ("Dr.", "Franklyn", "G.", "Prendergast") actually gets mapped to wordTokens("Dr", ".", "Franklyn", "G", ".", "Prendergast"), i.e., the period is its own word token. (To confirm this, do println(speakerDocument3()), which prints the tokens for this document.)
  2. You need to tweak the default rapier params. In particular, metricMinPositiveMatches should probably be set to 0, because with so few examples a rule won't have enough positive matches to be included.
  3. Also do println(allLearnedRules) to make sure it's learning something.

These are the params I've been using in the tests for small examples:

val params = RapierParams(
        compressionFails = 7,
        metricMinPositiveMatches = 1,
        compressionPriorityQueueSize = 5
      )

jfatine commented 7 years ago

I've just checked: Dr. is considered one token. I retested rapier with a very simple case: I used just 2 training documents that contain the expression "will be given by". The test document also contains this expression. Here is what I get:

`Loading ambiguity classes
Loading word clusters
Loading tokenizer
Loading part-of-speech tagger
Loading morphological analyzer

[]
Process finished with exit code 0
`

Do you have a working example, so that I can see the extraction results? (Even running RapierSpec does not give a result.)

frankandrobot commented 7 years ago

The bottom line is that I think your use case would benefit from semantic classes, which I haven't implemented yet. With them, the rule it would find is "match a person", and it would extract the person from the blurb. Without semantic classes, it is finding rules that are, more or less, specific phrases. In particular, it is learning these 2 rules:

  1. The rule that says "match a proper noun followed by Granick":
     word: [], tag: [Some<NNP>], semantic: []
     word: [Some<Granick>], tag: [Some<NNP>], semantic: []
  2. The rule that says "match three proper nouns followed by Granick or Prendergast":
     word: [], tag: [Some<NNP>], semantic: []
     word: [], tag: [Some<NNP>], semantic: []
     word: [], tag: [Some<NNP>], semantic: []
     word: [Some<Granick>, Some<Prendergast>], tag: [Some<NNP>], semantic: []

Below is a modified version of your example that found these rules. Notice two differences:

  1. Fixed the typo in exemple3: you were passing filledTemplate2, not filledTemplate3.
  2. In Example1, we're matching "Steven Granick", not "Professor Steven Granick". This sort of makes sense because we're trying to find matches of names.

I also tried increasing the number of examples (tried up to 5) but each time the rule would contain a specific name. I also tried using distinct names in each example. As I said before, rapier may give better rules when semantic classes are enabled.

Modified Example

import com.frankandrobot.rapier.meta.*
import com.frankandrobot.rapier.rapier
import com.frankandrobot.rapier.wordTokens
import org.jetbrains.spek.api.Spek
import com.frankandrobot.rapier.parse.findMatches
import com.frankandrobot.rapier.meta.Document
import com.frankandrobot.rapier.meta.SlotName

class ExampleSpec : Spek({

  val blankTemplate = BlankTemplate(
    name = "test",
    slots = slotNames("speaker")
  )
  val speakerDocument1 = Document("""
The Chemical Engineering and Physics departments will host a seminar
entitled "Soft Matter in a Tight Spot: Nanorheology of Polymers &
Complex Fluids," at 4:30 p.m., Monday, Feb. 27, in Wean Hall 7500.
The seminar will be given by Professor Steven Granick from the Materials
Science Department at University of Illinois, Urbana-Champaign.
""")
  val example1 = Example(
    blankTemplate,
    speakerDocument1,
    FilledTemplate(slots(
      SlotName("speaker") to slotFillers(
        wordTokens("Steven", "Granick")
      )
    ))
  )
  val speakerDocument2 = Document("""
Physic Colloquium, Monday, Feb. 27,  Steve Granick, University of
Illinois, Urbana, "Soft matter in a tight spot:  nano-rheology of
polymers and complex fluids," 4:30 p.m., 7500 Wean Hall, Coffee at 4:15
p.m.
""")
  val example2 = Example(
    blankTemplate,
    speakerDocument2,
    FilledTemplate(slots(
      SlotName("speaker") to slotFillers(
        wordTokens("Steve", "Granick")
      )
    ))
  )
  val speakerDocument3 = Document("""
Name: Dr. Franklyn G. Prendergast
 Affiliation: Department of Biochemistry and Molecular Biology
 Mayo Foundation, Clinc and Medical School
 Title: "Picosecond Motion in Proteins: Experiments, Analysis,
 Mathematical Simulation, and Interpretations"
 Host/e-mail: Lans Taylor/taylor@a.cfr.cmu.edu
 Date: Wednesday, March 29, 1995
 Time: 3:30 p.m.
 Place: Mellon Institute Conference Room
 Co-Sponsor:Science and Technology Center and W.M. Keck Center for
 Advanced Training in Computational Biology
""")
  val example3 = Example(
    blankTemplate,
    speakerDocument3,
    FilledTemplate(slots(
      SlotName("speaker") to slotFillers(
        wordTokens("Dr.", "Franklyn", "G.", "Prendergast")
      )
    ))
  )

  val params = RapierParams(
    compressionFails = 7,
    metricMinPositiveMatches = 1,
    compressionPriorityQueueSize = 5
    //maxElementsToSpecialize = 3,
    //ruleSizeWeight = 0.001
  )

  val examples = Examples(listOf(example1, example2, example3))
  val allLearnedRules = rapier(blankTemplate, examples = examples, params = params)

  println(allLearnedRules)
  val allResults = allLearnedRules()
    .flatMap { learnedRule -> learnedRule.value }
    .findMatches(testDocument)
  println(allResults[SlotName("speaker")])

})

private val testDocument = Document("""
The Center for Cultural Analysis will host a lecture by Richard Maddox
entitled "The Best of Possible Islands: Seville, Expo '92, and the
Politics of Culture in the 'New Spain'", at 3:30 p.m., Friday, March 17,
in Baker Hall 235A.
All are welcome.
""")

Learned Rules

LearnedRules(results={SlotName(name=speaker)=[Pattern
  PreFiller:

  Filler:
    word: [], tag: [Some<NNP>], semantic: []
    word: [Some<Granick>], tag: [Some<NNP>], semantic: []
  PostFiller:

, Pattern
  PreFiller:

  Filler:
    word: [], tag: [Some<NNP>], semantic: []
    word: [], tag: [Some<NNP>], semantic: []
    word: [], tag: [Some<NNP>], semantic: []
    word: [Some<Granick>, Some<Prendergast>], tag: [Some<NNP>], semantic: []
  PostFiller:

]})

frankandrobot commented 7 years ago

The reason it learns these specific rules and not the more general rules that say "match two proper nouns" or "match four proper nouns" is the metric. See RuleMetric.kt. The generic rules have too many negative matches, i.e., they find things not specified in the FilledTemplates. The metric scores these rules as worse than the more specific rules.
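To make the effect of negative matches concrete, here is a minimal sketch of a Laplace-style rule score of the kind described in the Califf and Mooney paper, where lower is better. The function name ruleValue, the example counts, and the exact formula are assumptions for illustration; RuleMetric.kt may compute its score differently.

```kotlin
import kotlin.math.log2

// Hypothetical stand-in for the scoring in RuleMetric.kt; lower is better.
// p = positive matches (extractions that appear in a FilledTemplate)
// n = negative matches (extractions that appear in no FilledTemplate)
// ruleSize = optional complexity penalty, spread over the positive matches
fun ruleValue(p: Int, n: Int, ruleSize: Double = 0.0): Double {
    // A rule with no positive matches is treated as the worst possible rule.
    if (p == 0) return Double.POSITIVE_INFINITY
    return -log2((p + 1.0) / (p + n + 2.0)) + ruleSize / p
}

fun main() {
    // Specific rule, e.g. "proper noun followed by Granick": few matches, all correct.
    val specific = ruleValue(p = 2, n = 0)
    // Generic rule, e.g. "any two proper nouns": also picks up places, departments, etc.
    val generic = ruleValue(p = 3, n = 10)

    println("specific = $specific, generic = $generic")
    println(if (specific < generic) "specific rule wins" else "generic rule wins")
}
```

With these made-up counts the specific rule scores about 0.42 and the generic rule about 1.91, so the specific rule is kept even though the generic one covers more of the training fillers.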

Without semantic constraints, this version of rapier is best suited for more "formal" documents, like your Example 3. I bet if all the examples were like that it would find better rules.

jfatine commented 7 years ago

I see. In testDocument, I replaced the word Maddox with Granick, and I get [Richard Granick] as the result. The problem is that rapier treats the word Granick as a condition that must be present in the text, which is wrong. The learned rule should not include the word Granick. I think this problem is related to the Rule Generalization step.
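The behaviour described here follows directly from how a filler pattern with a word constraint is matched. Below is a simplified sketch using hypothetical Token and Element types (these are not rapier's real classes, and the part-of-speech tags are supplied by hand). An empty constraint set means "anything goes", mirroring word: [] in the printed rules.

```kotlin
// Hypothetical, simplified types: not rapier's real classes.
data class Token(val word: String, val tag: String)

// An empty set means "unconstrained", mirroring `word: []` in the printed rules.
data class Element(val words: Set<String> = emptySet(), val tags: Set<String> = emptySet()) {
    fun accepts(t: Token) =
        (words.isEmpty() || t.word in words) && (tags.isEmpty() || t.tag in tags)
}

// Return every contiguous run of tokens that satisfies the pattern element by element.
fun findFillers(pattern: List<Element>, tokens: List<Token>): List<String> =
    tokens.windowed(pattern.size)
        .filter { window -> window.zip(pattern).all { (tok, el) -> el.accepts(tok) } }
        .map { window -> window.joinToString(" ") { it.word } }

fun main() {
    // Rule 1 from the thread: a proper noun followed by the word "Granick".
    val rule = listOf(
        Element(tags = setOf("NNP")),
        Element(words = setOf("Granick"), tags = setOf("NNP"))
    )

    val original = listOf(Token("by", "IN"), Token("Richard", "NNP"), Token("Maddox", "NNP"))
    val edited = listOf(Token("by", "IN"), Token("Richard", "NNP"), Token("Granick", "NNP"))

    println(findFillers(rule, original)) // []
    println(findFillers(rule, edited))   // [Richard Granick]
}
```

Dropping the word constraint from the second element would make the pattern match any two consecutive proper nouns, which is exactly the more general rule that the metric rejects for having too many negative matches.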

frankandrobot commented 7 years ago

So as I was trying to explain, it's not that the rule generalization is broken---that's how the algorithm works. You don't get more general rules because of the metric.

I also recently spent some time adding support for semantic classes. Under the hood, it uses WordNet, which cannot find semantic classes for proper nouns. So it will never generalize "Tom" and "Jeff" to "person". See https://github.com/frankandrobot/rapier/blob/synsets/src/test/kotlin/com/frankandrobot/rapier/nlp/jwi/FindFirstCommonSemanticClassSpec.kt#L50-L57

So even with semantic class support, rapier will likely find similar rules for your use case.
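A rough sketch of why proper nouns defeat this kind of generalization: finding a shared semantic class amounts to walking two words up a hypernym chain until the chains meet. The tiny table below is a made-up stand-in for WordNet, and firstCommonClass is a hypothetical helper, not the function used in the synsets branch.

```kotlin
// Toy hypernym table standing in for WordNet; the entries are illustrative only.
val hypernym = mapOf(
    "professor" to "educator",
    "lecturer" to "educator",
    "educator" to "person"
)

// Chain from a word up to its most general class, starting with the word itself.
// A word that the table does not know at all (such as a proper noun) yields an empty chain.
fun ancestors(word: String): List<String> =
    if (word !in hypernym && word !in hypernym.values) emptyList()
    else generateSequence(word) { hypernym[it] }.toList()

// First class on a's chain that also appears on b's chain, or null if there is none.
fun firstCommonClass(a: String, b: String): String? {
    val bChain = ancestors(b).toSet()
    return ancestors(a).firstOrNull { it in bChain }
}

fun main() {
    println(firstCommonClass("professor", "lecturer")) // educator
    println(firstCommonClass("Tom", "Jeff"))           // null
}
```

Because "Tom" and "Jeff" have no entry to start from, there is no chain to walk and no common class to put into the rule's semantic constraint, so the learner falls back on word and tag constraints as before.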

jfatine commented 7 years ago

Okay, thank you Uriel.