Open jfatine opened 7 years ago
Ah my bad, the problem is that the README is wrong. Updated the README. You will need to update to the latest version of rapier. You can download it from the release page: https://github.com/frankandrobot/rapier/releases/tag/v0.9.1
Let me know if you still have any issues
Thank you. I downloaded this latest version, but when I run RapierSpeck.kt, I don't get any result. Are the results the rules learned by rapier?
I modified the code in class Example based on the new README as follows, but when I run it I get nothing (see cap1).
`/**
* Created by Fatine on 07/02/2017.
*/
import com.frankandrobot.rapier.meta.*
import com.frankandrobot.rapier.rapier
import com.frankandrobot.rapier.wordTokens
import org.jetbrains.spek.api.Spek
import com.frankandrobot.rapier.parse.findMatches
import com.frankandrobot.rapier.meta.Document
import com.frankandrobot.rapier.meta.SlotName
class Example : Spek({
val blankTemplate = BlankTemplate(
name = "test",
slots = slotNames("speaker")
)
//////////////////////////////////////////////doc1
val speakerDocument1 = Document("""
The Chemical Engineering and Physics departments will host a seminar
entitled "Soft Matter in a Tight Spot: Nanorheology of Polymers &
Complex Fluids," at 4:30 p.m., Monday, Feb. 27, in Wean Hall 7500.
The seminar will be given by Professor Steven Granick from the Materials
Science Department at University of Illinois, Urbana-Champaign.
""")
val filledTemplate1 = FilledTemplate(slots(
SlotName("speaker") to slotFillers(
wordTokens("Professor", "Steven", "Granick")
)
))
val exemple1 = Example(
blankTemplate,
speakerDocument1,
filledTemplate1
)
//////////////////////////////////////////doc2
val filledTemplate2 = FilledTemplate(slots(
SlotName("speaker") to slotFillers(
wordTokens("Steve", "Granick")
)
))
val exemple2 = Example(
blankTemplate,
speakerDocument2,
filledTemplate2
)
/////////////////////////////////////////doc3
val filledTemplate3 = FilledTemplate(slots(
SlotName("speaker") to slotFillers(
wordTokens("Dr.", "Franklyn", "G.", "Prendergast")
)
))
val exemple3 = Example(
blankTemplate,
speakerDocument3,
filledTemplate2
)
val params = RapierParams(
compressionFails = 7,
metricMinPositiveMatches = 1,
compressionPriorityQueueSize = 5
)
val examples = Examples(listOf(exemple1, exemple2, exemple3))
val allLearnedRules = rapier(blankTemplate, examples = examples, params = params)
val allResults = allLearnedRules()
.flatMap { learnedRule -> learnedRule.value }
.findMatches(testDocument)
println(allResults[SlotName("speaker")])
})
private val speakerDocument2 = Document("""
Physic Colloquium, Monday, Feb. 27, Steve Granick, University of
Illinois, Urbana, "Soft matter in a tight spot: nano-rheology of
polymers and complex fluids," 4:30 p.m., 7500 Wean Hall, Coffee at 4:15
p.m.
""")
private val speakerDocument3 = Document("""
Name: Dr. Franklyn G. Prendergast
Affiliation: Department of Biochemistry and Molecular Biology
Mayo Foundation, Clinc and Medical School
Title: "Picosecond Motion in Proteins: Experiments, Analysis,
Mathematical Simulation, and Interpretations"
Host/e-mail: Lans Taylor/taylor@a.cfr.cmu.edu
Date: Wednesday, March 29, 1995
Time: 3:30 p.m.
Place: Mellon Institute Conference Room
Co-Sponsor:Sscience and Technology Center and W.M. Keck Center for
Advanced Training in Computational Biology
""")
private val testDocument = Document("""
The Center for Cultural Analysis will host a lecture by Richard Maddox
entitled "The Best of Possible Islands: Seville, Expo '92, and the
Politics of Culture in the 'New Spain'", at 3:30 p.m., Friday, March 17,
in Baker Hall 235A.
All are welcome.
""")
`
Couple of things:
- `wordTokens("Dr.", "Franklyn", "G.", "Prendergast")` actually gets mapped to `wordTokens("Dr", ".", "Franklyn", "G", ".", "Prendergast")`, i.e., the period is its own word token. (To confirm this, do `println(speakerDocument3())`; this will print out the tokens for this document.)
- `metricMinPositiveMatches` should probably be set to 0. With so few examples, a rule won't have enough positive matches to be included.
- Do `println(allLearnedRules)` to ensure it's learning something.

These are the params I've been using in the tests for small examples:
val params = RapierParams(
compressionFails = 7,
metricMinPositiveMatches = 1,
compressionPriorityQueueSize = 5
)
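To make the tokenization point concrete, here is a minimal standalone sketch of why the filler tokens have to line up exactly with the document's tokens. It does not use rapier at all: `indexOfFiller` is a hypothetical helper, and the token list is an assumed tokenization in which every period is its own token. Whatever `println(speakerDocument3())` prints for your installed version is the sequence the filler actually has to match.

```kotlin
// Hypothetical helper, not part of rapier: returns the index where `filler`
// first occurs as a contiguous run inside `tokens`, or -1 if it never does.
fun indexOfFiller(tokens: List<String>, filler: List<String>): Int {
    if (filler.isEmpty() || filler.size > tokens.size) return -1
    return (0..tokens.size - filler.size).firstOrNull { start ->
        tokens.subList(start, start + filler.size) == filler
    } ?: -1
}

fun main() {
    // Assumed tokenization of the first line of document 3 (periods split off).
    val tokens = listOf("Name", ":", "Dr", ".", "Franklyn", "G", ".", "Prendergast")

    val asWritten = listOf("Dr.", "Franklyn", "G.", "Prendergast")
    val asTokenized = listOf("Dr", ".", "Franklyn", "G", ".", "Prendergast")

    println(indexOfFiller(tokens, asWritten))   // -1: "Dr." is not a token here
    println(indexOfFiller(tokens, asTokenized)) // 2
}
```

If the filler never occurs in the token sequence, the learner has no positive example to start from for that document.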
I've just checked: "Dr." is considered as one token. I retested rapier with a very simple case: I used just 2 training documents that contain the expression "will be given by". The test document also contains this expression. Here is what I get:
`
Loading ambiguity classes
Loading word clusters
Loading tokenizer
Loading part-of-speech tagger
Loading morphological analyzer
[]
Process finished with exit code 0
`
Please, do you have a working example that shows the extraction results? (Even running RapierSpek does not give a result.)
The bottom line is that I think your use case would benefit from semantic classes, which I haven't implemented yet. With them, the rule it would find is "match a person", and it would extract the person from the blurb. Without semantic classes, it finds rules that are, more or less, specific phrases. In particular, it is learning these 2 rules:
word: [], tag: [Some<NNP>], semantic: []
word: [Some<Granick>], tag: [Some<NNP>], semantic: []

word: [], tag: [Some<NNP>], semantic: []
word: [], tag: [Some<NNP>], semantic: []
word: [], tag: [Some<NNP>], semantic: []
word: [Some<Granick>, Some<Prendergast>], tag: [Some<NNP>], semantic: []
Below is a modified version of your example that found these rules. Notice two differences:
I also tried increasing the number of examples (tried up to 5) but each time the rule would contain a specific name. I also tried using distinct names in each example. As I said before, rapier may give better rules when semantic classes are enabled.
import com.frankandrobot.rapier.meta.*
import com.frankandrobot.rapier.rapier
import com.frankandrobot.rapier.wordTokens
import org.jetbrains.spek.api.Spek
import com.frankandrobot.rapier.parse.findMatches
import com.frankandrobot.rapier.meta.Document
import com.frankandrobot.rapier.meta.SlotName
class ExampleSpec : Spek({
val blankTemplate = BlankTemplate(
name = "test",
slots = slotNames("speaker")
)
val speakerDocument1 = Document("""
The Chemical Engineering and Physics departments will host a seminar
entitled "Soft Matter in a Tight Spot: Nanorheology of Polymers &
Complex Fluids," at 4:30 p.m., Monday, Feb. 27, in Wean Hall 7500.
The seminar will be given by Professor Steven Granick from the Materials
Science Department at University of Illinois, Urbana-Champaign.
""")
val example1 = Example(
blankTemplate,
speakerDocument1,
FilledTemplate(slots(
SlotName("speaker") to slotFillers(
wordTokens("Steven", "Granick")
)
))
)
val speakerDocument2 = Document("""
Physic Colloquium, Monday, Feb. 27, Steve Granick, University of
Illinois, Urbana, "Soft matter in a tight spot: nano-rheology of
polymers and complex fluids," 4:30 p.m., 7500 Wean Hall, Coffee at 4:15
p.m.
""")
val example2 = Example(
blankTemplate,
speakerDocument2,
FilledTemplate(slots(
SlotName("speaker") to slotFillers(
wordTokens("Steve", "Granick")
)
))
)
val speakerDocument3 = Document("""
Name: Dr. Franklyn G. Prendergast
Affiliation: Department of Biochemistry and Molecular Biology
Mayo Foundation, Clinc and Medical School
Title: "Picosecond Motion in Proteins: Experiments, Analysis,
Mathematical Simulation, and Interpretations"
Host/e-mail: Lans Taylor/taylor@a.cfr.cmu.edu
Date: Wednesday, March 29, 1995
Time: 3:30 p.m.
Place: Mellon Institute Conference Room
Co-Sponsor:Science and Technology Center and W.M. Keck Center for
Advanced Training in Computational Biology
""")
val example3 = Example(
blankTemplate,
speakerDocument3,
FilledTemplate(slots(
SlotName("speaker") to slotFillers(
wordTokens("Dr.", "Franklyn", "G.", "Prendergast")
)
))
)
val params = RapierParams(
compressionFails = 7,
metricMinPositiveMatches = 1,
compressionPriorityQueueSize = 5
//maxElementsToSpecialize = 3,
//ruleSizeWeight = 0.001
)
val examples = Examples(listOf(example1, example2, example3))
val allLearnedRules = rapier(blankTemplate, examples = examples, params = params)
println(allLearnedRules)
val allResults = allLearnedRules()
.flatMap { learnedRule -> learnedRule.value }
.findMatches(testDocument)
println(allResults[SlotName("speaker")])
})
private val testDocument = Document("""
The Center for Cultural Analysis will host a lecture by Richard Maddox
entitled "The Best of Possible Islands: Seville, Expo '92, and the
Politics of Culture in the 'New Spain'", at 3:30 p.m., Friday, March 17,
in Baker Hall 235A.
All are welcome.
""")
LearnedRules(results={SlotName(name=speaker)=[Pattern
PreFiller:
Filler:
word: [], tag: [Some<NNP>], semantic: []
word: [Some<Granick>], tag: [Some<NNP>], semantic: []
PostFiller:
, Pattern
PreFiller:
Filler:
word: [], tag: [Some<NNP>], semantic: []
word: [], tag: [Some<NNP>], semantic: []
word: [], tag: [Some<NNP>], semantic: []
word: [Some<Granick>, Some<Prendergast>], tag: [Some<NNP>], semantic: []
PostFiller:
]})
The reason it learns these specific rules, and not the more general rules that say "match two proper nouns" or "match four proper nouns", is the metric. See RuleMetric.kt. The generic rules have too many negative matches, i.e., they find things not specified in the FilledTemplates. The metric scores these rules as worse than the more specific rules.
Without semantic constraints, this version of rapier is best suited for more "formal" documents, like your Example 3. I bet if all the examples were like that it would find better rules.
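As a rough illustration of how negative matches can push a generic rule below a specific one, here is a small standalone sketch. It is not copied from RuleMetric.kt: `ruleScore` is a hypothetical function that assumes a Laplace-corrected accuracy term plus a small size penalty, with lower scores being better. The match counts and rule sizes in `main` are made up for the illustration.

```kotlin
import kotlin.math.log2

// Hypothetical scoring function (an assumption, not rapier's implementation).
// p = positive matches, n = negative matches, ruleSize = rough constraint count.
// Lower scores are better.
fun ruleScore(p: Int, n: Int, ruleSize: Double, sizeWeight: Double = 0.01): Double {
    require(p > 0) { "a rule needs at least one positive match to be scored" }
    val laplaceAccuracy = (p + 1.0) / (p + n + 2.0)
    return -log2(laplaceAccuracy) + sizeWeight * ruleSize / p
}

fun main() {
    // Made-up counts: the specific rule covers 2 fillers and nothing else;
    // the generic "two proper nouns" rule covers 3 fillers but also 10 non-fillers.
    val specific = ruleScore(p = 2, n = 0, ruleSize = 3.0)
    val generic = ruleScore(p = 3, n = 10, ruleSize = 2.0)

    println("specific = $specific, generic = $generic")
    println(specific < generic) // true: the specific rule wins under this score
}
```

Under a score of this shape, covering one more filler does not make up for ten spurious matches, which is the effect described above.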
I see. In testDocument, I replaced the word Maddox with Granick, and I get [Richard Granick] as the result. The problem is that rapier treats the word Granick as a condition that must appear in the text, which is wrong. The learned rule should not include the word Granick. I think this problem is related to the Rule Generalization step.
So as I was trying to explain, it's not that the rule generalization is broken---that's how the algorithm works. You don't get more general rules because of the metric.
I also recently spent some time adding support for semantic classes. Under the hood, it uses WordNet, which cannot find semantic classes for proper nouns. So it will never generalize "Tom" and "Jeff" to "person". See https://github.com/frankandrobot/rapier/blob/synsets/src/test/kotlin/com/frankandrobot/rapier/nlp/jwi/FindFirstCommonSemanticClassSpec.kt#L50-L57
So even with semantic class support, rapier will likely find similar rules for your use case.
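For anyone reading the printed rules, here is a simplified standalone model of how a filler pattern with word and tag constraints behaves. `Token`, `Element`, and `fillerMatches` are hypothetical names, not rapier's classes, and the NNP tags on the test tokens are assumed. An empty constraint list is treated as "matches anything", mirroring `word: []` in the output above.

```kotlin
// Hypothetical, simplified model; not rapier's actual classes.
data class Token(val word: String, val tag: String)

// An empty set means "no constraint on this field".
data class Element(val words: Set<String> = emptySet(), val tags: Set<String> = emptySet()) {
    fun matches(t: Token): Boolean =
        (words.isEmpty() || t.word in words) && (tags.isEmpty() || t.tag in tags)
}

// True if the elements match the tokens one-to-one, in order.
fun fillerMatches(rule: List<Element>, tokens: List<Token>): Boolean =
    rule.size == tokens.size && rule.zip(tokens).all { (e, t) -> e.matches(t) }

fun main() {
    // First learned rule: any NNP followed by the word "Granick" tagged NNP.
    val learned = listOf(Element(tags = setOf("NNP")), Element(setOf("Granick"), setOf("NNP")))
    // The generic rule the metric rejected: any two NNPs.
    val generic = listOf(Element(tags = setOf("NNP")), Element(tags = setOf("NNP")))

    val maddox = listOf(Token("Richard", "NNP"), Token("Maddox", "NNP"))   // assumed tags
    val granick = listOf(Token("Richard", "NNP"), Token("Granick", "NNP")) // assumed tags

    println(fillerMatches(learned, maddox))  // false
    println(fillerMatches(learned, granick)) // true
    println(fillerMatches(generic, maddox))  // true
}
```

This reproduces the behaviour reported above: the learned rule only fires once "Maddox" is replaced by "Granick", while the generic two-NNP rule would have matched either name.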
Okay, thank you Uriel.
Hi, what is the result of executing RapierSpec.kt, and where can I find it? I want to test Rapier, so I created a new Kotlin class and followed the steps described in the README. The problem is that Rapier does not recognize "learnedRules". How can I fix this, please?
`import com.frankandrobot.rapier.meta.*
import com.frankandrobot.rapier.rapier
import com.frankandrobot.rapier.wordTokens
import org.jetbrains.spek.api.Spek
import com.frankandrobot.rapier.parse.findMatches
import com.frankandrobot.rapier.meta.Document
import com.frankandrobot.rapier.meta.SlotName

class Example : Spek({
    val blankTemplate = BlankTemplate(
        name = "test",
        slots = slotNames("speaker")
    )
    val document = Document("""
The Chemical Engineering and Physics departments will host a seminar
entitled "Soft Matter in a Tight Spot: Nanorheology of Polymers &
Complex Fluids," at 4:30 p.m., Monday, Feb. 27, in Wean Hall 7500.
The seminar will be given by Professor Steven Granick from the Materials
Science Department at University of Illinois, Urbana-Champaign.
""")
    val filledTemplate1 = FilledTemplate(slots(
        SlotName("speaker") to slotFillers(
            wordTokens("Professor", "Steven", "Granick")
        )
    ))
    val filledTemplate2 = FilledTemplate(slots(
        SlotName("speaker") to slotFillers(
            wordTokens("Steve", "Granick")
        )
    ))
    val exemple1 = Example(blankTemplate, document, filledTemplate1)
    val exemple2 = Example(blankTemplate, speakerdocument, filledTemplate2)
    val params = RapierParams(
        compressionFails = 7,
        metricMinPositiveMatches = 1,
        compressionPriorityQueueSize = 5
    )
    val examples = Examples(listOf(exemple2, exemple1))
    val learnedRules = rapier(blankTemplate, examples = examples, params = params)
    val result = learnedRules.findMatches(testdocument)
    println(result[SlotName("speaker")])
})
private val speakerdocument = Document("""
Physic Colloquium, Monday, Feb. 27, Steve Granick, University of
Illinois, Urbana, "Soft matter in a tight spot: nano-rheology of
polymers and complex fluids," 4:30 p.m., 7500 Wean Hall, Coffee at 4:15
p.m.
""")
private val testdocument = Document("""
The Center for Cultural Analysis will host a lecture by Richard Maddox
entitled "The Best of Possible Islands: Seville, Expo '92, and the
Politics of Culture in the 'New Spain'", at 3:30 p.m., Friday, March 17,
in Baker Hall 235A.
All are welcome.
""")
`