Open assaft opened 8 years ago
The expected events are wrong; that's an easy fix. However, "rains" and "caused" are identified as events, but "flooding" is not.
The TLink is the difficult part. As I said before, the relation between the events stems from the meaning of the verb, not from syntax or the presence of prepositions.
"flooding" may not be identified because it does not appear in the training data. I assume you are using the machine-learning-based event classifier trained on TimeBank.
There should be a rule-based event classifier class which, I think, currently calls every verb an event (which may not be conservative enough, depending on your application). However, it should be possible to add a more conservative rule-based event classifier based on a gazetteer.
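To make the gazetteer idea concrete, here is a minimal sketch in Python (not CAEVO code). The word list `EVENT_GAZETTEER` and the function `find_events` are hypothetical names made up for illustration, and a real gazetteer would presumably match lemmas rather than surface forms.

```python
# Hypothetical gazetteer of event-denoting words (an assumption, not CAEVO data).
EVENT_GAZETTEER = {"rains", "caused", "flooding", "drought", "stopped"}

def find_events(tokens, gazetteer=EVENT_GAZETTEER):
    """Return (index, token) pairs for tokens that appear in the gazetteer."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in gazetteer]

tokens = "The rains caused the flooding .".split()
print(find_events(tokens))
# [(1, 'rains'), (2, 'caused'), (4, 'flooding')]
```

Unlike a rule that marks every verb, this only fires on listed words, so it also picks up nominal events such as "flooding" while staying conservative.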
Regarding the TLink, can you be specific as to which event pair is of concern?
The link in question is the one between the events (there are no timexes in this sentence). The relation in this case should be BEGUN_BY, but in a similar sentence "The drought stopped the flooding" it should be ENDED_BY, even though the grammatical structure is exactly the same. The difference is in the meaning of the verb.
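A minimal sketch of that point, assuming a hand-written lexicon: since "X caused Y" and "X stopped Y" have the same structure, the relation has to be looked up from the verb itself. `VERB_RELATION` and `relation_for_object` are hypothetical names, not part of CAEVO, and reading the relation from the object event to the verb event is an assumed convention.

```python
from typing import Optional, Tuple

# Hypothetical lexicon (an assumption, not CAEVO data): verb -> relation,
# read from the verb's object event to the verb event itself.
VERB_RELATION = {
    "caused": "BEGUN_BY",   # "The rains caused the flooding"
    "stopped": "ENDED_BY",  # "The drought stopped the flooding"
}

def relation_for_object(verb: str, obj_event: str) -> Optional[Tuple[str, str, str]]:
    """Return (object_event, verb_event, relation), or None for unlisted verbs."""
    relation = VERB_RELATION.get(verb.lower())
    if relation is None:
        return None
    return (obj_event, verb, relation)

print(relation_for_object("caused", "flooding"))
print(relation_for_object("stopped", "flooding"))
```

The two calls differ only in the verb, which is the whole point: no syntactic feature separates them.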
This is how TimeML annotators analyzed this sentence (the rains caused the flooding):
ei1,rains,occurrence
ei2,caused,occurrence
ei3,flooding,occurrence
l1,ei1,ei3,BEFORE
l2,ei1,ei2,IDENTITY
In words, this analysis reflects the following timeline: there was an occurrence of rains and an occurrence of causation, which are the same, followed by an occurrence of flooding.
I would argue that a more natural reading for this sentence is: (1) it was raining, and meanwhile the rains started to cause flooding. The causation starts right when the flooding starts, and the flooding persists after the rains and the causation have ended. One may also argue for (2): an interpretation in which the rains ended at some point, and after a while the water accumulated and caused flooding.
In any case, I think any natural reading of this sentence would treat the events as enduring ones, i.e. states, rather than instantaneous ones, i.e. occurrences. Thus:
ei1,rains,state
ei2,caused,state
ei3,flooding,state
And in both (1) and (2) we should have: l2,ei2,ei3,BEGINS / ei3,ei2,BEGUN_BY
What distinguishes between (1) and (2) is:
in (1) we have: l1,ei2,ei1,ENDS / ei1,ei2,ENDED_BY
in (2) we have: l1,ei2,ei1,AFTER / ei1,ei2,BEFORE
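Each link above is written in both directions (e.g. BEGINS / BEGUN_BY). Here is a small sketch of that bookkeeping, assuming the comma-separated `id,source,target,relation` layout used in this thread. `INVERSE`, `parse_tlink` and `invert` are hypothetical helper names, and treating IDENTITY as its own inverse is an assumption.

```python
# Inverse pairs as they appear in this thread; IDENTITY mapping to itself
# is an assumption added for completeness.
INVERSE = {
    "BEGINS": "BEGUN_BY", "BEGUN_BY": "BEGINS",
    "ENDS": "ENDED_BY", "ENDED_BY": "ENDS",
    "BEFORE": "AFTER", "AFTER": "BEFORE",
    "IDENTITY": "IDENTITY",
}

def parse_tlink(line):
    """Split 'id,source,target,relation' into a 4-tuple of strings."""
    link_id, source, target, relation = line.strip().split(",")
    return (link_id, source, target, relation)

def invert(link):
    """Swap source and target and replace the relation with its inverse."""
    link_id, source, target, relation = link
    return (link_id, target, source, INVERSE[relation])

print(invert(parse_tlink("l2,ei2,ei3,BEGINS")))
# ('l2', 'ei3', 'ei2', 'BEGUN_BY')
```

Normalizing links this way before comparing system output against gold would let either direction count as a match.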
To conclude, I'd argue that TimeML's annotation is off here, and both (1) and (2) are legit gold annotations.
Yuval - regarding the tlink relation being dependent on the semantics of the verb - that's perfectly correct, but Caevo already deals with such cases. For example: "John ate before leaving" vs. "John left before eating". Some of Caevo's rules deal with particular lexical items (e.g. "before" and "after") that imply particular relations. So similarly,
A phrase like X stopped/finished/ended
Recognizing that "flooding" is an event is also expected of the current functionality of Caevo.
On the other hand, identifying that the events are enduring and not instantaneous is something I see as beyond the scope of the current version of Caevo. I think that an external resource / tool is needed in order to make the right classification.
Currently, CAEVO makes limited use of WordNet. Perhaps it would be wise to have future versions of CAEVO make smarter use of DIRT or some other knowledge source to identify these syntactic structures and the specific verbs, so that making the event classification and link relation decisions will be easier and more precise.
t15's text is: The rains caused the flooding.
t15's exp is: ei1,attack ei2,fires l1,ei2,ei1,BEGUN_BY