Open fedenanni opened 3 years ago
Hi!
We were talking the other day about the fact that, in typical WSD tasks, the "most frequent sense" baseline is very difficult to beat, and that we should have a similarly strong baseline for our experiment. I have just seen there's a column in the dataframe called main_current_sense; could we use it for this?
For machine, the "main_current_sense" is:
A complex device, consisting of a number of interrelated parts, each having a definite function, together applying, using, or generating mechanical or (later) electrical power to perform a certain kind of work (often specified by a preceding verbal noun).
Ah, but:
Note that this feature is experimental and heuristic: we do not yet have positive identification of the main sense for all multi-sense words in the OED. (from: https://languages.oup.com/research/oed-researcher-api/)
Hi! Yes, this is definitely something we can use as a baseline when applying WSD to other corpora. I didn't spot this column. Does the "current" apply to "now" (i.e. is it boolean or does it have a date range)?
Yes, current applies to now (it is a boolean):
If 'true', restrict results to senses which constitute the main current sense of a word. (Note that this feature is experimental and heuristic: we do not yet have positive identification of the main sense for all multi-sense words in the OED.)
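To make the idea concrete, here is a minimal sketch of such a baseline: for every lemma, always predict the sense flagged as main_current_sense. It assumes the dataframe rows are available as plain dicts (for example via `DataFrame.to_dict("records")`); the `lemma` and `sense_id` keys, the sense ids and the gold labels are all made up for illustration, and only main_current_sense comes from the thread. Lemmas without a flagged sense simply get no prediction, which fits the caveat that the flag is heuristic and incomplete.

```python
# Toy rows standing in for the senses dataframe. The "lemma" and "sense_id"
# keys and the ids themselves are hypothetical; only main_current_sense is
# a column mentioned in this thread.
senses = [
    {"lemma": "machine", "sense_id": "machine_a", "main_current_sense": False},
    {"lemma": "machine", "sense_id": "machine_b", "main_current_sense": True},
    {"lemma": "engine", "sense_id": "engine_a", "main_current_sense": False},
]


def main_sense_baseline(rows):
    """Map each lemma to the id of its flagged main current sense, if any."""
    baseline = {}
    for row in rows:
        if row["main_current_sense"]:
            # If a lemma has several flagged senses, keep the first one seen.
            baseline.setdefault(row["lemma"], row["sense_id"])
    return baseline


def accuracy(baseline, gold):
    """Share of gold (lemma, sense_id) pairs that the baseline gets right."""
    if not gold:
        return 0.0
    hits = sum(1 for lemma, sense_id in gold if baseline.get(lemma) == sense_id)
    return hits / len(gold)


# Hypothetical gold annotations: one (lemma, sense_id) pair per occurrence.
gold = [
    ("machine", "machine_b"),
    ("machine", "machine_a"),
    ("engine", "engine_a"),
]

baseline = main_sense_baseline(senses)
score = accuracy(baseline, gold)
print(baseline, score)
```

Occurrences of lemmas with no flagged sense count as errors here; excluding them from the denominator instead would be an equally reasonable choice.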
There is the meta column, which has a position_in_entry field that may contain info we can also use for this:
{'created': 1904,
'revised': True,
'updated': 2000,
'sense_group': 'machine_nn01-g08',
'position_in_entry': 22}
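One possible way to use this, sketched under assumptions: prefer the sense flagged as main_current_sense and, when a lemma has none, fall back to the sense with the lowest position_in_entry. Whether the position in an OED entry tells us anything about frequency is not established here, so the fallback is only a placeholder heuristic. The `lemma` and `sense_id` keys and all values below are invented for illustration; only main_current_sense, meta and position_in_entry appear in the thread.

```python
# Toy rows; "lemma", "sense_id" and every value are hypothetical. The
# main_current_sense flag and meta["position_in_entry"] mirror the fields
# discussed above.
senses = [
    {"lemma": "machine", "sense_id": "machine_a",
     "main_current_sense": False, "meta": {"position_in_entry": 1}},
    {"lemma": "machine", "sense_id": "machine_b",
     "main_current_sense": True, "meta": {"position_in_entry": 22}},
    {"lemma": "engine", "sense_id": "engine_a",
     "main_current_sense": False, "meta": {"position_in_entry": 3}},
    {"lemma": "engine", "sense_id": "engine_b",
     "main_current_sense": False, "meta": {"position_in_entry": 1}},
]


def rank(row):
    """Sort key: flagged senses first, then lower position_in_entry."""
    # `not flag` is False for flagged senses, and False sorts before True.
    return (not row["main_current_sense"], row["meta"]["position_in_entry"])


def pick_baseline_senses(rows):
    """Choose one sense per lemma: flagged if available, else earliest."""
    best = {}
    for row in rows:
        lemma = row["lemma"]
        if lemma not in best or rank(row) < rank(best[lemma]):
            best[lemma] = row
    return {lemma: row["sense_id"] for lemma, row in best.items()}


picked = pick_baseline_senses(senses)
print(picked)
```

With this ordering, the flag always wins over the position, so the fallback only matters for lemmas the heuristic flag does not cover.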
@GiorgiatolfoBL you and I could work on this together if you want - I can sketch the initial idea here below, you could take care of the implementation and I'll support there