Living-with-machines / TargetedSenseDisambiguation

Repository for the work on Targeted Sense Disambiguation
MIT License

Setup unsupervised baselines using temporal information from metadata #48

Open fedenanni opened 3 years ago

fedenanni commented 3 years ago

@GiorgiatolfoBL you and I could work on this together if you want: I can sketch the initial idea below, you could take care of the implementation, and I'll support you there.

mcollardanuy commented 3 years ago

Hi!

We were talking the other day about the fact that, in typical WSD tasks, the "most frequent sense" baseline is very tough to beat, and that we should have a similarly strong baseline for our experiment. I have just seen there's a column in the dataframe called main_current_sense; could we use it for this?

For machine, the "main_current_sense" is:

A complex device, consisting of a number of interrelated parts, each having a definite function, together applying, using, or generating mechanical or (later) electrical power to perform a certain kind of work (often specified by a preceding verbal noun).

mcollardanuy commented 3 years ago

Ah, but:

Note that this feature is experimental and heuristic: we do not yet have positive identification of the main sense for all multi-sense words in the OED. (from: https://languages.oup.com/research/oed-researcher-api/)

kasparvonbeelen commented 3 years ago

Hi! Yes, this is definitely something we can use as a baseline when applying WSD to other corpora. I didn't spot this column. Does the "current" apply to "now" (i.e. is it boolean or does it have a date range)?

mcollardanuy commented 3 years ago

Yes, current applies to now (it is a boolean):

If 'true', restrict results to senses which constitute the main current sense of a word. (Note that this feature is experimental and heuristic: we do not yet have positive identification of the main sense for all multi-sense words in the OED.)
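
To make the idea concrete, here is a minimal sketch of such a baseline. It assumes the dataframe rows are available as plain dicts (for example via `df.to_dict("records")`) and that `main_current_sense` holds a boolean flag per sense; the `lemma` and `sense_id` keys and both function names are hypothetical, chosen only for illustration.

```python
def main_sense_by_lemma(sense_rows):
    """Map each lemma to the id of its flagged main current sense."""
    main = {}
    for row in sense_rows:
        if row["main_current_sense"]:
            # Keep the first flagged sense if a lemma has several.
            main.setdefault(row["lemma"], row["sense_id"])
    return main


def baseline_accuracy(main, quotation_rows):
    """Share of quotations whose gold sense is the lemma's main current sense."""
    # Lemmas without a flagged sense yield None and therefore count as misses.
    hits = sum(main.get(q["lemma"]) == q["sense_id"] for q in quotation_rows)
    return hits / len(quotation_rows)


# Toy data, invented for this example only.
senses = [
    {"lemma": "machine", "sense_id": "machine_s1", "main_current_sense": False},
    {"lemma": "machine", "sense_id": "machine_s2", "main_current_sense": True},
    {"lemma": "engine", "sense_id": "engine_s1", "main_current_sense": False},
]
quotations = [
    {"lemma": "machine", "sense_id": "machine_s2"},
    {"lemma": "machine", "sense_id": "machine_s1"},
    {"lemma": "engine", "sense_id": "engine_s1"},
]

main = main_sense_by_lemma(senses)
accuracy = baseline_accuracy(main, quotations)
```

Since the flag is heuristic and not set for every multi-sense word, counting uncovered lemmas as misses keeps the score honest about coverage.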

There is the meta column, which has a position_in_entry field that may contain info we can also use for this:

{'created': 1904,
 'revised': True,
 'updated': 2000,
 'sense_group': 'machine_nn01-g08',
 'position_in_entry': 22}
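
One possible way to use this field, sketched under assumptions: pick the sense with the smallest `position_in_entry` per lemma as a fallback when no main current sense is flagged. The `lemma` and `sense_id` keys and the function name are hypothetical, and whether a low position is a good proxy for the dominant sense is untested; OED entries are generally ordered historically rather than by frequency.

```python
def first_listed_sense(sense_rows):
    """Pick, per lemma, the sense with the smallest meta['position_in_entry']."""
    best = {}
    for row in sense_rows:
        pos = row["meta"].get("position_in_entry")
        if pos is None:
            # Skip senses whose metadata lacks a position.
            continue
        current = best.get(row["lemma"])
        if current is None or pos < current[0]:
            best[row["lemma"]] = (pos, row["sense_id"])
    return {lemma: sense_id for lemma, (pos, sense_id) in best.items()}


# Toy rows, invented for this example only.
rows = [
    {"lemma": "machine", "sense_id": "machine_a", "meta": {"position_in_entry": 22}},
    {"lemma": "machine", "sense_id": "machine_b", "meta": {"position_in_entry": 3}},
    {"lemma": "machine", "sense_id": "machine_c", "meta": {}},
]
fallback = first_listed_sense(rows)
```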

fedenanni commented 3 years ago

Hi all - yes, the most frequent sense baseline is super strong (see also here). Let's see if the "current" one could work.