UAlbertaALTLab / recording-validation-interface

Maskwacîs recordings validation interface
https://speech-db.altlab.app/

Implement search by semantic domains (WN and/or RW) #267

Open aarppe opened 2 years ago

aarppe commented 2 years ago

As a follow-up to #121, we should implement a search by semantic domains. We might even consider presenting the semantic category as an alternative to sessions when searching, and in the longer term as a more relevant search criterion than any particular recording session.

As for implementing this, we fortunately have RapidWords (RW) classes for the MD content for Maskwacîs Cree, and WordNet (WN) classes for the Cree Words content, which however does not entirely overlap with MD. If we have not already read in this information, then we should do so.

The RW classifications that were applied to the original MD content should be pretty straightforward to parse, whereas the ones for the "new" words and sentences would require parsing the elicitation sheets. In principle that too should be straightforward, as one could always use the immediately preceding RW class (at whatever level of the hierarchy it sits), but there might be some fuzziness here.

For Tsuut'ina, we have both WN and RW classifications of the Onespot-Sapir (OS) glossary.

For the RW classes, one would want to find anything in that class, or in a more specific class under the search term. That should be relatively straightforward, as the RW class numbers form a convenient hierarchy, e.g. searching for 6.4 (hunt and fish) would also cover the following subclasses:

    6.4.1 Hunt
    6.4.2 Trap
    6.4.3 Hunting birds
    6.4.4 Beekeeping
    6.4.5 Fishing
    6.4.6 Things done to animals
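
The lookup described above can be sketched as a prefix match on the dotted class number. This is only a minimal illustration, not the project's actual implementation: the function names and the sample entries are hypothetical, and it assumes each entry carries a single RW code stored as a string.

```python
def in_domain(code: str, query: str) -> bool:
    """Return True if `code` is `query` itself or a more specific class under it."""
    # Appending "." prevents a hypothetical "6.41" from matching a search for "6.4".
    return code == query or code.startswith(query + ".")


def search_by_domain(entries, query):
    """Return the names of all entries whose RW code falls under `query`."""
    return [name for name, code in entries if in_domain(code, query)]


# Hypothetical (entry, RW code) pairs, for illustration only.
ENTRIES = [
    ("entry-a", "6.4.1"),
    ("entry-b", "6.4.5"),
    ("entry-c", "6.41"),   # deliberately made-up code that must NOT match "6.4"
    ("entry-d", "6.4"),
]

matches = search_by_domain(ENTRIES, "6.4")
```

Comparing against `query + "."` rather than the bare prefix keeps sibling codes that merely share leading digits out of the results.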

Implementing this with WordNet would be a different challenge. I'd presume that with WN we might choose to show any hyponyms in the descending hierarchy from the search term.
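
One way to picture the WN case is as a transitive walk over hyponym links. The sketch below uses a small hand-made hyponym table (hypothetical, not real WordNet data) so that it runs without any corpus download. With NLTK, the direct links would instead come from a synset's `hyponyms()` method, and `closure()` can do the transitive walk.

```python
from collections import deque

# Hypothetical, hand-made hyponym table standing in for WordNet data.
HYPONYMS = {
    "animal": ["bird", "fish"],
    "bird": ["duck", "goose"],
    "fish": ["trout"],
}


def descendants(term, hyponyms):
    """Collect every hyponym reachable from `term`, at any depth."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in hyponyms.get(current, []):
            # A node can have several parents, so skip anything already visited.
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


under_animal = descendants("animal", HYPONYMS)
```

A search for a term would then match entries tagged with the term itself or with anything in the returned set.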

fbanados commented 2 months ago

RapidWords should work for now, but WordNet should still be implemented.

fbanados commented 1 month ago

NLTK package for WordNet: https://www.nltk.org/howto/wordnet.html

fbanados commented 1 month ago

In the search list, add a count for each domain. Grey out the ones that have zero.

fbanados commented 8 hours ago

A count for RapidWords has been added, with separate numbers for entries in the specific domain and for entries including hyponyms. Categories with zero entries are greyed out:

[Screenshot 2024-09-27 at 12 37 09 PM]

We could also choose not to show the domains that have no entries. Let me know if you would prefer that instead.
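
For reference, the two counts and the greying rule can be sketched roughly as follows. This is an illustrative sketch only, not the code in the repository: the function name, the row fields, and the sample codes are hypothetical, and it assumes an RW domain is greyed out when its count including subdomains is zero.

```python
from collections import Counter


def domain_counts(entry_codes, domains):
    """Build one row per domain with a direct count and a count including subdomains."""
    direct = Counter(entry_codes)
    rows = []
    for domain in domains:
        # Sum entries tagged with the domain itself or with any code beneath it.
        total = sum(
            n for code, n in direct.items()
            if code == domain or code.startswith(domain + ".")
        )
        rows.append({
            "domain": domain,
            "direct": direct[domain],
            "total": total,
            "greyed": total == 0,
        })
    return rows


# Hypothetical RW codes attached to four entries.
rows = domain_counts(
    ["6.4.1", "6.4.1", "6.4.5", "6.4"],
    ["6.4", "6.4.1", "6.4.4", "6.4.5"],
)

# Alternative behaviour: drop empty domains instead of greying them out.
visible = [row for row in rows if not row["greyed"]]
```

Keeping the `greyed` flag in the data and filtering at display time would make it easy to switch between greying out and hiding.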