Alina-enni / lingdiggers

Project for the Building NLP Applications course

Test theme extraction #24

Closed swilli6 closed 2 years ago

swilli6 commented 2 years ago

Let's see if theme extraction is a useful feature for our project.

Alina-enni commented 2 years ago

I was hoping to get started with this after I got the multi-word search and the wildcard search working pretty quickly. However, I am having trouble installing pke and the resources it requires, so I don't know if I'll be able to work on this. For some reason I can't find a way to install the nltk stopwords, the universal tagset, and the English language model on a Mac :/
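For anyone hitting the same wall, here is a minimal sketch of fetching those three resources from Python. The resource names `stopwords` and `universal_tagset` are the nltk downloader ids; `en_core_web_sm` is an assumption, since the thread only says "English language model". The helper names are made up for illustration, and this is not necessarily the fix that ended up working here.

```python
import subprocess
import sys

# nltk downloader ids for the stopword lists and the universal POS tagset.
NLTK_RESOURCES = ["stopwords", "universal_tagset"]
# Assumption: the small English spaCy model; the thread does not name one.
SPACY_MODEL = "en_core_web_sm"


def resource_commands(python=sys.executable):
    """Build the download commands, using the given Python interpreter."""
    cmds = [[python, "-m", "nltk.downloader", name] for name in NLTK_RESOURCES]
    cmds.append([python, "-m", "spacy", "download", SPACY_MODEL])
    return cmds


def install_resources():
    """Run each download; raises CalledProcessError if one of them fails."""
    for cmd in resource_commands():
        subprocess.run(cmd, check=True)
```

Calling `install_resources()` runs the downloads with the same interpreter that runs the script, which avoids installing into a different Python than the one pke lives in.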

Alina-enni commented 2 years ago

Screenshot 2022-02-17 at 14 56 21

It behaves almost as if the resources don't exist at all.

Alina-enni commented 2 years ago

Oh boy, it took a lot of wrangling and desperation, but I managed to get it working! I found a solution for installing the resources from nltk and spacy. I used the current version of lyrics2.txt in our repo as the document to extract themes from, and this is the output:

Screenshot 2022-02-17 at 15 24 59

I have no idea what 'cornflake girl lyrics' refers to xD
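For reference, a minimal sketch of what a pke run over a text file can look like. The thread does not say which pke model produced the output above, so `TopicRank` is an assumption, and `extract_themes` and `format_themes` are hypothetical helper names. The file is read first and passed in as raw text, since not every pke version accepts a path.

```python
def extract_themes(path, n=10):
    """Return the n best (keyphrase, score) pairs for the text file at path."""
    import pke  # imported here so the module loads even without pke installed

    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Assumption: TopicRank; any other pke.unsupervised model follows the same steps.
    extractor = pke.unsupervised.TopicRank()
    extractor.load_document(input=text, language="en")
    extractor.candidate_selection()   # pick candidate phrases
    extractor.candidate_weighting()   # score the candidates
    return extractor.get_n_best(n=n)


def format_themes(keyphrases):
    """Turn (keyphrase, score) pairs into printable lines, score first."""
    return [f"{score:.3f}  {phrase}" for phrase, score in keyphrases]
```

Something like `print("\n".join(format_themes(extract_themes("lyrics2.txt"))))` would then list the top phrases with their scores.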

miglamigla commented 2 years ago

Apparently, a song by Tori Amos :D I wonder what the 'sah' is. Anyhow, that looks really interesting - especially if we had even more songs in the data file. Maybe we could show these themes on the initial search page (before searching for anything)?

Alina-enni commented 2 years ago

Yeah, I think that is a good idea! Let's test out the theme extraction on a larger index and see if the results seem at all useful or sensible.