Alina-enni / lingdiggers

Project for the Building NLP Applications course

KWIC (keyword in context) view in search results #22

Closed swilli6 closed 2 years ago

miglamigla commented 2 years ago

I implemented a first version of printing search results with the keyword in context. At the moment it uses string slicing. I originally wrote code that split the sentences containing the search words into words and then sliced on the words, but for simplicity's sake I changed it to slice on characters, without splitting the results. This isn't perfect yet:
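
A minimal sketch of the character-slicing idea described above, not the code in the repository. The function name `kwic_by_chars` and the `width` parameter are made up for illustration, and it assumes that lowercasing does not change the length of the text (true for plain ASCII lyrics).

```python
def kwic_by_chars(text, keyword, width=30):
    """Return one context snippet per occurrence of keyword, sliced on characters."""
    needle = keyword.lower()
    if not needle:
        return []  # an empty keyword would otherwise match at every position
    lowered = text.lower()
    snippets = []
    start = lowered.find(needle)
    while start != -1:
        end = start + len(needle)
        left = max(0, start - width)          # clamp at the start of the text
        right = min(len(text), end + width)   # clamp at the end of the text
        snippets.append(text[left:right])
        start = lowered.find(needle, end)     # continue after this occurrence
    return snippets


example = "the cold wind blows and the night is cold"
print(kwic_by_chars(example, "cold", width=5))
# ['the cold wind', 't is cold']
```

The second snippet starts in the middle of "night", which is the main drawback of slicing on characters instead of words. Substring matching also means that "cold" is found inside "coldplay".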

miglamigla commented 2 years ago

I will try to work out what is wrong, but it would be useful if you could look over that piece of code (if you have time) - perhaps you can see already what's wrong? Or else, which parts raise questions? Any insight would be useful.

Alina-enni commented 2 years ago

I cannot get the KWIC feature to work when I run the search engine. For some reason, search results are still presented as full documents. My code is identical to the code in the repository, so I have no idea why this happens.

Alina-enni commented 2 years ago

It also seems that instead of showing the keywords in context, the search is just producing duplicates for me. For example, when searching for "bones", which should only bring up "Fix You" by Coldplay (= 1 match), the results are 10 identical documents:

[Screenshot 2022-02-24 at 21 26 19]

miglamigla commented 2 years ago

Updated the code following advice from Yves. It now searches each document for each word in the searchlist. When it finds a match, it appends a string (artist, song name, part of the lyrics) to matches, which is a list of strings for now. (I tried to use a tuple, but could not work out how to print the results, so this option is temporary.) Some other comments:
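
The loop described above could look roughly like the sketch below, here already with tuples instead of strings. It is an illustration under assumptions, not the repository code: the document layout (dicts with `artist`, `title` and `lyrics` keys), the function name `find_matches` and the `width` parameter are all hypothetical.

```python
def find_matches(documents, search_words, width=30):
    """Collect (artist, title, snippet) tuples for each search word found in the lyrics."""
    matches = []
    for doc in documents:
        lyrics = doc["lyrics"]
        lowered = lyrics.lower()
        for word in search_words:
            pos = lowered.find(word.lower())
            if pos == -1:
                continue  # this word does not occur in this document
            left = max(0, pos - width)
            right = min(len(lyrics), pos + len(word) + width)
            matches.append((doc["artist"], doc["title"], lyrics[left:right]))
    return matches


documents = [
    {"artist": "A", "title": "T1", "lyrics": "a cold night"},
    {"artist": "B", "title": "T2", "lyrics": "warm day"},
]
# Unpacking each tuple is one way to print the results.
for artist, title, snippet in find_matches(documents, ["cold"]):
    print(f"{artist} - {title}: {snippet}")
```

Only the first occurrence of each word per document is reported here, which keeps the result list short; reporting every occurrence would need an inner loop over `find` positions.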

miglamigla commented 2 years ago

Found the mistake that stopped me from printing the tuple before. Now the results look better on the HTML page.

Alina-enni commented 2 years ago

This sounds great! I will have a look at what the code produces on my laptop later on. I guess not all of the issues are solved yet, but hopefully this is progress.

Alina-enni commented 2 years ago

Yeah, I just ran the current code and it definitely looks better now, and the KWIC seems to work for me now too! I searched for 'cold', though, and I'm not sure why it finds 'cold' and 'coldplay' but only displays 'Michael Jackson's Thriller'.

[Screenshot 2022-03-02 at 13 26 55]

miglamigla commented 2 years ago

Oh no... but I think I know where the problem is. The matching docs count occurrences not only in the lyrics, but also in the artist and song name; essentially, in all the keys in tfv5.vocabulary_ . The "found words" list is based on that too. But the printed results come from searching for the query only in the lyrics (excluding the artist and song name). I hope that makes sense. It feels like the whole piece of code for the results should then be rewritten, but I'm out of ideas :(((

Alina-enni commented 2 years ago

I see what you mean! So even though 'coldplay' matches a number of documents, the printed out results will only include hits in the actual lyrics. I'll have a look at the code to see if I can do anything about this

Alina-enni commented 2 years ago

Looking at the code and this issue, I'm wondering if we can add some kind of conditional that checks if the query matches an artist name
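
One possible shape for such a conditional, as a rough sketch only. The document layout (dicts with an `artist` key) and the function name `query_matches_artist` are assumptions for illustration, and the check is a whole-word comparison, so 'cold' would not count as a match for 'Coldplay'.

```python
def query_matches_artist(query, documents):
    """Return True if the query equals a whole word of any artist name."""
    q = query.lower()
    return any(q in doc["artist"].lower().split() for doc in documents)


documents = [
    {"artist": "Coldplay"},
    {"artist": "Michael Jackson"},
]
print(query_matches_artist("coldplay", documents))
print(query_matches_artist("cold", documents))
```

The result of this check could then decide whether the query is treated as an artist search or passed on to the lyrics search.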

Alina-enni commented 2 years ago

I think we may need to make a decision on whether we allow the user to search artist or song names at all because this may be too hard to execute

miglamigla commented 2 years ago

Hmm, I would vote for searching just the lyrics, since there seems to be more variability there. But then another question: should we put in some query restrictions (e.g. printing "Sorry, no results" if the query is only an artist or song name and is found nowhere in the lyrics)?
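
A small sketch of what that restriction could look like if only the lyrics are searched. Everything named here is hypothetical (the `artist`, `title` and `lyrics` keys, `search_lyrics_only`, `render_results`), and the real results page may be built quite differently.

```python
def search_lyrics_only(query, documents):
    """Return (artist, title) pairs for documents whose lyrics contain the query."""
    q = query.lower()
    return [(d["artist"], d["title"]) for d in documents if q in d["lyrics"].lower()]


def render_results(query, documents):
    """Return a printable result list, or a fallback message when nothing matches."""
    hits = search_lyrics_only(query, documents)
    if not hits:
        return "Sorry, no results."
    return "\n".join(f"{artist} - {title}" for artist, title in hits)


documents = [
    {"artist": "Coldplay", "title": "Song", "lyrics": "walking through the rain"},
]
print(render_results("coldplay", documents))  # query occurs only in the artist name
print(render_results("rain", documents))
```

Because the artist and song name are never searched, a query like 'coldplay' falls through to the fallback message instead of listing documents with no visible hit.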

miglamigla commented 2 years ago

KWIC works, so closing the issue.