Status: Closed (swilli6 closed this issue 2 years ago)
I will try to work out what is wrong, but it would be useful if you could look over that piece of code (if you have time) - perhaps you can see already what's wrong? Or else, which parts raise questions? Any insight would be useful.
I cannot get the KWIC feature to work when I run the search engine. For some reason, search results are still presented as full documents. The code is identical to the code in the repository, so I have no idea why this is happening.
It also seems that instead of showing the keywords in context, the search is just producing duplicates for me. For example, when searching for "bones", which should only bring up "Fix you" by Coldplay (= 1 match), the results are 10 documents that are all the same:
Updated the code following advice from Yves. It searches each of the documents for each word in the searchlist. If it finds something, it appends a string (artist, song name, part of lyrics) to matches, which is a list of strings for now (I tried to make it a tuple, but had no idea how to get it to print out the results, so this option is temporary). Some other comments:
Found the mistake that stopped me from printing the tuple before. Now the results look better on the HTML page.
This sounds great! I will have a look at what the code produces on my laptop later on. I guess not all of the issues are solved yet, but hopefully this is progress.
Yeah, I just ran the current code and it definitely looks better now, and the KWIC seems to work for me now too! I searched 'cold', though, and I'm not sure why it finds 'cold' and 'coldplay' but only displays 'Michael Jackson's Thriller'.
Oh no... but I think I know where the problem is. The matching docs show occurrences not only in the lyrics, but also in the artist and song name; essentially, in all the keys in tfv5.vocabulary_. And the "found words" list is based on that too. But the printed results come from searching for the query only in the lyrics (excluding the artist and song name). I hope that makes sense. It feels like the whole piece of code for the results should then be rewritten, but I'm out of ideas :(((
I see what you mean! So even though 'coldplay' matches a number of documents, the printed results will only include hits in the actual lyrics. I'll have a look at the code to see if I can do anything about this.
Looking at the code and this issue, I'm wondering if we can add some kind of conditional that checks whether the query matches an artist name.
I think we may need to decide whether we allow the user to search artist or song names at all, because this may be too hard to implement.
Hmm, I would vote for searching just the lyrics, since there seems to be more variability there. But then another question: should we put in some query restrictions (e.g. printing "sorry, no results" if the query is only an artist/song name and is found nowhere in the lyrics)?
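To make the lyrics-only option concrete, here is a rough sketch of what I mean. It is not the project code: the `SONGS` list, its field names, the placeholder lyrics and both function names are made up for illustration. It matches whole words in the lyrics only, adds each song at most once (so no duplicates), and falls back to a "sorry, no results" message.

```python
import re

# Hypothetical stand-in data; the lyrics are placeholders, not real song text.
SONGS = [
    {"artist": "Coldplay", "title": "Fix You",
     "lyrics": "placeholder line that mentions bones"},
    {"artist": "Michael Jackson", "title": "Thriller",
     "lyrics": "placeholder line about a cold night"},
]


def search_lyrics(query, songs):
    """Return songs whose lyrics contain any query word (whole-word match)."""
    words = query.lower().split()
    hits = []
    for song in songs:
        # Tokenise the lyrics only; artist and title are deliberately ignored.
        tokens = set(re.findall(r"\w+", song["lyrics"].lower()))
        if any(word in tokens for word in words):
            hits.append(song)  # appended once per song, so no duplicates
    return hits


def render_results(query, songs):
    """Format the hits, or a fallback message when nothing matches."""
    hits = search_lyrics(query, songs)
    if not hits:
        return f"Sorry, no results for '{query}' in the lyrics."
    return "\n".join(f"{s['artist']} - {s['title']}" for s in hits)


print(render_results("bones", SONGS))
print(render_results("coldplay", SONGS))
```

With this, a query like 'coldplay' gets the fallback message because it only occurs in the artist field, and 'cold' no longer matches 'coldplay' because the comparison is on whole tokens.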
KWIC works, so closing the issue.
I implemented a first version of printing search results with the keyword in context. At the moment, it uses string slicing. I did write code that would split the sentences where the searched words occur into words and then slice on the words, but I changed it (for simplicity's sake) to slicing on characters (without splitting the results). This isn't yet perfect:
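For reference, the character-slicing idea looks roughly like the sketch below. This is a simplified illustration rather than the code in the repository; the function name `kwic` and the `width` parameter are assumptions made for the example.

```python
def kwic(text, keyword, width=30):
    """Return one snippet per occurrence of keyword, sliced on characters."""
    key = keyword.lower()
    if not key:
        return []  # an empty keyword would match everywhere
    lowered = text.lower()
    snippets = []
    start = lowered.find(key)
    while start != -1:
        end = start + len(key)
        # Take `width` characters on each side, clamped to the text bounds.
        left = max(0, start - width)
        right = min(len(text), end + width)
        snippet = text[left:right]
        # Mark where the slice cut off the surrounding text.
        if left > 0:
            snippet = "..." + snippet
        if right < len(text):
            snippet = snippet + "..."
        snippets.append(snippet)
        start = lowered.find(key, end)  # continue after this occurrence
    return snippets


line = "And ignite your bones, and I will try to fix you"
print(kwic(line, "bones", width=10))
```

Because the slice is on characters, a snippet can start or end in the middle of a word. It is also a plain substring search, so a query like 'cold' would also hit 'coldplay'. Slicing on a word list instead would avoid both, at the cost of the extra splitting step.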