commonknowledge / banmarchive

The Amiel Melburn Archive - an online database of socialist and radical writings
https://banmarchive.org.uk/

Keyword search not fully functional #24

Status: Open. Opened by lydiahughes 2 years ago.

lydiahughes commented 2 years ago

Email from researcher: There is also something not right with the keyword search compared to the old site. I just searched for the term ‘capitalism’ in Black Dwarf and it brings up no results! For 7 Days it brings up only sections headed ‘Capitalism’.

lydiahughes commented 2 years ago

The second part of this ("For 7 Days it brings up only sections headed ‘Capitalism’.") is no longer true, but Black Dwarf still shows no results, not only for ‘capitalism’ but for any search term. The search function does not work for Black Dwarf.

chrisdevereux commented 2 years ago

Hi Hanna – the immediate reason for this is that none of the Black Dwarf issues are tagged with 'capitalism' as a keyword.

This has happened for a couple of reasons:

  1. The keywords associated with articles are automatically generated using an algorithm which looks for words and phrases that occur more frequently in that article than in other articles.

    We did this because we didn't have a list of keywords for each article and wanted to populate the articles with a 'good enough' set of keywords to start with. The idea was always that the keywords would need to be edited by hand, as the algorithm is very far from perfect – it works well for words that are less evenly distributed (like 'guevara'), but not for words that show up in pretty much every article (like 'capitalism').

  2. What we mean by keywords is a little different from the old archive. From using the old archive, it seemed to treat keywords as phrases that appeared in the article. In the new archive, keywords are better understood as categories or topics, although the new archive also gives people the option of searching by phrase.

    We have two ways of searching articles: by keyword (the default in advanced search) or by literal phrase (used in basic search and available as an option in advanced search). If you use the 'includes phrase' option to search Black Dwarf, you'll see a much bigger set of results. For example: https://banmarchive.org.uk/search/?mode=advanced&publication=188&decade=&author=&bools=AND&ops=phrase&values=capitalism
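The difference between the two search modes can be illustrated with a minimal sketch. It assumes a TF-IDF-style score (how often a term appears in an article, multiplied by the log of how rare the term is across all articles) as a stand-in for the "more frequent in this article than in other articles" algorithm; the archive's actual implementation is not described in this thread. The functions `top_keywords` and `phrase_search`, the stopword list and the sample articles are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would use a fuller list and stemming.
STOPWORDS = {"a", "and", "in", "of", "on", "the"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split on non-letters and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Hypothetical keyword extractor: the k highest TF-IDF terms per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # A term found in every document has idf = log(1) = 0, so it scores 0.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append([t for t in ranked[:k] if scores[t] > 0])
    return results


def phrase_search(docs: list[str], phrase: str) -> list[int]:
    """Hypothetical literal phrase search: indices of docs containing the phrase."""
    needle = phrase.lower()
    return [i for i, d in enumerate(docs) if needle in d.lower()]


articles = [
    "Capitalism and the student revolt: Guevara on capitalism",
    "Vietnam solidarity campaign against capitalism",
    "Capitalism in crisis and the workers",
]

print(top_keywords(articles))
print(phrase_search(articles, "capitalism"))
```

Because 'capitalism' appears in every sample article, its inverse document frequency is log(1) = 0, so it is never chosen as a keyword, while a rarer word like 'guevara' is. A literal phrase search, by contrast, finds 'capitalism' in all three articles, which mirrors the Black Dwarf behaviour described above.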

What can we do about it? I can see a few options (presented in order of how much work each would involve). I'm open to suggestions from you as well!

lydiahughes commented 2 years ago

Hanna: Hi Chris, thanks, this is really useful to understand. My order of preference would be your option 2 first, trying to change the algorithm to see if it makes the search better. Then, if that doesn't significantly improve results, your option 1, making 'includes phrase' the default. Thanks