glenl opened 8 years ago
Yes, likely related to https://github.com/MutopiaProject/MutopiaProject/issues/553 but maybe a bit worse.
If I recall correctly, the old website was able to find the piece when diacriticals were included in the search term. At that time, I was updating "Für Elise" and added "Fur Elise" as a keyword to the "more-info" field so that the piece could be found with either spelling. I now get the right result for "fur elise", but no hit with the diacritical.
Ugh. I just edited the CGI routine to use collation code and the performance is not acceptable.
A few things:
Let's move forward and create some requirements for making this happen correctly (using Für Elise as the example):
The Perl Collate routines work in my tests. They are not fast, so now it is a matter of performance tweaking, and I have already done some basic things. I am confident I can make it somewhat faster, but not as fast as basic text matching. Here are some ideas; I'll work on the first while the other two are considered:
Would it be an option to pre-process the search target datafile so that it is stripped of diacriticals and shifted to lowercase beforehand? User-submitted search keywords would also be cleared of diacriticals and shifted to lowercase before the search is run against the target datafile to identify matching piece IDs.
We have a certain amount of looseness in our search cache, right? It is not the archive; it is a data set used to find references within our archive. If I understand you correctly, you would be making our search cache a true keyword search engine: the cache is built so that it is free of diacriticals, then search input is stripped of diacriticals, so simple pattern matching can be done. I am guessing we will find some holes, but it would not be difficult to model and test.
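A minimal sketch of the pre-processing idea, written in Python for illustration only (the site's search is a Perl CGI routine). The `fold()` and `search()` names and the two sample entries are hypothetical. Both the cache text and the search term go through the same folding step: Unicode NFD decomposition splits "ü" into "u" plus a combining diaeresis, the combining marks are dropped, and the result is case-folded, so plain substring matching is enough afterwards.

```python
import unicodedata
from typing import List


def fold(text: str) -> str:
    """Return text with diacritics removed and case folded ('Für' -> 'fur')."""
    # NFD decomposes precomposed letters: 'ü' becomes 'u' followed by U+0308.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything except nonspacing combining marks (category 'Mn').
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() is intended for caseless matching (e.g. 'ß' -> 'ss').
    return base.casefold()


# Hypothetical raw cache contents: piece id -> searchable text.
RAW_PIECES = {
    "example-1": "Sonate Pathétique",
    "example-2": "Für Elise",
}

# Folded once, when the search cache is built, not on every query.
SEARCH_CACHE = {pid: fold(text) for pid, text in RAW_PIECES.items()}


def search(term: str) -> List[str]:
    """Return ids whose folded cache text contains the folded search term."""
    needle = fold(term)
    return [pid for pid, folded in SEARCH_CACHE.items() if needle in folded]


print(search("Für Elise"))   # ['example-2']
print(search("fur elise"))   # ['example-2']
print(search("PATHÉTIQUE"))  # ['example-1']
```

One hole this already shows: letters such as "ø" have no Unicode decomposition, so `fold("ø")` stays "ø" and would still need an explicit mapping if we want it to match "o".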
Yes, that's what I meant. One hole I can think of is that the cache is also used as the data source when reporting search results. That would require post-processing to pick up the untransformed fields.
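One way to close that hole, sketched in Python under the assumption that each cache entry can hold two copies of the text: a folded string used only for matching, and the untouched fields used for reporting. `CacheEntry`, `build_cache()`, `search()` and the sample record are hypothetical names for illustration, not the current cache format.

```python
import unicodedata
from typing import Dict, List, NamedTuple


def fold(text: str) -> str:
    """Strip combining marks after NFD decomposition, then case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").casefold()


class CacheEntry(NamedTuple):
    folded: str              # diacritic-free, case-folded text, used only for matching
    display: Dict[str, str]  # original fields, used when reporting results


def build_cache(records: Dict[str, Dict[str, str]]) -> Dict[str, CacheEntry]:
    """Fold the searchable fields once, but keep the original record alongside."""
    cache = {}
    for piece_id, fields in records.items():
        searchable = " ".join(fields.values())
        cache[piece_id] = CacheEntry(folded=fold(searchable), display=dict(fields))
    return cache


def search(cache: Dict[str, CacheEntry], term: str) -> List[Dict[str, str]]:
    """Match on the folded text, but return the untransformed fields."""
    needle = fold(term)
    return [entry.display for entry in cache.values() if needle in entry.folded]


# Hypothetical record; real cache fields may differ.
cache = build_cache({"example-1": {"title": "Sonate Pathétique", "composer": "Beethoven"}})
results = search(cache, "pathetique")
print(results[0]["title"])  # Sonate Pathétique
```

Storing both forms roughly doubles the text held in the cache, but it avoids a second lookup against the archive just to recover the original spelling for display.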
@glenl, if the remaining work on this issue is the same as that raised in https://github.com/MutopiaProject/MutopiaProject/issues/553, are you OK with closing the tracker in MutopiaProject and keeping this one open to document future progress?
This was moved here from issue #77 by @dominicus.
From the home page, do a keyword search for "Pathétique": I get unrelated results. Go to "Advanced Search" and search for keyword="Pathétique" with no other filters: I get no results. Yet we should get at least one hit: http://www.mutopiaproject.org/cgibin/piece-info.cgi?id=299