Closed janahoffmann closed 12 years ago
Even characters that look (nearly) the same can differ in the code points by which they are encoded. Here is an example (copied from #1) of a related bug in simple search:
Searching for 'Küsten' ( http://test111.ait.co.at/?q=advanced_search_view/{!lucene}K%C3%BCsten ) should return an item with the title "Über die Ökologie und Verbreitung der Arthropoden der Triebsandgebiete an den Küsten Finnlands". This does not work when the term 'Küsten' is typed in, but it works when the term is copied and pasted from the portal page http://test111.ait.co.at/?q=node/21991
Compare the following searches:
copy/paste: http://test111.ait.co.at/?q=advanced_search_view/{!lucene}Ku%CC%88sten
typed in: http://test111.ait.co.at/?q=advanced_search_view/{!lucene}K%C3%BCsten
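The difference between the two URLs can be reproduced outside the portal. The following is a minimal Python sketch, not the portal's code: it decodes the two percent-encoded terms and shows that they only compare equal after Unicode normalization. The helper name `nfc` is an assumption made for illustration.

```python
import unicodedata
from urllib.parse import unquote

# The two search terms from the URLs, still percent-encoded.
pasted = unquote("Ku%CC%88sten")  # 'u' followed by U+0308 COMBINING DIAERESIS
typed = unquote("K%C3%BCsten")    # single code point U+00FC

def nfc(text: str) -> str:
    """Return the canonical composed form (NFC) of text (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)

print(pasted == typed)            # False: different code point sequences
print(len(pasted), len(typed))    # 7 6
print(nfc(pasted) == nfc(typed))  # True: equal once both are normalized
```

If the index and the query were both normalized to the same form before matching, the typed and the pasted term would be treated as the same word.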
functional test in simple search possible, also include comment above
Tested with these terms: Schlüter and Schluter bring the same results. Schlueter has no results.
Bücher and Bucher return the same results, which include both terms. Buecher has no results.
The results for Künste mainly contain 'kunst'.
Küsten: in this case the problem is in the metadata. The term Küsten shown on the website is not written properly in the metadata, which is probably an OCR problem. The same problems occur on the BHL portal, especially with old books. The term looks similar, but when you export or edit the metadata it is different. The copy/paste tactic simply copies the wrong term from the metadata into the search field. This is actually not a bug of the portal but of the metadata themselves. This function depends mainly on the quality of the metadata and OCR.
Löwe: the results are mainly for 'low', and many do not correspond to the search term or any part of it. The term Loewe gives the same results.
Conclusion: search with words including diacritic characters is not working properly. The problem could be caused partly by the quality of the metadata, but also by a bug in the search function.
Bug: searching by the terms ae, oe, ue, etc. is not working. The results for Löwe, for example, include many records that contain only parts of this word, and many of them do not correspond to the search term at all.
Testing of feature 1.1.4 - Search with words including diacritic characters
Feature description from the Catalogue of user requirements:
Search term with ä, ö, ü, ç, à, etc.: find also ae, oe, ue, etc.
Search term - Küsten, Löwe, etc. - Find Küsten, Kuesten and Löwe, Loewe
Display correctly in result list.
Testers: @AntonioGVH @fwelter @heimor @janahoffmann @JFTester @JiriFrank @LarissaS
Testers: @fwelter
Testing of feature 1.1.4 - Search with words including diacritic characters
Test 1: searching for “Löwe” gives 186 records.
It is very difficult to see in the detailed description of a record where the match is (because there is no highlighting) and whether it is for “Löwe” or also for “low”.
Searching for “Loewe” gives 28 records.
I don't think that searching with ö also finds oe (as in the requirements). I checked it with Löwe AND Aartsen: 0 results, while Loewe AND Aartsen gives 4 results (records from Naturalis).
Searching for “Vertébrés” gives 47 records: 2 French texts from RMCA (duplicates again!) and 45 from Naturalis, English or Dutch texts, but “eng” or "dut" does not appear in the left column!
Searching for “vertebrae” gives 14 records.
Searching for “Lofoï” and “Lofoi” gives the same results: 2 records.
Searching for “Algérie” gives 5 records, for both Algérie (fre) and Algerie (ger).
Searching for "tête" and "tete" gives the same result: 13 records, all French texts.
Yesterday this error appeared then I logged out and was continuing the simple search: Notice: Undefined variable: _SESSION in advanced_search_view() (line 485 of /var/www/drupal/sites/all/modules/ait/advanced_search/advanced_search.module). Warning: array_key_exists() expects parameter 2 to be array, null given in advanced_search_view() (line 485 of /var/www/drupal/sites/all/modules/ait/advanced_search/advanced_search.module).
Querying by 'Kürten' gives 3 results.
Querying by 'Kusten' gives the same 3 results.
Querying by 'Kuesten' gives NO result.
Querying by 'Löwe' gives results with 'low' and 'lowest', but not 'Löwe' (as far as I have looked).
Querying by 'Loewe' gives results with 'Loew'.
I am including several screen shots.
Antonio
1.1.4 diacritic marks
voyages îles
mantides musée
"Musée"
"Physeter Macrocephalus Linnaeus"
"Physeter Macrocephalus Linnæus"
"ueber die placenta"
"über die placenta"
"uber die placenta"
"ber die placenta"
oesterreich
osterreich
österreich
Diacritic marks are ignored in searching and finding, which seems to work fine. oe is not recognised as ö. ß is recognised as ss.
1 - The result lists generally displayed titles where none of the search words were contained in the title, author or year, but only in the text of the abstract and notes on the item page.
2 - Single journal volumes were displayed for serial runs; they blocked all intelligent search attempts and spammed the results list with the same journal title over and over.
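The matching behaviour reported in this comment (diacritics ignored, ß treated as ss, oe not treated as ö) is what a plain accent-folding step would produce. The following is a minimal Python sketch, assuming folding is done by Unicode decomposition. The `fold` helper is hypothetical and is not taken from the portal's actual search configuration; the word "Straße" is an illustrative term that was not part of the test.

```python
import unicodedata

def fold(text: str) -> str:
    """Lowercase, expand ß to ss, and drop combining marks (hypothetical helper)."""
    # casefold() lowercases and maps 'ß' to 'ss'.
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Drop the combining marks (general category Mn) left by the decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(fold("über die placenta") == fold("uber die placenta"))  # True
print(fold("Straße"))                                          # strasse
print(fold("oesterreich") == fold("österreich"))               # False
```

Folding alone maps ö to o, so the digraph form oe needs a separate rule if it is to match as well.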
Francisco
F. Welter-Schultes Zoologisches Institut, Berliner Str. 28, D-37073 Goettingen Phone +49 551 395536, Fax +49 551 395579 http://www.gwdg.de/~fwelter http://www.animalbase.org
@akohlbecker Please check whether the new description is accurate for the indicated problem. Do we get all diacritics with a Unicode translation? What are the preconditions for the features? I need this for the CoRv1. Thanks
@janahoffmann Yes the new description correctly describes the problem
This issue should be fixed, see #176. However, I still need to connect the portal with gsearch, and we need additional test data to be ingested. @hengdi could you please "ingest" the following book? http://bhl-portal-dev.nhm.ac.uk/?q=node/21991&language=es
Über die Ökologie und Verbreitung der Arthropoden der Triebsandgebiete an den Küsten Finnlands
Über die Jährlichen Zuwachszonen der Schuppen und Beziehungen zwischen Sommertemperatur und Zuwachs bei Abramis brama
Zur Biologie von Regulus r. regulus (L.) und Parus atricapillus borealis selys
@akohlbecker Yes, OK. Could you please help me figure out where the original pages and files of this book are? Thanks.
@hengdi Sorry, I have no idea where to find this book. I wrote to @melitabirthaelmer to ask whether she knows about it, but she is out of office today. Maybe Lee or Chris know something?
Here are the original files for the book (http://bhl-portal-dev.nhm.ac.uk/?q=node/21991&language=es) :
http://bhl-celsus.nhm.ac.uk/uploads/UHViikki/FI-Hb/FI-Hb_299947_acta_zoologica/299947_12-14_1932/
OK, this book is in the Fedora Repository now.
this is now fixed, closing issue
1.1.4 - Search with words including diacritic characters (Simple Search)
COR number: 1.1.4 Testing platform: http://bhl-test.nhm.ac.uk/portal/ Function: Simple search
Description:
Searches with terms based on the Latin alphabet that include a diacritical mark, e.g. ä, ö, ü, ç, à, shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as the original letter. Ranking should be done by best match.
Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.
@AnneSch @AntonioGVH @fwelter @grahamhrbge1670 @heimor @HenningScholz @JFTester @JiriFrank @LarissaS @RalfH
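One way to read the requirement is as query expansion: each search term is expanded into its original, digraph and unaccented spellings before matching. The following is a minimal Python sketch under that assumption. The `DIGRAPHS` table and the `variants` and `strip_marks` helpers are hypothetical names, cover only the German letters, and do not describe how the portal implements the feature.

```python
import unicodedata

# Assumed digraph table for German umlauts and ß; other languages need their own rules.
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

def strip_marks(text: str) -> str:
    """Decompose the text and drop combining marks, e.g. 'Ö' becomes 'O'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def variants(term: str) -> set:
    """Return the spellings a search for term could also try."""
    composed = unicodedata.normalize("NFC", term)  # accept decomposed input as well
    digraph = "".join(DIGRAPHS.get(ch, ch) for ch in composed)
    return {composed, digraph, strip_marks(composed)}

print(sorted(variants("Österreich")))  # ['Oesterreich', 'Osterreich', 'Österreich']
```

This covers only the direction from the accented term to its variants. Finding 'Österreich' from a query for 'Oesterreich' would need the same expansion applied when the records are indexed.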
This is not working perfectly. In some examples it works, in others it does not. When I search for "Aerzte", for example, I always get one result (Medicinisch-pharmaceutische botanik zugleich als handbuch der systematischen botanik fur botaniker, arzte und apotheker) but not this one (Deutsche Flora. Pharmaceutisch-medicinische Botanik. Ein Grundriss der systematischen Botanik zum Selbststudium für Aerzte, Apotheker und Botaniker). This happens whatever I type in (Ärzte, Aerzte, Arzte).
Henning
It works more or less for the umlauts. But it does not find the (obviously Faroese) Plantulæra, neither with the exact spelling nor as "Plantulaera". It also has problems finding the place of publication "Tórshavn" (same publication), which is a bit surprising because that works for other cases with á etc.
Works fine!
A slight change of my previous opinion: If I look for "Plantulaera", it is not found. The normal user cannot type the ligature "ae"...
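A possible reason why ligatures behave differently from umlauts: æ has no Unicode decomposition, so stripping combining marks leaves it unchanged and it needs an explicit mapping. The following is a minimal Python sketch of a search key that handles both cases. The `LIGATURES` table and the `search_key` helper are assumptions for illustration, not the portal's code.

```python
import unicodedata

# Letters without a Unicode decomposition; this table is an illustrative assumption.
LIGATURES = {"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe"}

def search_key(text: str) -> str:
    """Build a comparison key: expand ligatures, drop diacritics, ignore case."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", expanded)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()

# Normalization alone does not split the ligature into two letters.
print(unicodedata.normalize("NFKD", "æ") == "æ")                  # True
print(search_key("Plantulæra") == search_key("Plantulaera"))      # True
print(search_key("Tórshavn"))                                     # torshavn
```

With such a key applied to both the indexed text and the query, a user typing "Plantulaera" would match the record spelled with the ligature.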
@AntonioGVH @fwelter @LarissaS
Compared the search for plantulaera and plantulæra in BHL(US). In BHLUS the system accepts both forms. [The Alt code for æ is 0230.] Users should not have to remember these codes. However, the autocomplete function should compensate for this, as it provides suggestions with diacritics, as far as I have noticed.
Now switched to IE 8.0.6001.18702 I can't get the test system to do a simple search for either plantulaera or plantulæra.
There are two obvious examples of bugs in this feature. Due to the small amount of content without markable diacritic, it is complicated to test it in more detail.
Related bug Issue #288
Ready for testing.
@GregKenicer @GeoffHarper @KasiaGoral @MartinaMetzger @DanielFisher @NeilWoodcock @SaraPerzley @SaraCarlton
@HannaKoivula @PaiviLipsanen @PaiviJaakkola @SiniKarki @TiinaOnttonen
Browser: Mozilla 5.0, Firefox 2.0.0.16 from 2006. Same problem as before: nothing can be typed into the search field because moving images are located in front of the search box.
Francisco Welter-Schultes Zoologisches Institut, Berliner Str. 28, D-37073 Goettingen Phone +49 551 395536 http://www.animalbase.org
Hi Jiri,
Thanks for the instructions on obtaining the current test list of books; they were in one of your earlier messages, but I'd forgotten. I was working from an old list. There isn't much time left, so I can't do much. But I've at least found that Simple Search works OK to locate Follmann, Uber die unterdevonischen Schichten bei Coblenz. However, several attempts to type 'U' with umlaut (using Alt+0252 and Alt+0220) in the search box resulted in no character being printed.
Geoff
I use the most current Chrome on Win 7 for these tests.
Works fine.
Switched to Chrome, as IE10 and Firefox 9 still have display issues, which makes searching impossible.
Simple search for words containing ö, é, ä, working fine.
I use Firefox 11 for Ubuntu.
Searched for München. No result.
Searched for Nürnberg. No result.
Searched for Düsseldorf. No result.
Sorry, misunderstood the task and searched for words not in the current book list.
This time searched for:
Studiengänge: one result, 'Vier landw. Studiengänge durch die Gemarkung Niederlahnstein und ihre Flurdistrikte'. Did not get results with the Unicode translation, though.
Ready to close
Updated description: Searches with terms based on the Latin alphabet that include a diacritical mark, e.g. ä, ö, ü, ç, à, shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as the original letter. Ranking should be done by best match.
Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.
@akohlbecker
Previous: Search term with ä, ö, ü, ç, à, etc.: find also ae, oe, ue, etc.
Search term - Küsten, Löwe, etc. - Find Küsten, Kuesten and Löwe, Loewe
Display correctly in result list.
Related bug #176