gbhl / bhl-europe

Biodiversity Heritage Library Europe
http://www.bhl-europe.eu/

1.1.4 - Search with words including diacritic characters (Simple Search) #36

Closed janahoffmann closed 12 years ago

janahoffmann commented 13 years ago

Updated description: Search with terms based on the Latin alphabet that include diacritical marks (e.g. ä, ö, ü, ç, à) shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as with the original letter. Ranking should be done by best match.

Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.
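
The expansion described above can be sketched as follows. This is only an illustration under stated assumptions, not the portal's implementation: the `TRANSLITERATIONS` table and the function names are hypothetical, and the mapping covers only the German-style cases named in the description.

```python
import unicodedata
from typing import List

# Hypothetical transliteration table (an assumption, not taken from the portal code).
TRANSLITERATIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_marks(text: str) -> str:
    """Drop combining marks after canonical decomposition (ö -> o)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate(text: str) -> str:
    """Replace each mapped letter with its two-letter spelling (ö -> oe)."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(TRANSLITERATIONS.get(ch, ch) for ch in composed)


def query_variants(term: str) -> List[str]:
    """Return the original term first, then its distinct alternative spellings."""
    original = unicodedata.normalize("NFC", term)
    variants: List[str] = []
    for candidate in (original, transliterate(original), strip_marks(original)):
        if candidate not in variants:
            variants.append(candidate)
    return variants


print(query_variants("Österreich"))
# ['Österreich', 'Oesterreich', 'Osterreich']
```

Keeping the original spelling first in the list is one simple way to let an exact match be weighted higher than the alternatives when ranking by best match.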

@akohlbecker

Previous: Search term with ä, ö, ü, ç, à, etc.: find also ae, oe, ue, etc.

Search term - Küsten, Löwe, etc. - Find Küsten, Kuesten and Löwe, Loewe

Display correctly in result list.

Related bug #176

akohlbecker commented 13 years ago

Even characters that look (nearly) the same may be encoded with different code points. Here is an example (copied from #1) of a related bug in simple search:

Searching for 'Küsten' ( http://test111.ait.co.at/?q=advanced_search_view/{!lucene}K%C3%BCsten ) should return an item with the title "Über die Ökologie und Verbreitung der Arthropoden der Triebsandgebiete an den Küsten Finnlands". This does not work when the term 'Küsten' is typed in; however, it works when the term is copied and pasted from the portal page http://test111.ait.co.at/?q=node/21991

compare the following searches:

copy paste: http://test111.ait.co.at/?q=advanced_search_view/{!lucene}Ku%CC%88sten

typed in: http://test111.ait.co.at/?q=advanced_search_view/{!lucene}K%C3%BCsten
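
The two URLs above differ only in how the "ü" is encoded: `%C3%BC` is the single precomposed code point U+00FC, while `u%CC%88` is a plain "u" followed by U+0308 (combining diaeresis). A minimal sketch of the difference, and of one common remedy (normalizing both the indexed text and the query to the same Unicode form), is shown below. The helper name `normalize_query` is hypothetical, and whether the portal's fix works this way is not stated in this thread.

```python
import unicodedata
from urllib.parse import unquote

typed = unquote("K%C3%BCsten")    # "ü" as one code point, U+00FC
pasted = unquote("Ku%CC%88sten")  # "u" followed by U+0308 COMBINING DIAERESIS


def normalize_query(term: str) -> str:
    """Bring a search term into composed form (NFC) before matching."""
    return unicodedata.normalize("NFC", term)


print(typed == pasted)                                    # False
print(len(typed), len(pasted))                            # 6 7
print(normalize_query(typed) == normalize_query(pasted))  # True
```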

janahoffmann commented 12 years ago

A functional test in simple search is possible; also include the comment above.

JiriFrank commented 12 years ago

Tested with the following terms: Schlüter and Schluter bring the same results. Schlueter has no results.

Bücher and Bucher have the same results, including both terms. Buecher has no results.

The results for Künste mainly include kunst.

Küsten: in this case the problem is in the metadata, because the term Küsten shown on the website is not written properly in the metadata; this is probably an OCR problem. The same problems occur on the BHL portal, especially with old books. The term looks similar, but when you export or edit the metadata, it is different. The copy-paste tactic just copies the wrong term from the metadata into the search field. It is actually not a bug of the portal but of the metadata themselves. This function mainly depends on the quality of the metadata and OCR.

Löwe: the results are mainly for low, and many do not even correspond to the search term or a part of it. The results for the term Loewe are the same.

Conclusion: search with words including diacritic characters is not working properly. The problem could be caused partly by the quality of the metadata, but also by a bug in the search function.

Bug: searching by the terms ae, oe, ue, etc. is not working. The results for Löwe, for example, include many different records that contain just parts of this word, and many of them do not correspond to the search term at all.

JiriFrank commented 12 years ago

Testing of feature 1.1.4 - Search with words including diacritic characters

Feature description from the Catalogue of user requirements:

Search term with ä, ö, ü, ç, à, etc.: find also ae, oe, ue, etc.

Search term - Küsten, Löwe, etc. - Find Küsten, Kuesten and Löwe, Loewe

Display correctly in result list.

Testers: @AntonioGVH @fwelter @heimor @janahoffmann @JFTester @JiriFrank @LarissaS

JiriFrank commented 12 years ago

Testing of feature 1.1.4 - Search with words including diacritic characters

Feature description from the Catalogue of user requirements:

Search term with ä, ö, ü, ç, à, etc.: find also ae, oe, ue, etc.

Search term - Küsten, Löwe, etc. - Find Küsten, Kuesten and Löwe, Loewe

Display correctly in result list.

Testers: @fwelter

LarissaS commented 12 years ago

Testing of feature 1.1.4 - Search with words including diacritic characters

Test 1: searching for “Löwe”: 186 records.

It is very difficult to see in the detailed description of a record where the match is (because there is no highlighting), and whether it is for “Löwe” or also for “low”.

Searching for “Loewe”: 28 records.

I don't think that searching with ö also finds oe (as stated in the requirements). I checked this with Löwe AND Aartsen, which gives 0 results, while Loewe AND Aartsen gives 4 results (records from Naturalis).

Searching for “Vertébrés”: 47 records, 2 French texts from RMCA (again duplicates!) and 45 English or Dutch texts from Naturalis, but “eng” or “dut” does not appear in the left column!

Searching for “vertebrae”: 14 records.

Searching for “Lofoï” and “Lofoi” gives the same results: 2 records.

Searching for “Algérie” gives 5 records, for both Algérie (fre) and Algerie (ger).

Searching for "tête" and "tete" gives the same result: 13 records, all French texts.

Yesterday this error appeared when I logged out and continued with the simple search:

Notice: Undefined variable: _SESSION in advanced_search_view() (line 485 of /var/www/drupal/sites/all/modules/ait/advanced_search/advanced_search.module).

Warning: array_key_exists() expects parameter 2 to be array, null given in advanced_search_view() (line 485 of /var/www/drupal/sites/all/modules/ait/advanced_search/advanced_search.module).


AntonioGVH commented 12 years ago

Querying by 'Kürten' gives 3 results.

Querying by 'Kusten' gives the same 3 results.

Querying by 'Kuesten' gives NO result.

Querying by 'Löwe' gives results with 'low' and 'lowest', but not 'Löwe' (as far as I have looked).

Querying by 'Loewe' gives results with 'Loew'.

I am including several screen shots.

Antonio


fwelter commented 12 years ago

1.1.4 diacritic marks

voyages îles

mantides musée

"Musée"

"Physeter Macrocephalus Linnaeus"

"Physeter Macrocephalus Linnæus"

"ueber die placenta"

"über die placenta"

"uber die placenta"

"ber die placenta"

oesterreich

osterreich

österreich

Diacritic marks are ignored in searching and finding; this seems to work fine. oe is not recognised as ö. ß is recognised as ss.

1 - The result lists generally displayed titles where none of the search words were contained in the title, author or year, but only in the text of the abstract and notes on the item page.

2 - Single journal volumes were displayed for serial runs; they blocked all intelligent search attempts and spammed the results list with the same journal title over and over.

Francisco



janahoffmann commented 12 years ago

@akohlbecker Please check whether the new description is accurate for the indicated problem. Do we get all diacritics with a Unicode translation? What are the preconditions for the features? I need this for the CoRv1. Thanks

akohlbecker commented 12 years ago

@janahoffmann Yes, the new description correctly describes the problem.

akohlbecker commented 12 years ago

This issue should be fixed, see #176. However, I still need to connect the portal with gsearch, and we need additional test data to be ingested. @hengdi could you please "ingest" the following book? http://bhl-portal-dev.nhm.ac.uk/?q=node/21991&language=es

Über die Ökologie und Verbreitung der Arthropoden der Triebsandgebiete an den Küsten Finnlands

Über die Jährlichen Zuwachszonen der Schuppen und Beziehungen zwischen Sommertemperatur und Zuwachs bei Abramis brama

Zur Biologie von Regulus r. regulus (L.) und Parus atricapillus borealis selys

audreyhzhang commented 12 years ago

@akohlbecker Yes, OK. Could you please tell me where the original pages and files of this book are? Thanks.

akohlbecker commented 12 years ago

@hengdi Sorry, I have no idea where to find this book. I wrote to @melitabirthaelmer to ask whether she knows about it, but she is out of office today. Maybe Lee or Chris know something?

melitabirthaelmer commented 12 years ago

Here are the original files for the book (http://bhl-portal-dev.nhm.ac.uk/?q=node/21991&language=es) :

http://bhl-celsus.nhm.ac.uk/uploads/UHViikki/FI-Hb/FI-Hb_299947_acta_zoologica/299947_12-14_1932/

audreyhzhang commented 12 years ago

OK, this book is in the Fedora Repository now.

akohlbecker commented 12 years ago

this is now fixed, closing issue

JiriFrank commented 12 years ago

1.1.4 - Search with words including diacritic characters (Simple Search)

COR number: 1.1.4 Testing platform: http://bhl-test.nhm.ac.uk/portal/ Function: Simple search

Description:

Search with terms based on the Latin alphabet that include diacritical marks (e.g. ä, ö, ü, ç, à) shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as with the original letter. Ranking should be done by best match.

Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.

@AnneSch @AntonioGVH @fwelter @grahamhrbge1670 @heimor @HenningScholz @JFTester @JiriFrank @LarissaS @RalfH

HenningScholz commented 12 years ago

This is not working perfectly. In some examples it works, in others it does not. When I look for "Aerzte", for example, I always get one result (Medicinisch-pharmaceutische botanik zugleich als handbuch der systematischen botanik fur botaniker, arzte und apotheker) but not this one (Deutsche Flora. Pharmaceutisch-medicinische Botanik. Ein Grundriss der systematischen Botanik zum Selbststudium für Aerzte, Apotheker und Botaniker). This happens whatever I type in (Ärzte, Aerzte, Arzte).

Henning



RalfH commented 12 years ago

It works more or less for the umlauts. But it does not find (the obviously Faroese) Plantulæra, neither with the exact spelling nor as "Plantulaera". It also has problems finding the place of publication "Tórshavn" (same publication), which is a bit surprising because that works for other cases with á etc.


AnneSch commented 12 years ago

Works fine!


AnneSch commented 12 years ago

A slight change to my previous opinion: if I look for "Plantulaera", it is not found. The normal user cannot type the ligature "æ"...
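
One reason the ligature case can behave differently from the umlauts: "æ" has no Unicode decomposition, so stripping combining marks after normalization leaves it untouched, whereas "ó" decomposes into "o" plus a combining accent. A minimal sketch of a folding function that handles both, assuming a small hand-written table (`EXTRA_FOLDS` and `fold` are hypothetical names, not part of the portal or of any library):

```python
import unicodedata

# Hypothetical table for letters that Unicode normalization does not decompose.
EXTRA_FOLDS = {
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O",
    "ß": "ss",
}


def fold(text: str) -> str:
    """Reduce a term to a lowercase, unaccented key for comparison."""
    decomposed = unicodedata.normalize("NFKD", text)
    folded = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents such as the acute in "ó"
        folded.append(EXTRA_FOLDS.get(ch, ch))
    return "".join(folded).casefold()


print(fold("Plantulæra") == fold("Plantulaera"))  # True
print(fold("Tórshavn") == fold("Torshavn"))       # True
print(fold("Linnæus") == fold("Linnaeus"))        # True
```

If the same key is computed for indexed terms and for the query, a user who types "Plantulaera" would match the record spelled "Plantulæra" without needing the Alt code for the ligature.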


JiriFrank commented 12 years ago

1.1.4 - Search with words including diacritic characters (Simple Search)

COR number: 1.1.4 Testing platform: http://bhl-test.nhm.ac.uk/portal/ Function: Simple search

Description:

Search with terms based on the Latin alphabet that include diacritical marks (e.g. ä, ö, ü, ç, à) shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as with the original letter. Ranking should be done by best match.

Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.

@AntonioGVH @fwelter @LarissaS

grahamhrbge1670 commented 12 years ago

Compared the search for plantulaera and plantulæra in BHL (US). In BHL US the system accepts both forms. [The ALT char-code for æ is 0230.] Users should not have to remember these codes. However, the autocomplete function should compensate for this, as it provides suggestions with diacritics, as far as I have noticed.

Having now switched to IE 8.0.6001.18702, I can't get the test system to do a simple search for either plantulaera or plantulæra.

JiriFrank commented 12 years ago

There are two obvious examples of bugs in this feature. Because of the small amount of content with notable diacritics, it is difficult to test it in more detail.

Related bug Issue #288

JiriFrank commented 12 years ago

Ready for testing.

JiriFrank commented 12 years ago

1.1.4 - Search with words including diacritic characters (Simple Search)

COR number: 1.1.4 Testing platform: http://bhl-test.nhm.ac.uk/portal/ Function: Simple Search

Description:

Search with terms based on the Latin alphabet that include diacritical marks (e.g. ä, ö, ü, ç, à) shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as with the original letter. Ranking should be done by best match.

Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.


Testers: @AnneSch @AntonioGVH @fwelter @grahamhrbge1670 @heimor @HenningScholz @JFTester @JiriFrank @LarissaS @RalfH

JiriFrank commented 12 years ago

1.1.4 - Search with words including diacritic characters (Simple Search)

COR number: 1.1.4 Testing platform: http://bhl-test.nhm.ac.uk/portal/ Function: Simple Search

Description:

Search with terms based on the Latin alphabet that include diacritical marks (e.g. ä, ö, ü, ç, à) shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as with the original letter. Ranking should be done by best match.

Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.


@GregKenicer @GeoffHarper @KasiaGoral @MartinaMetzger @DanielFisher @NeilWoodcock @SaraPerzley @SaraCarlton

JiriFrank commented 12 years ago

1.1.4 - Search with words including diacritic characters (Simple Search)

COR number: 1.1.4 Testing platform: http://bhl-test.nhm.ac.uk/portal/ Function: Simple Search

Description:

Search with terms based on the Latin alphabet that include diacritical marks (e.g. ä, ö, ü, ç, à) shall also return results with the Unicode translation (ae, oe, ue, etc.) as well as with the original letter. Ranking should be done by best match.

Example: search for 'Österreich' gives results for 'Österreich' and 'Oesterreich' and 'Osterreich'.


@HannaKoivula @PaiviLipsanen @PaiviJaakkola @SiniKarki @TiinaOnttonen

fwelter commented 12 years ago

Browser: Mozilla 5.0, Firefox 2.0.0.16 from 2006. The same problem as before: nothing can be typed into the search field because moving images are located in front of the search box.


GeoffHarper commented 12 years ago

Hi Jiri

Thanks for the instruction on obtaining the current test list of books; it was in one of your earlier messages, but I'd forgotten. I was working from an old list. There isn't much time left, so I can't do much. But I've at least found that Simple Search works OK to locate Follmann, Uber die unterdevonischen Schichten bei Coblenz. However, several attempts to type 'U' with umlaut (using Alt+0252 and Alt+0220) in the search box resulted in no character being printed.

Geoff


HenningScholz commented 12 years ago

I use the most current Chrome on Win 7 for these tests.

Works fine.



grahamtestrbge commented 12 years ago

Switched to Chrome, as IE10 and Firefox 9 still have display issues, which makes searching impossible.

Simple search for words containing ö, é, ä, working fine.

KasiaGoral commented 12 years ago

I use Firefox 11 for Ubuntu.

Searched for München. No result.

Searched for Nürnberg. No result.

Searched for Düsseldorf. No result.

KasiaGoral commented 12 years ago

Sorry, misunderstood the task and searched for words not in the current book list.

This time searched for:

Studiengänge: one result, 'Vier landw. Studiengänge durch die Gemarkung Niederlahnstein und ihre Flurdistrikte'. Did not get results with the Unicode translation, though.

JiriFrank commented 12 years ago

Ready to close