BetaMasaheft / Documentation

The Written Culture of Christian Ethiopia: A Multimedia Research Environment

search Fidal if user entered transliteration or transliteration if you search Fidal #1631

Closed PietroLiuzzo closed 3 years ago

PietroLiuzzo commented 3 years ago

The searches on the website use several techniques so that results in any of the transcriptions present are returned. They will not, however, search Fidal if you entered a transliteration, or transliteration if you searched in Fidal. This feature is planned but not yet available.

Originally posted by @PietroLiuzzo in https://github.com/BetaMasaheft/Documentation/discussions/1630#discussioncomment-218475

PietroLiuzzo commented 3 years ago

This can be implemented with a query to the morpho parser and a check in the Fuseki traces dataset.

PietroLiuzzo commented 3 years ago

A search for 'አሮን' will send the following to the query builder:

((ኣሮን) OR (ዓሮን) OR (አሮን) OR (ዐሮን)) OR ((ʿäron) OR (aron) OR (äron) OR (ʾäron) OR (aron) OR (äron) OR (ʿaron) OR (ʾaron))

Here OR is not the optional operator chosen by the user; it is required, because the transliteration is an alternative to the entered query and the homophones are alternatives to each of these. The transliteration is fetched from the traces corpus with a SPARQL query.

Example dataset test:

walda, no options: 'walda' = 163 hits in 3 s
walda, homophones yes, translit no: '(ʷälda) OR (ʷalda) OR (walda) OR (ʷäldä) OR (wäldä) OR (waldä) OR (wälda) OR (ʷaldä)' = 163 hits in 4 s
walda, homophones no, translit yes: 'walda OR ወልደ' = 283 hits in 8 s
walda, homophones yes, translit yes: '((ʷälda) OR (ʷalda) OR (walda) OR (ʷäldä) OR (wäldä) OR (waldä) OR (wälda) OR (ʷaldä)) OR ((ወልደ))' = 283 hits in 9 s
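The expansion described in this comment can be sketched in Python as follows. This is only an illustration, not the site's XQuery: the homophone classes are an assumption inferred from the 'walda' example (w/ʷ and a/ä), and the function names are hypothetical.

```python
from itertools import product

# Hypothetical homophone classes, inferred only from the 'walda' example;
# the table actually used by the site is certainly larger.
HOMOPHONE_CLASSES = [("w", "ʷ"), ("a", "ä")]


def homophone_variants(term):
    """Return every spelling obtained by swapping characters within a class."""
    choices = []
    for ch in term:
        # Characters outside every class have only themselves as a choice.
        group = next((g for g in HOMOPHONE_CLASSES if ch in g), (ch,))
        choices.append(group)
    # Sorting only makes the output order deterministic.
    return sorted({"".join(combo) for combo in product(*choices)})


def or_group(alternatives):
    """Join alternatives as '(a) OR (b) OR ...'."""
    return " OR ".join(f"({alt})" for alt in alternatives)


def expanded_query(term, transliterations=()):
    """Combine homophone variants and transliterations, always with OR."""
    query = or_group(homophone_variants(term))
    if transliterations:
        query = f"({query}) OR ({or_group(transliterations)})"
    return query


print(expanded_query("walda", ["ወልደ"]))
```

For 'walda' this yields eight homophone variants joined with OR, followed by the transliteration group, which is the same shape as the last test string above (the order of the variants may differ).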

PietroLiuzzo commented 3 years ago

This feature is now available (i.e. it will be with the release). An optional checkbox, with an explanation, can be selected to fetch possible transliterations. A first attempt is made from the traces annotations as stored in Fuseki. If nothing is found there, the morphoparser is asked for its opinion. Errors are possible in either case.

Some combinations of options exclude others, or make them too complex, so the XQuery actively tries to avoid them; for example, it will not replace homophones if phrase mode is requested. The possible combinations are many, however, and I am sure I have missed some. Practice will tell what is used and what is not.

Below are some records, based on my test data for the time being. For me they boil down to the following observation: the options do not help much, the index already does a lot, and filtering after the search is faster in any case. It is true, however, that some hits can be obtained which would otherwise be missed. We will see.
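The lookup order and the phrase-mode guard described in this comment can be sketched roughly as below. This is a Python illustration under assumptions, not the actual XQuery: the two lookup functions are stand-ins for the Fuseki traces query and the morphoparser, the option names are hypothetical, and which source returns which form is arbitrary here.

```python
def fetch_transliterations(term, lookups):
    """Try each lookup in order and return the first non-empty result."""
    for lookup in lookups:
        found = lookup(term)
        if found:
            return list(found)
    return []


def effective_options(requested):
    """Drop option combinations that are actively avoided."""
    options = set(requested)
    if "phrase" in options:
        # Homophones are not replaced when phrase mode is requested.
        options.discard("homophones")
    return options


# Stand-in for the SPARQL query against the traces annotations (hypothetical).
def traces_lookup(term):
    return {"walda": ["ወልደ"]}.get(term, [])


# Stand-in for the morphoparser, asked only if traces finds nothing (hypothetical).
def morphoparser_lookup(term):
    return ["ምደ"] if term == "amda" else []


lookups = [traces_lookup, morphoparser_lookup]
print(fetch_transliterations("walda", lookups))
print(sorted(effective_options({"phrase", "homophones", "translit"})))
```

Passing the lookups as a list keeps the order explicit, so a further source could be added without touching the fallback logic.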

Columns: search string, options, Lucene query string, XML query, hits, time (ms), notes
amda seyon no options amda seyon <query><bool><term occur="should">amda</term><term occur="should">seyon</term></bool></query> 52 1563  
amda seyon translit (ምደ OR amda) OR (ሳዩን OR seyon) <query><bool><bool occur="should"><bool><term occur="should">ምደ</term></bool><bool><term occur="should">amda</term></bool></bool><bool occur="should"><bool><term occur="should">ሳዩን</term></bool><bool><term occur="should">seyon</term></bool></bool></bool></query> 55 3363  
amda seyon homophones (ämdä ḍəyon) OR (amdä səyon) OR (ämda ḍǝyon) OR (amda sǝyon) OR (amda ḍǝyon) OR (amdä ḍəyon) OR (ämdä səyon) OR (ämdä ḍǝyon) OR (ämdä sǝyon) OR (amda səyon) OR (ämda ḍəyon) OR (amdä ḍēyon) OR (ämda seyon) OR (ämda səyon) OR (amda sēyon) OR (ämdä ḍēyon) OR (ämda sǝyon) OR (amda ḍəyon) OR (ämdä sēyon) OR (amda ḍēyon) OR (ämdä seyon) OR (amdä sēyon) OR (ämda ḍeyon) OR (amda ḍeyon) OR (amda seyon) OR (ämda ḍēyon) OR (amdä seyon) OR (ämda sēyon) OR (amdä ḍǝyon) OR (amdä ḍeyon) OR (ämdä ḍeyon) OR (amdä sǝyon) <query><bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term 
occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sǝyon</term></bool></bool></query> 54 1711  
amda seyon homophones + translit (ምደ OR amda OR ämdä OR amdä OR ämda) OR (ሣዩን OR ሳዩን OR seyon OR ḍəyon OR sǝyon) etc. <query><bool><bool><bool><bool><term occur="should">ምደ</term></bool></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">amda</term><term occur="should">amdä</term><term occur="should">ämda</term></bool></bool><bool><bool occur="should"><term occur="should">ሣዩን</term><term occur="should">ሳዩን</term></bool><bool occur="should"><term occur="should">ḍəyon</term><term occur="should">seyon</term><term occur="should">sǝyon</term><term occur="should">sēyon</term><term occur="should">ḍǝyon</term><term occur="should">ḍēyon</term><term occur="should">səyon</term><term occur="should">ḍeyon</term></bool></bool></bool></query> 56 3596  
amda seyon homophones + translit + ranking Same as above; the ranking affects only the ordering of the results <query><bool><bool occur="should"><term occur="should">ምደ</term></bool><bool occur="should"><term occur="should">ämdä</term></bool><bool occur="should"><term occur="should">amda</term></bool><bool occur="should"><term occur="should">amdä</term></bool><bool occur="should"><term occur="should">ämda</term></bool><bool occur="should"><term occur="should">ሣዩን</term></bool><bool occur="should"><term occur="should">ሳዩን</term></bool><bool occur="should"><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ḍeyon</term></bool></bool></query> 56 7466 ranking ineffective
amda seyon ranking amda seyon <query><bool><term occur="should">amda</term><term occur="should">seyon</term></bool></query> 52 3331 ranking ineffective
amda seyon phrase mode 'amda seyon' <query><phrase>amda seyon</phrase></query> 8 875  
amda seyon phrase mode + homophones 'amda seyon' <query><phrase>amda seyon</phrase></query> 8 718 Homophones are actively ignored
amda seyon phrase mode + homophones + translit 'ምደ ሳዩን' or 'amda seyon' <query><phrase>ምደ ሳዩን</phrase><phrase>amda seyon</phrase></query> 8 1156 Homophones are actively ignored
amda seyon phrase mode + homophones + translit + ranking 'ምደ ሳዩን' or 'amda seyon' <query><phrase>ምደ ሳዩን</phrase><phrase>amda seyon</phrase></query> 8 1605 ranking effective
amda seyon AND operator amda AND seyon <query><bool><term occur="must">amda</term><term occur="must">seyon</term></bool></query> 8 813  
amda seyon AND operator + homophones (amda AND seyon) OR (ämdä AND ḍəyon) OR etc. <query><bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term 
occur="should">ämdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sǝyon</term></bool></bool></query> 54 1522  
amda seyon AND operator + homophones + translit (amda OR ämdä OR ämda) AND (seyon OR ḍəyon OR sǝyon) <query><bool><bool occur="must"><term occur="should">ämdä</term><term occur="should">amda</term><term occur="should">amdä</term><term occur="should">ämda</term></bool><bool occur="must"><term occur="should">ḍəyon</term><term occur="should">seyon</term><term occur="should">sǝyon</term><term occur="should">sēyon</term><term occur="should">ḍǝyon</term><term occur="should">ḍēyon</term><term occur="should">səyon</term><term occur="should">ḍeyon</term></bool></bool></query> 57 4306 Because nesting Lucene booleans in one query is not allowed, the homophones will be rated as should.
amda seyon AND operator + homophones + translit + ranking   <query><bool><bool occur="should"><bool><bool><term occur="should">ምደ</term></bool></bool><bool occur="should"><term occur="should">ሣዩን</term><term occur="should">ሳዩን</term></bool></bool><bool occur="should"><bool occur="should"><term occur="should">ämdä</term><term occur="should">amda</term><term occur="should">amdä</term><term occur="should">ämda</term></bool><bool occur="should"><term occur="should">ḍəyon</term><term occur="should">seyon</term><term occur="should">sǝyon</term><term occur="should">sēyon</term><term occur="should">ḍǝyon</term><term occur="should">ḍēyon</term><term occur="should">səyon</term><term occur="should">ḍeyon</term></bool></bool></bool></query> 57 10420 This is where we would have nested AND and OR, which do not work. So all becomes OR
amda seyon fuzzy   <query><fuzzy max-edits="2"><bool><term occur="should">amda</term><term occur="should">seyon</term></bool></fuzzy></query> 6 256
amda seyon fuzzy + AND operator   <query><fuzzy max-edits="2"><bool><term occur="must">amda</term><term occur="must">seyon</term></bool></fuzzy></query> 6 78
amda seyon Near unordered   <query><near slop="5" ordered="no"><term occur="should">amda</term><term occur="should">seyon</term></near></query> 9 960  
amda seyon Near ordered + ranking   <query><near slop="5" ordered="yes"><term occur="should">amda</term><term occur="should">seyon</term></near></query> 9 2745  
amda seyon fuzzy   <query><fuzzy max-edits="2">amda seyon</fuzzy></query>
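As an illustration of how the XML queries in the records above are put together, here is a minimal Python sketch using only the standard library. It is not the XQuery the site runs; the function names and the `groups` parameter are hypothetical, and only the `bool`, `term` and `phrase` elements seen in the table are covered.

```python
import xml.etree.ElementTree as ET


def bool_query(groups, operator="should"):
    """Build <query><bool>...</bool></query> from groups of alternative terms.

    A group with one alternative becomes a single <term occur=operator>.
    A group with several alternatives becomes a nested <bool occur=operator>
    whose terms are all 'should', as in the AND + homophones + translit record.
    """
    query = ET.Element("query")
    outer = ET.SubElement(query, "bool")
    for alternatives in groups:
        if len(alternatives) == 1:
            ET.SubElement(outer, "term", occur=operator).text = alternatives[0]
        else:
            inner = ET.SubElement(outer, "bool", occur=operator)
            for alt in alternatives:
                ET.SubElement(inner, "term", occur="should").text = alt
    return ET.tostring(query, encoding="unicode")


def phrase_query(phrases):
    """Build one <phrase> element per phrase, as in the phrase-mode records."""
    query = ET.Element("query")
    for phrase in phrases:
        ET.SubElement(query, "phrase").text = phrase
    return ET.tostring(query, encoding="unicode")


print(bool_query([["amda"], ["seyon"]]))
print(bool_query([["amda"], ["seyon"]], "must"))
print(phrase_query(["ምደ ሳዩን", "amda seyon"]))
```

The first two calls reproduce the XML of the "no options" and "AND operator" records; passing groups with several alternatives and `operator="must"` gives the nested form in which each word must match but any of its spellings may.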