Closed PietroLiuzzo closed 3 years ago
This can be implemented with a query to the morpho parser and a check in the Fuseki traces dataset.
A search for 'አሮን' will send the following to the query builder: ((ኣሮን) OR (ዓሮን) OR (አሮን) OR (ዐሮን)) OR ((ʿäron) OR (aron) OR (äron) OR (ʾäron) OR (aron) OR (äron) OR (ʿaron) OR (ʾaron))
Here OR is not the operator chosen by the user; it is required, because the transliteration is an alternative to the entered query and the homophones are alternatives to each of these.
The transliteration is fetched from the traces corpus with a SPARQL query.
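As a rough illustration of the nesting described above, the sketch below joins each set of alternatives with OR and then joins the two sets with OR again. The function names `or_group` and `build_query` are hypothetical and are not taken from the actual XQuery code; the variant lists are assumed to have been computed elsewhere.

```python
def or_group(terms):
    """Wrap each term in parentheses and join the terms with OR."""
    return " OR ".join(f"({t})" for t in terms)


def build_query(entered_variants, translit_variants):
    """Combine the variants of the entered string and of its transliteration.

    Both arguments are lists of strings (hypothetical inputs). Empty lists
    are skipped, so a search without transliteration yields a single group.
    """
    groups = [
        f"({or_group(variants)})"
        for variants in (entered_variants, translit_variants)
        if variants
    ]
    return " OR ".join(groups)


query = build_query(["ወልደ"], ["walda", "wälda"])
print(query)
```

The outer OR is always there regardless of the operator the user picked, which mirrors the point made above: transliteration and homophones are alternatives, not additional constraints.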
Example test on the dataset (hits and time):
walda, no options: 'walda' = 163 in 3s
walda, homophones yes, translit no: '(ʷälda) OR (ʷalda) OR (walda) OR (ʷäldä) OR (wäldä) OR (waldä) OR (wälda) OR (ʷaldä)' = 163 in 4s
walda, homophones no, translit yes: 'walda OR ወልደ' = 283 in 8s
walda, homophones yes, translit yes: '((ʷälda) OR (ʷalda) OR (walda) OR (ʷäldä) OR (wäldä) OR (waldä) OR (wälda) OR (ʷaldä)) OR ((ወልደ))' = 283 in 9s
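The eight homophone variants of 'walda' listed above are what one gets by combining every alternative for each character. A minimal sketch of that expansion follows; the `HOMOPHONES` mapping is a made-up, two-entry stand-in for illustration only, not the real homophone table used by the application.

```python
from itertools import product

# Hypothetical homophone classes, for illustration only.
HOMOPHONES = {
    "w": ["w", "ʷ"],
    "a": ["a", "ä"],
}


def expand_homophones(word, classes=HOMOPHONES):
    """Return every spelling obtained by swapping characters for homophones."""
    # Characters without an entry in the mapping only stand for themselves.
    options = [classes.get(ch, [ch]) for ch in word]
    return ["".join(combo) for combo in product(*options)]


variants = expand_homophones("walda")
print(len(variants))
print(" OR ".join(f"({v})" for v in variants))
```

With two alternatives for each of three characters, 'walda' gives 2 × 2 × 2 = 8 variants. The number grows multiplicatively with each additional word and character, which is why the multi-word queries in the table below become so long.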
This feature is now available (i.e. it will be with the release). An optional checkbox, with an explanation, can be selected to fetch possible transliterations. A first attempt is made from the traces annotations as stored in Fuseki; if nothing is found there, the morpho parser is asked for its opinion. Errors are possible in either case.

Some combinations of options rule out others, or make them too complex, so the XQuery actively tries to avoid them; for example, it will not replace homophones if phrase mode is requested. The combinations are many, however, and I am sure I have missed some. Practice will tell what is used and what is not.

Below are some records, based on my test data for the time being. For me they boil down to the following observation: the options do not help much, the index already does a lot, and filtering after the search is faster in any case. It is true, though, that some hits can be obtained which would not be found otherwise. We will see.
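The lookup order and the phrase-mode guard described above could be sketched roughly as follows. This is not the actual XQuery: `query_traces` and `query_morpho_parser` are hypothetical callables standing in for the SPARQL query against Fuseki and the call to the morpho parser, and `plan_expansion` is an invented name.

```python
def find_transliterations(term, query_traces, query_morpho_parser):
    """Try the traces annotations first, then fall back to the morpho parser.

    Both callables take a term and return a (possibly empty) list of strings.
    """
    hits = query_traces(term)
    if hits:
        return list(hits)
    return list(query_morpho_parser(term))


def plan_expansion(phrase_mode, homophones, translit):
    """Decide which expansions to apply for a combination of options."""
    return {
        # Homophones are actively ignored when phrase mode is requested.
        "homophones": homophones and not phrase_mode,
        "translit": translit,
    }


# Stub lookups, for demonstration only.
traces = {"walda": ["ወልደ"]}
print(find_transliterations("walda", lambda t: traces.get(t, []), lambda t: []))
print(plan_expansion(phrase_mode=True, homophones=True, translit=True))
```

Only one guard is shown here; as noted above, the real set of combinations is larger.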
Search string | Options | Lucene | Query string | Hits | ms | Notes |
---|---|---|---|---|---|---|
amda seyon | no options | amda seyon | `<query><bool><term occur="should">amda</term><term occur="should">seyon</term></bool></query>` | 52 | 1563 | |
amda seyon | translit | (ምደ OR amda) OR (ሳዩን OR seyon) | `<query><bool><bool occur="should"><bool><term occur="should">ምደ</term></bool><bool><term occur="should">amda</term></bool></bool><bool occur="should"><bool><term occur="should">ሳዩን</term></bool><bool><term occur="should">seyon</term></bool></bool></bool></query>` | 55 | 3363 | |
amda seyon | homophones | (ämdä ḍəyon) OR (amdä səyon) OR (ämda ḍǝyon) OR (amda sǝyon) OR (amda ḍǝyon) OR (amdä ḍəyon) OR (ämdä səyon) OR (ämdä ḍǝyon) OR (ämdä sǝyon) OR (amda səyon) OR (ämda ḍəyon) OR (amdä ḍēyon) OR (ämda seyon) OR (ämda səyon) OR (amda sēyon) OR (ämdä ḍēyon) OR (ämda sǝyon) OR (amda ḍəyon) OR (ämdä sēyon) OR (amda ḍēyon) OR (ämdä seyon) OR (amdä sēyon) OR (ämda ḍeyon) OR (amda ḍeyon) OR (amda seyon) OR (ämda ḍēyon) OR (amdä seyon) OR (ämda sēyon) OR (amdä ḍǝyon) OR (amdä ḍeyon) OR (ämdä ḍeyon) OR (amdä sǝyon) | `<query><bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sǝyon</term></bool></bool></query>` | 54 | 1711 | |
amda seyon | homophones + translit | (ምደ OR amda OR ämdä OR amdä OR ämda) OR (ሣዩን OR ሳዩን OR seyon OR ḍəyon OR sǝyon) etc. | `<query><bool><bool><bool><bool><term occur="should">ምደ</term></bool></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">amda</term><term occur="should">amdä</term><term occur="should">ämda</term></bool></bool><bool><bool occur="should"><term occur="should">ሣዩን</term><term occur="should">ሳዩን</term></bool><bool occur="should"><term occur="should">ḍəyon</term><term occur="should">seyon</term><term occur="should">sǝyon</term><term occur="should">sēyon</term><term occur="should">ḍǝyon</term><term occur="should">ḍēyon</term><term occur="should">səyon</term><term occur="should">ḍeyon</term></bool></bool></bool></query>` | 56 | 3596 | |
amda seyon | homophones + translit + ranking | Same as above; the ranking affects only the ordering of the results | `<query><bool><bool occur="should"><term occur="should">ምደ</term></bool><bool occur="should"><term occur="should">ämdä</term></bool><bool occur="should"><term occur="should">amda</term></bool><bool occur="should"><term occur="should">amdä</term></bool><bool occur="should"><term occur="should">ämda</term></bool><bool occur="should"><term occur="should">ሣዩን</term></bool><bool occur="should"><term occur="should">ሳዩን</term></bool><bool occur="should"><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ḍeyon</term></bool></bool></query>` | 56 | 7466 | ranking ineffective |
amda seyon | ranking | amda seyon | `<query><bool><term occur="should">amda</term><term occur="should">seyon</term></bool></query>` | 52 | 3331 | ranking ineffective |
amda seyon | phrase mode | 'amda seyon' | `<query><phrase>amda seyon</phrase></query>` | 8 | 875 | |
amda seyon | phrase mode + homophones | 'amda seyon' | `<query><phrase>amda seyon</phrase></query>` | 8 | 718 | Homophones are actively ignored |
amda seyon | phrase mode + homophones + translit | 'ምደ ሳዩን' or 'amda seyon' | `<query><phrase>ምደ ሳዩን</phrase><phrase>amda seyon</phrase></query>` | 8 | 1156 | Homophones are actively ignored |
amda seyon | phrase mode + homophones + translit + ranking | 'ምደ ሳዩን' or 'amda seyon' | `<query><phrase>ምደ ሳዩን</phrase><phrase>amda seyon</phrase></query>` | 8 | 1605 | ranking effective |
amda seyon | AND operator | amda AND seyon | `<query><bool><term occur="must">amda</term><term occur="must">seyon</term></bool></query>` | 8 | 813 | |
amda seyon | AND operator + homophones | (amda AND seyon) OR (ämdä AND ḍəyon) OR etc. | `<query><bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">səyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sǝyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍəyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amda</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">ḍēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">seyon</term></bool><bool occur="should"><term occur="should">ämda</term><term occur="should">sēyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍǝyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">ämdä</term><term occur="should">ḍeyon</term></bool><bool occur="should"><term occur="should">amdä</term><term occur="should">sǝyon</term></bool></bool></query>` | 54 | 1522 | |
amda seyon | AND operator + homophones + translit | (amda OR ämdä OR ämda) AND (seyon OR ḍəyon OR sǝyon) | `<query> <bool> <bool occur="must"> <term occur="should">ämdä</term> <term occur="should">amda</term> <term occur="should">amdä</term> <term occur="should">ämda</term> </bool> <bool occur="must"> <term occur="should">ḍəyon</term> <term occur="should">seyon</term> <term occur="should">sǝyon</term> <term occur="should">sēyon</term> <term occur="should">ḍǝyon</term> <term occur="should">ḍēyon</term> <term occur="should">səyon</term> <term occur="should">ḍeyon</term> </bool> </bool> </query>` | 57 | 4306 | Because nesting Lucene booleans in one query is not allowed, the homophones will be rated as should |
amda seyon | AND operator + homophones + translit + ranking | | `<query><bool><bool occur="should"><bool><bool><term occur="should">ምደ</term></bool></bool><bool occur="should"><term occur="should">ሣዩን</term><term occur="should">ሳዩን</term></bool></bool><bool occur="should"><bool occur="should"><term occur="should">ämdä</term><term occur="should">amda</term><term occur="should">amdä</term><term occur="should">ämda</term></bool><bool occur="should"><term occur="should">ḍəyon</term><term occur="should">seyon</term><term occur="should">sǝyon</term><term occur="should">sēyon</term><term occur="should">ḍǝyon</term><term occur="should">ḍēyon</term><term occur="should">səyon</term><term occur="should">ḍeyon</term></bool></bool></bool></query>` | 57 | 10420 | This is where we would have nested AND and OR, which does not work, so everything becomes OR |
amda seyon | fuzzy | | `<query><fuzzy max-edits="2"><bool><term occur="should">amda</term><term occur="should">seyon</term></bool></fuzzy></query>` | 6 | 256 | |
amda seyon | fuzzy + AND operator | | `<query><fuzzy max-edits="2"><bool><term occur="must">amda</term><term occur="must">seyon</term></bool></fuzzy></query>` | 6 | 78 | |
amda seyon | Near unordered | | `<query><near slop="5" ordered="no"><term occur="should">amda</term><term occur="should">seyon</term></near></query>` | 9 | 960 | |
amda seyon | Near ordered + ranking | | `<query><near slop="5" ordered="yes"><term occur="should">amda</term><term occur="should">seyon</term></near></query>` | 9 | 2745 | |
amda seyon | fuzzy | | `<query><fuzzy max-edits="2">amda seyon</fuzzy></query>` | | | |
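The query strings in the table follow a regular shape. As a sketch of how the "AND operator + homophones + translit" form could be generated, the snippet below builds one `<bool occur="must">` group per search word, holding that word's variants as `<term occur="should">` elements. It uses Python's standard `xml.etree.ElementTree` purely for illustration; the real queries are built in XQuery, and `build_and_query` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def build_and_query(word_variants):
    """Build a query where every word is required, in any of its variants.

    word_variants is a list of lists: one inner list of spellings per word.
    """
    query = ET.Element("query")
    outer = ET.SubElement(query, "bool")
    for variants in word_variants:
        # One required group per word of the search string.
        group = ET.SubElement(outer, "bool", occur="must")
        for variant in variants:
            # Inside the group each variant is an optional alternative.
            term = ET.SubElement(group, "term", occur="should")
            term.text = variant
    return ET.tostring(query, encoding="unicode")


xml_query = build_and_query([["amda", "ämda"], ["seyon"]])
print(xml_query)
```

Changing `occur="must"` to `occur="should"` on the groups gives the all-OR shape that the ranking row falls back to.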
The searches in the website use several techniques to allow results in any of the transcriptions present to be returned. It will not, however, search Fidal if you entered a transliteration, or transliteration if you search in Fidal. This feature is planned but not yet available.
Originally posted by @PietroLiuzzo in https://github.com/BetaMasaheft/Documentation/discussions/1630#discussioncomment-218475