Issue
Some Kannada words in a result's definition end with a period or a number. These trailing characters currently become part of the hyperlink and therefore of the search query, so the results fetched do not match the original word.
Example:
In the results for ಲೊಳ್, some Kannada words in the definitions end with a period or a number, in this case "ಲಾಳ1" and "ಲೋಳಿಸರ."
Because the link contains the same text, these numbers and periods end up in the resulting query. The results for "ಲೋಳಿಸರ." are quite different from those for "ಲೋಳಿಸರ", and similarly for "ಲಾಳ1" and "ಲಾಳ".
Fix
The current isKannada function actually checks whether the input word contains a Kannada character: it returns true when any one character in the input is Kannada. I have therefore renamed it to hasKannadaChar.
After splitting a definition into individual words, whenever a word contains a Kannada character, we remove all non-Kannada characters from it and use the cleaned word in both the href and the text of the <a> tag. The trailing non-Kannada characters are added as a text node at the end of the parent span.
For now this should fix the issue by separating the trailing number or period from the actual Kannada word. However, the root cause is probably in the dataset itself, where some spaces may be missing, and that likely needs to be cleaned.
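The cleaning step described above can be sketched roughly as follows. This is an illustrative sketch, not the exact code in this PR: only hasKannadaChar is a name from the change itself, while splitKannadaWord and the shape of its return value are hypothetical. It assumes that "Kannada character" means any code point in the Unicode Kannada block (U+0C80 to U+0CFF).

```javascript
// Assumption: a "Kannada character" is any code point in the
// Unicode Kannada block, U+0C80 to U+0CFF.
const KANNADA_CHAR = /[\u0C80-\u0CFF]/;
const NON_KANNADA_CHARS = /[^\u0C80-\u0CFF]/g;
const TRAILING_NON_KANNADA = /[^\u0C80-\u0CFF]*$/;

// Returns true when at least one character in the word is Kannada.
function hasKannadaChar(word) {
  return KANNADA_CHAR.test(word);
}

// Hypothetical helper: separates a word into the cleaned Kannada word
// (used for both the href and the link text) and the trailing
// non-Kannada characters (appended as plain text after the link).
function splitKannadaWord(word) {
  if (!hasKannadaChar(word)) {
    // Nothing to link; leave the word as plain text.
    return { cleaned: "", trailing: word };
  }
  const cleaned = word.replace(NON_KANNADA_CHARS, "");
  // This pattern can match the empty string, so match() never returns null.
  const trailing = word.match(TRAILING_NON_KANNADA)[0];
  return { cleaned, trailing };
}
```

With this sketch, "ಲಾಳ1" would link to "ಲಾಳ" with "1" left as plain text after the link, and "ಲೋಳಿಸರ." would link to "ಲೋಳಿಸರ" followed by a plain ".".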
I've tested this within the browser in Edge Dev.