alpheios-project / webextension

Alpheios Browser Extensions
ISC License

En language in wordlist #331

Closed monzug closed 2 years ago

monzug commented 2 years ago

I had never seen the en language id in the wordlist before yesterday and the latest FF builds. Note that this happens only on the Scaife site. As soon as I install the new FF build and do a lookup (double click), the words are saved in the wordlist with the en language id. Irina, do you have any idea where this en language id comes from, and why only on Scaife? Is it the page language defined on the site? See the attachment below. As soon as I reset the page language to Greek or Latin in Options, I no longer get these errors.

english

When I click on any word saved under the en language id, I get a bunch of errors. See below.

greek-data

irina060981 commented 2 years ago

Yes, it has en as the page language. It is easy to check: open the context menu and select View Page Source.

image

You will see the page in non-rendered mode.

You can see that it is defined as en => <html lang="en">

So it is not a bug, as the page language has priority.

monzug commented 2 years ago

@Irina, do you remember the issue # about priorities on page language? We made that change after the release and I have never tested it in the web extension.

irina060981 commented 2 years ago

No, I don't remember the number. Maybe you could search for it in the issues list?

monzug commented 2 years ago

It might be this one: https://github.com/alpheios-project/alignment-editor-new/issues/302

irina060981 commented 2 years ago

Yes, I think you are quite right

monzug commented 2 years ago

Another one: https://github.com/alpheios-project/alignment-editor-new/pull/328 but I think this is the one I was looking for: https://github.com/alpheios-project/alpheios-core/issues/639

monzug commented 2 years ago

I do not understand why words in the wordlist are saved in both the en language list and the Latin language list. There is something not quite right here.

Screen Shot 2021-11-19 at 12 55 10 PM

Screen Shot 2021-11-19 at 2 20 24 PM

monzug commented 2 years ago

Also, the show contexts icon is available for words in the en language list but not for the same words in the lat language list. See below.

show-context

irina060981 commented 2 years ago

I described the same problem here https://github.com/alpheios-project/webextension/issues/330

irina060981 commented 2 years ago

Bug is fixed. New Release is published.

monzug commented 2 years ago

I still see the same issue: words are being saved under en and also under the Alpheios page language in the wordlist, and the show contexts icon is available only for words in the en language list. See the example of the same word repeated 3 times in the wordlist: en as the page language in Scaife, Greek as the page language in Alpheios, and Latin when I changed the page language to Latin. @irina060981, is there anything that can be done to improve this? If English is the page language of the site but it is a language that we do not support (it is not Latin, Greek, Persian, Arabic, Chinese, Syriac, or Ge'ez), would it be possible not to show it in the wordlist?

English page language Screen Shot 2021-11-22 at 2 49 16 PM

Italian page language Screen Shot 2021-11-22 at 3 07 35 PM

What I do not like: 1) having both the page language list of words and the supported language list of words can cause very long wordlists. 2) the en list of words has the show contexts icon but is missing the lemma, while the Latin list of words has the lemma and the number of occurrences, but not show contexts.

irina060981 commented 2 years ago

Yes, the same bug is repeated on https://scaife.perseus.org/reader/ It is strange. I will check.

irina060981 commented 2 years ago

I found a small bug in the language definition (I hope it is the last one). It is now fixed and I could not reproduce it any more.

Anyway, I will describe the source of the problem, in case you run into some other edge case.

There are two aspects:

  1. When the application prepares a word to get morphology data, it creates two objects: one for word information (TextSelector) and one for context information (TextQuoteSelector). The previous bug occurred when the application decided to change the word language (according to https://github.com/alpheios-project/alpheios-core/issues/639) but in fact changed it only for one object (TextSelector) and not for the second (TextQuoteSelector). The application therefore first created a worditem for one object (TextQuoteSelector) and then another for the other (TextSelector). In the normal case they would be merged automatically and we would end up with one worditem. So my fix is that the language is now updated in both objects.

  2. Each object (TextSelector) has two properties for storing language data: one for the plain language name (as is) and one for the formatted value from the supported list. (This happened historically, and our team has had no time to remove the duplication, but it was planned - https://github.com/alpheios-project/alpheios-core/issues/6). There was a case where a site defined a language that is not in the supported list (like en), so the application decided to use the Page language but updated only one language value, while the second was still en. The wordlist again registered two different words, one for the homonym and one for the context, because they use different language properties. So again my fix is to update both language values.

Now I hope I have found all the bugs of this issue https://github.com/alpheios-project/alpheios-core/issues/639
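
For readers following along, the two aspects above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual alpheios-core code: `TextSelector` and `TextQuoteSelector` are the names from the comment, but the fields (`languageCode`, `rawLanguage`, `quote`), the `SUPPORTED` list, and the `wordlistKeys` helper are hypothetical. It shows how updating only one language value leaves two distinct wordlist entries for the same word, while updating all of them leaves one.

```typescript
// Simplified stand-ins for the two objects named above. The field names
// (languageCode, rawLanguage, quote) are assumptions for illustration only.
interface TextQuoteSelector {
  languageCode: string;
}

interface TextSelector {
  text: string;
  languageCode: string; // formatted code from the supported list
  rawLanguage: string;  // plain language value as found on the page
  quote: TextQuoteSelector;
}

// Assumed subset of supported codes; the real list is longer.
const SUPPORTED: string[] = ["lat", "grc"];

// Behaviour as described for the bug: only one language value is replaced.
function applyPageLanguageBuggy(s: TextSelector, pageLanguage: string): void {
  if (SUPPORTED.indexOf(s.languageCode) === -1) {
    s.languageCode = pageLanguage;
  }
}

// Behaviour as described for the fix: every stored language value is updated.
function applyPageLanguageFixed(s: TextSelector, pageLanguage: string): void {
  if (SUPPORTED.indexOf(s.languageCode) === -1) {
    s.languageCode = pageLanguage;
    s.rawLanguage = pageLanguage;
    s.quote.languageCode = pageLanguage;
  }
}

// Hypothetical wordlist keying: one entry per distinct language + word pair.
function wordlistKeys(s: TextSelector): string[] {
  const keys: string[] = [];
  const langs = [s.languageCode, s.rawLanguage, s.quote.languageCode];
  for (const lang of langs) {
    const key = lang + ":" + s.text;
    if (keys.indexOf(key) === -1) keys.push(key);
  }
  return keys;
}

// Build a selector whose language values all come from the site (<html lang>).
function makeSelector(text: string, siteLanguage: string): TextSelector {
  return {
    text: text,
    languageCode: siteLanguage,
    rawLanguage: siteLanguage,
    quote: { languageCode: siteLanguage }
  };
}

const buggySelector = makeSelector("arma", "en");
applyPageLanguageBuggy(buggySelector, "lat");
const buggyKeys = wordlistKeys(buggySelector); // "lat:arma" and "en:arma"

const fixedSelector = makeSelector("arma", "en");
applyPageLanguageFixed(fixedSelector, "lat");
const fixedKeys = wordlistKeys(fixedSelector); // only "lat:arma"
```

In this sketch the duplicate en list disappears as soon as all language values agree, which matches the behaviour reported after the fix.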

monzug commented 2 years ago

Will dig in soon. Hopefully it will be the last one, as you said.

monzug commented 2 years ago

No more en language. Great. Thanks. Still to retest in Chrome and Safari.

monzug commented 2 years ago

Actually, I just experienced this problem in Loeb Classics. In FF/PC I used loebclassic for the first time in a really long time: I selected a Latin text, clicked on a few words, checked the wordlist, and found the words saved in two different language lists, la and lat. The la language list has the show context link, while the lat language list has the number of occurrences and the lemmas. See attachment: duplicate language

Also, the link from the wordlist for any word saved under la does not work: the "lexical data is loading" pop-up is generated and I need to kill the pop-up. @irina060981, any idea?

mettalique

monzug commented 2 years ago

Another example of the la and lat language lists

due-la-lat-languages

irina060981 commented 2 years ago

It is because the block language is defined as "la", a shorter variant of the "lat" that we use. I will check why it is not handled correctly.

monzug commented 2 years ago

Thanks. I figured that la is the short form of lat. Is there a more general way to prevent similar scenarios from happening?

irina060981 commented 2 years ago

Such problems come from our using two different forms of the language internally (I described this in previous comments here). I added normalization for the language code; it should work for all similar cases.

Will upload to releases later
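
As a rough idea of what such normalization could look like, here is a minimal sketch. It is not the code that was released: the function name `normalizeLanguageCode` and the `LANGUAGE_ALIASES` table are hypothetical, and the table lists only two two-letter to three-letter mappings as examples. The idea is to reduce every incoming value (for example from an `lang` attribute) to one canonical form before it is used as a wordlist key.

```typescript
// Hypothetical alias table: short two-letter codes mapped to the
// three-letter form used internally. Only two entries are shown as examples.
const LANGUAGE_ALIASES: { [code: string]: string } = {
  la: "lat",
  ar: "ara"
};

// Reduce a language value to a single canonical code:
// trim whitespace, lowercase, drop any region subtag, then apply the alias table.
function normalizeLanguageCode(code: string): string {
  const primary = code.trim().toLowerCase().split(/[-_]/)[0];
  if (Object.prototype.hasOwnProperty.call(LANGUAGE_ALIASES, primary)) {
    return LANGUAGE_ALIASES[primary];
  }
  return primary;
}

const fromShort = normalizeLanguageCode("la");  // "lat"
const fromLong = normalizeLanguageCode("lat");  // "lat"
const fromUpper = normalizeLanguageCode(" LA "); // "lat"
```

With both "la" and "lat" collapsing to the same value, words from a block marked `lang="la"` would land in the same list as words already stored under lat. Codes without an alias, such as "en", pass through unchanged in this sketch and would still need the supported-language check described earlier.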

irina060981 commented 2 years ago

Fixed

monzug commented 2 years ago

Tested in FF/PC and Mac and Chrome/Mac. Fixed.