alpheios-project / components

Repository has been moved to https://github.com/alpheios-project/alpheios-core/tree/master/packages/res-client
ISC License
1 stars 1 forks source link

TextSelector: if source language code is unsupported, use default LanguageCode #33

Closed balmas closed 6 years ago

balmas commented 6 years ago

If a site uses a language code which is unsupported, we should warn the user and use the default language code.

balmas commented 6 years ago

E.g. test on https://scaife.perseus.org/reader/urn:cts:latinLit:phi1002.phi001.perseus-lat2:1.pr.1-1.pr.5/ which has language code (incorrectly) set to 'en'

irina060981 commented 6 years ago

For now decided to create only a error message in the popup that tells, that current language is unsupported.

irina060981 commented 6 years ago

Added message based on not defined currentLanguage of TextSelection. Changed it in branches (lookup-component) https://github.com/alpheios-project/components/pull/56 (new-popup-layout) (reskin-color-font) https://github.com/alpheios-project/components/pull/42

balmas commented 6 years ago

There are a few problems here still:

  1. this line in HTMLSelector.createTextSelector ` let textSelector = new TextSelector(LanguageModelFactory.getLanguageIdFromCode(this.getLanguageCode(this.defaultLanguageCode)))

When the value returned by HTMLSelector.getLanguage is a language code we don't support, LanguageModelFactory.getLanguageIdFromCode returns the Constants.LANG_UNDEFINED which ends up eventually being converted to Constants.STR_LANG_CODE_UNDEFINED by LanguageModelFactor.getLanguageCodeFromId so that in UIController.updateLanguage the currentLanguage variable is indeed set to the string "undefined". This will make the test in Popup.noLanguage fail to behave properly.

  1. It is currently impossible to override an invalid language as calculated in HtmlSelector.getLanguageCodeFromSource because if a language is detected it always overrides the defaultLanguage setting. So there is no way for the user to get out of this loop.

For right now I think what I want to request is to change MediaSelector.getLanguageCode to test to see if what is returned by LanguageModelFactory.getLanguageIdFromCode is equal to Constants.LANG_UNDEFINED and if so, just return the defaultLanguageCode.

This will at least allow the user to change to one of the other supported languages. Ideally we would also note that we did this for the user but we need some structural changes to be able to pass messages up from the selector to the uiController.

irina060981 commented 6 years ago

Ok, I have realized this variant ( made changes to mediaSelector.js to getLanguageCode) and now if the langID === Constants.LANG_UNDEFINED then it returns default lang.

In branch - lookup-reskin https://github.com/alpheios-project/components/pull/68

balmas commented 6 years ago

This appears either not to have been fully fixed, or is broken again in the master branch.

The fix was in MediaSelector.getLanguageCode but it appears that this method is no longer called. The logic needs to move to MediaSelector.getLanguageID

balmas commented 6 years ago

@monzug can you do some testing on how we handle the setting of the initial language on pages in which it isn't explicitly defined or which it is defined incorrectly? The way the code should work is that it looks for the lang or xml:lang attribute on the closest parent element of the selected word. If it finds one, it tries to use that, and if it isn't a language we recognized, then it defaults to the preferred language setting. This is also what it should do if no lang/xml:lang attribute is found.

I think there are probably some improvements we might make to the UI in this area. As you have noted already it's a little confusing for the lookup component sometimes.

monzug commented 6 years ago

while testing this bug, I found this minor issue: enter some latin words and click on Search (mymemory.translated.net). page reloads and the alpheios panel shows up with no information content. see screenshot screenshot_2018-08-14 translate ad lucem invia in tagalog with examples

balmas commented 6 years ago

@irina060981 can you investigate?

monzug commented 6 years ago

basically, on reload, it does not default to the help page but to the last viewed panel option

irina060981 commented 6 years ago

@balmas and @monzug, I think Monica is right. If you look at the screenshot - it shows Grammar tab - that is empty if no word is selected. I have reproduced that and see the same workflow:

1) Panel is opened on load of the page 2) When Search button is clicked - page is fully reloaded, Panel is opened again (as on any page reload) and it shows the last opened tab (There is no init tab action in panel on page reload). I think Vue does this way (not fully initialized on page reload)

I could add some init tab action on panel mounted event if you think it is needed to be done.

balmas commented 6 years ago

Hmm. I'm not positive what the desired behavior is here. I think this may be driven by the webextension which is managing the state of the panel in between page reloads. So, we intentionally return to the last opened panel. Maybe this is made more confusing by the fact that the definitions pane is completely empty if you haven't looked up a word since the page was loaded/reloaded. Maybe we should have some hint text on this page in that case? @monzug what do you think?

monzug commented 6 years ago

if panel is on activate, I would default to the info page on reload. Need more time to think about this.

monzug commented 6 years ago

I used Bridget's html sample to test if the extension is able to detect the supported languages when different from the default. I set Latin as default language and the extension could correctly detect Greek, Persian, Latin and Arabic on double-click. However, when I enter a latin word on a Persian pop-up (or Greek), with latin set as default language, I got the message that language is Persian, change to Latin. Lookup is sticky to the last language detected on double-click on this specific test file with mixed languages. tested in Chrome.

monzug commented 6 years ago

on Firefox, the Persian language is detected but the lexical data could not be populated. when in the Arabic pop-up, if I enter a latin word such as ferre or mare and click on look-up, it does populated the lookup with latin headline and data from Arabic language. see screenshot

screenshot_2018-08-17 screenshot

irina060981 commented 6 years ago

lookup for latin word with Arabic language I have made an investigation and was surprised to know that it is an answer from Morph-Client image

That's why popup shows recieved data from ara and doesn't show a message for change a language.

irina060981 commented 6 years ago

for persian - I have found some words with lexical data

But it seems to me that there are only definitions here

But anyway - may be it could be helpful for some testing.

monzug commented 6 years ago

see below an example of Persian word خرد for which we show data on double-click in Chrome, but does not work on double-click on Firefox. the same Persian word خرد has definition in Look-up in Firefox.

screenshot_2018-08-21 screenshot 1

screenshot_2018-08-21 screenshot

irina060981 commented 6 years ago

Hello, @monzug ! I have tested this word and it opened on double-click in both browsers - Chrome and Firefox. But I opened it in Chrome first - and it needs near a minute to show data (and remove message ""Lexical data is not available ...) But when I opened it in Firefox next - it shows quickly. May be the source for persian morph analyzer needs too much time to get data for the first time? Could you test it on your side and wait longer than before?

monzug commented 6 years ago

I am testing on version 2.0.2. I can confirm that it does happen in firefox only. I did also wait for few minutes but lexical data does not load. there is usually a message generated when the data is still loading.

irina060981 commented 6 years ago

It is a strange behaviour - I hardly could guess how it could depend on browser. Ok - I will try to do more tests .

balmas commented 6 years ago

ok, various things to comment on here:

  1. "Lookup is sticky to the last language detected on double-click on this specific test file with mixed languages." This is how we designed it to work. @monzug does it make sense from a user perspective?

  2. Look of mare/ferre and Arabic results: this is because the Arabic parser understands transliterated arabic according to the Buckwalter transliteration scheme. (https://en.wikipedia.org/wiki/Buckwalter_transliteration) so it's a strange coincidence that mare and ferre are actually transliterated arabic words but that's the fun of language I guess.

  3. I'm not sure what's going on with the Persian lookups -- that service is definitely a bit slower and also the Persian lookups behave differently than the other languages, in that we first try to lookup the word in the dictionary, and then pass it through the morphological parser to see if a different lemma is identified. The Persian parser we are currently using is fairly useless. And the dictionary services occasionally have very short downtimes (I've been monitoring them and note that this happens about once or twice per day right now normally lasting just a few seconds. I need to analyze a bit more closely to understand exactly why). So it's possible that (1) this is an effect caused by the dictionary service going offline briefly and/or (2) something in the alternate sequence of the Persian lookups is invoking a Firefox-specific bug.

@irina060981 since you are still investigating this, I'm going to assign the item to you, but if you aren't able to reproduce please reassign back to me. Thanks!

monzug commented 6 years ago

1) I was expecting that if latin is the default language. lookup should be able to recorgnize the word. From info: Alpheios will try to detect the language of the word from the page markup. If it cannot it will use the default language. right now, it does use the default language set in Using language and not from Option. see screenshot screenshot_2018-08-22 screenshot

the page language in Option is set to Latin but the page language in Using language is set to Persian (in this example). It looks like Using language has priority on Option.

2) I found many words that are transliterated arabic words. didn't know!!!!!

3) I tested the test file with mixed language sevral times and at different time I do not think it could be the service going offline.

balmas commented 6 years ago

Re 1: I see your point re the info description. That was written before we added the lookup feature :-) With the stickiness, I was trying to think about how people might use this feature, and I think we really don't know. I think in the majority of cases the user will be working with only a single language anyway. The scenario I was trying to cover was where the user was reading in one language and wanting to lookup words in another -- for example if they were reading Latin and wanted to lookup corresponding words in Greek. it seemed in that case it would be annoying to have to keep selecting the language with each lookup. Maybe it would help if we always showed the language settings dropdown if the language for the lookup differs from the page or the default?

irina060981 commented 6 years ago

@balmas, I tried to reproduce - but I failed - today I have got all persian translations (for almost all words from the test demo page from embedded) in Firefox without any delay! I think the problem doesn't depend on the browser. May be you will be successful to reproduce it?

monzug commented 6 years ago

still does not work for me also for other Persian text. can we move this issue out of this bug?

On Thu, Aug 23, 2018 at 2:08 AM, Склярова Ирина notifications@github.com wrote:

@balmas https://github.com/balmas, I tried to reproduce - but I failed

  • today I have got all persian translations (for almost all words from the test demo page from embedded) in Firefox without any delay! I think the problem doesn't depend on the browser. May be you will be successful to reproduce it?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/alpheios-project/components/issues/33#issuecomment-415302674, or mute the thread https://github.com/notifications/unsubscribe-auth/AneqOeqJ5vJXYAS5Sqfwav6R5GJc-Nf4ks5uTkbLgaJpZM4Ts6YA .

balmas commented 6 years ago

Yes, that's a good idea. Please go ahead and make a new bug for it. Thanks!

balmas commented 6 years ago

ok, entered #170 for discussion of point 1, and @monzug entered #168 for point 3.

@monzug can we close this issue now?