alpheios-project / documentation

Alpheios Developer Documentation

Session History/Multiple Popup #37

Open balmas opened 3 years ago

balmas commented 3 years ago

This issue is to discuss the design for the following requirements:

(moved from #33)

balmas commented 3 years ago

As we work through the design, I'd like us to keep in mind that we can leverage browser caching of remote HTTP requests/responses so that not all query results need to be kept in memory for each word in the history.

balmas commented 3 years ago

Per @irina060981 we need to define the following requirements for session history:

balmas commented 3 years ago

I'd like to offer this as a first draft of a revised data model for discussion.

[Image: revised data model diagram (reviseddatamodel_word)]

The model objects shown in a contained-by relationship (with the filled diamond) cannot exist outside of the context of their container. Those shown in an aggregated-by relationship (with the unfilled diamond) can exist as data objects separate from their container but are still directly part of the model object (even if only by reference). And those shown with a directed relationship (the dotted-line arrow) can exist as data objects separate from the related object, and may be accessed through the data model object, but are not contained by it and do not belong to it.

So, for example, a WordListItem (renamed from WordItem) aggregates Word instances that were created in the current and prior sessions (I think we probably need to revise the structure of WordListItem too, but that's more detail than I'm trying to show here).

Any time a user looks up a word, whether by clicking on it on the page, from an Alpheios display, or in the lookup box, a Word object is created if it doesn't already exist. A Word is uniquely identified by the combination of its text and its context.
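A minimal sketch of that get-or-create rule, assuming a Word is keyed by its text plus its context. The names WordSketch, TextQuoteSelectorSketch, WordRegistry, and all field names are hypothetical and only illustrate the idea; they are not part of the existing data model.

```typescript
// Hypothetical shapes; field names are assumptions, not the Alpheios data model.
interface TextQuoteSelectorSketch {
  prefix: string;
  suffix: string;
  source: string; // e.g. the URL of the page the word was selected on
}

interface WordSketch {
  text: string;
  context?: TextQuoteSelectorSketch; // absent for lookup-box queries
}

// Build an identity key from the text and, when present, the context.
function wordKey(text: string, context?: TextQuoteSelectorSketch): string {
  const contextPart = context ? [context.source, context.prefix, context.suffix] : null;
  return JSON.stringify([text, contextPart]);
}

class WordRegistry {
  private words = new Map<string, WordSketch>();

  // Return the existing Word for this text and context, or create one.
  getOrCreate(text: string, context?: TextQuoteSelectorSketch): WordSketch {
    const key = wordKey(text, context);
    let word = this.words.get(key);
    if (word === undefined) {
      word = context ? { text, context } : { text };
      this.words.set(key, word);
    }
    return word;
  }

  get size(): number {
    return this.words.size;
  }
}
```

With this keying, the same text selected in two different places yields two Words, while repeating a lookup in the same place reuses the existing one.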

The SessionHistory contains all instances of Word objects created during a session. I suppose it's possible that a Word is persisted outside of the current session in addition to and separate from its aggregation in the WordListItem, but I think this might be more complexity than we really need or want, so I'm using contained-by here.

I'm showing a general concept of a ComponentState with a reference to 0 or more Word items to indicate that Words move in and out of components separately from their lifetime in the session. The component (e.g. popup, treebank diagram, inflection table, etc.) may reference one or more Word objects when a word is selected for viewing, but the Word persists outside of its temporary presence in a component. (I also don't necessarily intend to say that there is a model object called ComponentState. I think we would still use Vuex for this and each component should have its own state, etc.; these details are outside the scope of what I'm trying to show here.)

A Word contains up to 1 TextQuoteSelector (context) and up to 1 HomonymGroup. If there is no TextQuoteSelector, then it was a word looked up without context (e.g. in the lookup box). And if there is no HomonymGroup, then we have no information about it yet other than the user selection.

Both TextQuoteSelector and HomonymGroup are specific to a single instance of a Word, so these are in a contained-by relationship with a Word.

A HomonymGroup is populated with Homonyms for a word by the execution of the Lexical Query (i.e. use cases 8-10 at https://github.com/alpheios-project/documentation/issues/33#issuecomment-670098953). I have reflected this relationship between a HomonymGroup and its Homonym objects as a directed relationship, and not containment or aggregation, because I think it could be that a Homonym is persisted (whether in memory or in IndexedDB) separately from a Word, as it's the result of a query that could be executed in different contexts. It's really the properties of the query that determine the uniqueness of a Homonym, and different words in different contexts could produce identical queries. So, with a GraphQL implementation, it would be up to the query API to determine whether or not it needed to create the Homonym object from scratch (and it might need to create parts but not all of it from scratch).
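To illustrate the point that query properties, not the Word, determine the identity of a Homonym, here is a small sketch of a store keyed by query properties. LexicalQueryProps, HomonymSketch, HomonymStore, and the choice of word plus language code as the key are all assumptions made for the example.

```typescript
// Hypothetical query and result shapes, for illustration only.
interface LexicalQueryProps {
  word: string;
  languageCode: string;
}

interface HomonymSketch {
  targetWord: string;
  lexemes: string[]; // stand-in for real Lexeme objects
}

class HomonymStore {
  private byQuery = new Map<string, HomonymSketch>();
  builds = 0; // how many Homonyms were built from scratch

  // Identity comes from the query properties, not from the Word that asked.
  private static key(q: LexicalQueryProps): string {
    return JSON.stringify([q.languageCode, q.word]);
  }

  // Reuse a stored Homonym for an identical query, otherwise build and store it.
  resolve(q: LexicalQueryProps, build: (q: LexicalQueryProps) => HomonymSketch): HomonymSketch {
    const key = HomonymStore.key(q);
    let homonym = this.byQuery.get(key);
    if (homonym === undefined) {
      homonym = build(q);
      this.builds += 1;
      this.byQuery.set(key, homonym);
    }
    return homonym;
  }
}
```

Two Words in different contexts that produce the same query would then point at the same Homonym object, which is why the relationship is drawn as directed rather than contained.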

Everything shown in the containment hierarchy with a Homonym would be populated by the result of the Lexical Query. The main difference here from the current model is that I have removed the Full Definitions from the Lexeme.

Full Definitions, Usage Examples, Word Translations, InflectionViewSets (and other resources such as Linked Resources) are distinct from the context of a specific Homonym. They are retrieved by reference using properties of the Lexeme or the Word but are not contained directly by either.

@kirlat and @irina060981 please let me know your thoughts on this.

kirlat commented 3 years ago

What if we try to break complexity down into smaller, more controllable pieces?

I think we can identify the following types of resources relatively independent from each other:

All those data items could be stored and retrieved independently. Each piece of resource data is relatively simple by itself. A data item, or several data items, can be obtained with GraphQL queries from a service that is responsible for obtaining and keeping this data for a certain period of time. Let's call it the Resource Service. It could be a GraphQL server or something else; I want to keep implementation details out of the scope of this discussion.

There could be several Resource Services, each responsible for a single type of resource. We could also have a single service that is able to retrieve all types of resources. In the latter case, a GraphQL query will specify what types of resources have to be obtained. The latter case, however, may have a hard-to-solve problem with different retrieval times for different types of data (i.e. the whole request may wait for the slowest resource data to arrive, and that will leave the user waiting in limbo for results for too long).
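The retrieval-time concern can be sketched as follows. This is only an illustration with a simulated remote call: fetchResource, the resource type names, and the delays are all made up for the example and do not correspond to any existing service.

```typescript
type ResourceType = "lemmas" | "shortDefs" | "fullDefs";

// Stand-in for a remote call; the delays are arbitrary.
function fetchResource(type: ResourceType, delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(`${type} data`), delayMs));
}

// One combined request: nothing is delivered until the slowest part is in.
function fetchCombined(onReady: (all: string[]) => void): Promise<void> {
  return Promise.all([
    fetchResource("lemmas", 10),
    fetchResource("shortDefs", 20),
    fetchResource("fullDefs", 60),
  ]).then(onReady);
}

// Separate requests: each result is delivered as soon as it arrives.
function fetchSeparately(onPart: (part: string) => void): Promise<void> {
  const parts = [
    fetchResource("lemmas", 10),
    fetchResource("shortDefs", 20),
    fetchResource("fullDefs", 60),
  ].map((p) => p.then(onPart));
  return Promise.all(parts).then(() => undefined);
}
```

With separate requests the UI could show lemmas and short definitions while full definitions are still in flight; with the combined request the callback fires only once everything has arrived.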

Let's call instances of individual resource type items the Resource Objects.

There are also grouping entities. Let's call them the Grouping Objects:

And we also have two lists (the Lists):

So we have three groups of objects (the Resource Objects, the Grouping Objects, and the Lists), each with a slightly different role. The purpose of Resource Objects is to keep data. They would most likely have almost no business logic. Nor would they have any knowledge of each other.

The Grouping Objects hold references to Resource Objects or other Grouping Objects. They keep almost no data of their own other than lists of the resources or other grouping objects that comprise them. They may, however, have business logic. For example, the Homonym Object may have a method to return all the Inflections kept within the Lexemes it holds.
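A rough sketch of that split, with Resource Objects as plain data and Grouping Objects carrying the logic. The class and field names (InflectionData, LemmaData, LexemeGroup, HomonymGrouping, getInflections) are hypothetical and chosen only for this example.

```typescript
// Resource Objects: plain data, no logic, no knowledge of each other.
interface InflectionData {
  form: string;
  grammaticalCase: string;
}

interface LemmaData {
  word: string;
}

// Grouping Objects: hold references and may carry business logic.
class LexemeGroup {
  lemma: LemmaData;
  inflections: InflectionData[];

  constructor(lemma: LemmaData, inflections: InflectionData[] = []) {
    this.lemma = lemma;
    this.inflections = inflections;
  }
}

class HomonymGrouping {
  lexemes: LexemeGroup[];

  constructor(lexemes: LexemeGroup[]) {
    this.lexemes = lexemes;
  }

  // Collect the Inflections kept within all Lexemes this Homonym holds.
  getInflections(): InflectionData[] {
    const all: InflectionData[] = [];
    for (const lexeme of this.lexemes) {
      all.push(...lexeme.inflections);
    }
    return all;
  }
}
```

A Lexeme with no inflections yet simply contributes nothing to the result, which fits the idea that a Grouping Object stays usable while incomplete.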

The interesting thing about the Grouping Objects is that they need not hold all of their Resource Objects at all times. Yet the Grouping Objects can be useful (and be used) even in such an "incomplete" state. For example, a homonym may have lexemes with lemmas only, and those lemmas may have only short definitions. Yet a Homonym object in a state like that can still be shown in a popup. It may retrieve full definitions later. It may obtain translations later too, if the user has enabled the lemma translation option in the settings. Because the data inside such objects is not in one constant state but may change over time, we can call them the Live Objects.

When the user selects a word on a page, a Lexical Query decides what types of resources are needed initially. This may depend on the context, options, user preferences, and other factors. Upon a word selection, the Lexical Query creates a Grouping Object (a Word, most likely) and issues GraphQL requests to the Resource Services that correspond to the types of data to be obtained (Lemmas and Short Definitions, for example). Each Resource Service checks whether the data is available locally and, if not, retrieves it remotely. It then returns the data to the Lexical Query. Once the data has arrived at the Lexical Query, the Grouping Object (the Word) is populated and put into use (i.e. displayed to the user).
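The flow above could look roughly like this. It is a sketch under assumptions: ShortDefsService, RemoteFetcher, QueriedWord, and lexicalQuery are invented names, the remote call is injected as a plain function rather than a real GraphQL request, and only one resource type is shown.

```typescript
// The remote retrieval is injected so the sketch does not assume any real API.
type RemoteFetcher = (word: string) => Promise<string[]>;

// Hypothetical Resource Service for one resource type (short definitions).
class ShortDefsService {
  private cache = new Map<string, string[]>();
  private fetchRemote: RemoteFetcher;
  remoteCalls = 0;

  constructor(fetchRemote: RemoteFetcher) {
    this.fetchRemote = fetchRemote;
  }

  // Check local data first; go remote only when nothing is stored.
  async get(word: string): Promise<string[]> {
    const cached = this.cache.get(word);
    if (cached !== undefined) {
      return cached;
    }
    this.remoteCalls += 1;
    const data = await this.fetchRemote(word);
    this.cache.set(word, data);
    return data;
  }
}

interface QueriedWord {
  text: string;
  shortDefs: string[];
}

// The query creates the Grouping Object first, then populates it from the service.
async function lexicalQuery(text: string, service: ShortDefsService): Promise<QueriedWord> {
  const word: QueriedWord = { text, shortDefs: [] };
  word.shortDefs = await service.get(text);
  return word;
}
```

Repeating the same query hits the service's local data and does not trigger a second remote call.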

The Word object will not have the full data set at that stage. It will not have Full Definitions data, for example. This data can be obtained in two ways:

So the Live Objects would change their state depending on user requests for data (such as when the user requests full definitions). The Live Objects may also react to other changes. If the user chooses to disable lemma translations in the options, the Lexeme may drop its lemma translations. If the user later decides to enable translations again, the Live Object may request them from the Translations Service in order to re-populate itself. If the Translations Service has kept this data in cache all this time, it will return it immediately. Otherwise, the data will be deferred until it is received from the remote server.
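As a sketch of a Live Object reacting to an option change, assuming the Translations Service is reachable through a single lookup function. LiveLexeme, TranslationLookup, and onTranslationsOption are hypothetical names used only for this illustration.

```typescript
// Stand-in for the Translations Service; it may answer from cache or remotely.
type TranslationLookup = (lemma: string) => Promise<string[]>;

class LiveLexeme {
  lemma: string;
  translations: string[] | undefined = undefined;

  constructor(lemma: string) {
    this.lemma = lemma;
  }

  // React to the user toggling lemma translations in the options.
  async onTranslationsOption(enabled: boolean, lookup: TranslationLookup): Promise<void> {
    if (!enabled) {
      // Drop the data locally; the service is free to keep its own copy.
      this.translations = undefined;
      return;
    }
    // Re-populate; this resolves quickly if the service still has the data.
    this.translations = await lookup(this.lemma);
  }
}
```

The Lexeme stays usable in every state: consumers check whether translations are present rather than assuming they are.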

So that's the concept I suggest we explore and discuss. I think it can be beneficial because it simplifies things by reducing inter-dependencies. I also think that it offers an expandable model because we can add new types of resources easily, with minimal change to the model. All pieces of the model are pretty well isolated from each other, which makes them easy to update and test.

@balmas, @irina060981, what do you think about a model like that?

balmas commented 3 years ago

I think this is a very helpful way to model things. Having trouble finding a flaw in it. I wonder if there is a way we could try it out in a small way before refactoring everything to see if it works?

kirlat commented 3 years ago

I wonder if there is a way we could try it out in a small way before refactoring everything to see if it works?

I think the safest way to verify feasibility is to start introducing the new workflow in parallel with the existing one. We can create a new version of a lexical request object (or add new methods to the existing one) and, initially, run the request through both old and new objects in parallel. With this we can verify that the data retrieved via the new workflow is the same as obtained through the old one. Then we can start switching UI components one by one to get data from the new workflow. After this change is complete, we can start to remove pieces of the old workflow. What do you think?
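A small sketch of the side-by-side check, assuming both workflows can be wrapped as functions that take a word and return plain data. Workflow, ComparisonResult, and compareWorkflows are hypothetical; a real comparison would probably be asynchronous and need a smarter equality test than JSON serialization.

```typescript
type Workflow<T> = (word: string) => T;

interface ComparisonResult<T> {
  word: string;
  matches: boolean;
  oldResult: T;
  newResult: T;
}

// Run both workflows on the same input and report whether the results agree.
// JSON comparison assumes plain, serializable results with a stable key order.
function compareWorkflows<T>(
  word: string,
  oldFlow: Workflow<T>,
  newFlow: Workflow<T>
): ComparisonResult<T> {
  const oldResult = oldFlow(word);
  const newResult = newFlow(word);
  const matches = JSON.stringify(oldResult) === JSON.stringify(newResult);
  return { word, matches, oldResult, newResult };
}
```

Any mismatch can be logged with both results attached, which gives a concrete list of differences to resolve before a UI component is switched over.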

balmas commented 3 years ago

I was wondering if we could start even smaller than that, to test out the concept of the model on a very discrete piece of functionality. I'm not sure if it's feasible.

kirlat commented 3 years ago

The other change that the history would require is the creation of a history object that will keep a history of all lexical requests.

Right now, if the homonym is retrieved by the lexical query, it is saved to a property of the app controller, inside the onHomonymReady() callback. The UI components use the app controller's methods to retrieve the homonym or its various parts and then display this data to the user. For example, the morph Vue component uses getHomonymLexemes() to retrieve the lexemes to display.

In order to satisfy requirements from this issue we could create a History object that would hold results of all lexical queries we would be keeping. We could create WordQueryData objects for that. The History object would essentially be a list of WordQueryData instances.

Once the lexical data for a word is retrieved, a new WordQueryData is added to the History list. The UI components would have access to the History object and would be able to retrieve the lexical data of the latest request or of any previous one. In light of the requirements in #40, we have to create Word/Homonym objects differently, depending on user preferences. So the WordQueryData will have methods that take user options as a parameter and return lexical objects constructed according to user preferences.
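Roughly, that could be sketched like this. Only the WordQueryData name comes from the proposal; QueryHistory, UserOptionsSketch, RawLexeme, LexemeView, the showShortDefsOnly option, and the getLexemes method are placeholders invented for the example and do not reflect the actual requirements in #40.

```typescript
// Hypothetical option and data shapes, for illustration only.
interface UserOptionsSketch {
  showShortDefsOnly: boolean;
}

interface RawLexeme {
  lemma: string;
  shortDef: string;
  fullDef?: string;
}

interface LexemeView {
  lemma: string;
  definition: string;
}

class WordQueryData {
  word: string;
  private raw: RawLexeme[]; // response data kept in "raw" form

  constructor(word: string, raw: RawLexeme[]) {
    this.word = word;
    this.raw = raw;
  }

  // Build lexical objects shaped by the user's preferences.
  getLexemes(options: UserOptionsSketch): LexemeView[] {
    return this.raw.map((l) => ({
      lemma: l.lemma,
      definition: options.showShortDefsOnly || l.fullDef === undefined ? l.shortDef : l.fullDef,
    }));
  }
}

// The history is essentially an ordered list of WordQueryData instances.
class QueryHistory {
  private items: WordQueryData[] = [];

  add(item: WordQueryData): void {
    this.items.push(item);
  }

  latest(): WordQueryData | undefined {
    return this.items[this.items.length - 1];
  }

  at(index: number): WordQueryData | undefined {
    return this.items[index];
  }

  get length(): number {
    return this.items.length;
  }
}
```

A UI component would call something like latest() for the current word, or at(i) for an earlier request, and pass in the current user options each time it needs lexemes.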

Thus, via the History object, UI components will have access to any lexical query results that are stored in history.

Items inside the WordList would be populated from the History queue, according to the settings (as not all words queried may make it to the WordList).

What do you think about changes like that?

balmas commented 3 years ago

Above I proposed:

The SessionHistory contains all instances of Word objects created during a session. I suppose it's possible that a Word is persisted outside of the current session in addition to and separate from its aggregation in the WordListItem, but I think this might be more complexity than we really need or want, so I'm using contained-by here.

What do you see as the difference between a Word object (as discussed above and also at https://github.com/alpheios-project/documentation/issues/40#issuecomment-754801509) and a WordQueryData object? Are these the same thing?

kirlat commented 3 years ago

What do you see as the difference between a Word object (as discussed above and also at #40 (comment)) and a WordQueryData object? Are these the same thing?

I am not completely sure, but I was thinking about @irina060981's point about data model objects being reused in other applications. I'm also thinking about the separation of concerns.

Whatever object represents the lexical query results saved in the history list, it should be able to return the query results in different forms. It must have various methods for that. In order to do so, it probably has to keep some GraphQL response data in "raw" form inside.

If we decide to introduce the Word as a top-level lexical object in the hierarchy, we would probably want it to be reused. The best place for it then would be the data models repository. However, in that case it should contain no business logic related to data interpretation and the construction of a specific shape of lexical data from the response data. This logic probably belongs to components, or maybe even to a separate package into which we could split out all logic related to lexical data processing (not sure if we need it now, but it might make sense). Then we need something on top of the Word that contains the query data and is able to produce different word forms.

But if we don't want the Word object to reside in data models, then we can put that logic to the Word class and use it instead.

That's my reasoning behind having a separate WordQueryData object. What do you think about it?

balmas commented 3 years ago

To be honest, I'm not sure. Overall I agree that we need a place to store and manage the session history and a place to store and manage accumulated word data. Whether we need both a Word model and a Word data object remains to be seen I think.

I suggest we proceed and see how it goes.