Open balmas opened 3 years ago
As we work through the design, I'd like us to keep in mind that we can leverage browser caching of remote HTTP requests/responses so that not all query results need to be kept in memory for each word in the history.
Per @irina060981 we need to define the following requirements for session history:
I'd like to offer this as a first draft of a revised data model for discussion.
The model objects shown in a contained-by relationship (with the filled diamond) cannot exist outside of the context of their container. Those shown in an aggregated-by relationship (with the unfilled diamond) can exist as data objects separate from their container but are still directly part of the model object (even if only by reference). And those shown with a directed relationship (the dotted-line arrow) can exist as data objects separate from the related object, and may be accessed through the data model object, but are not contained by and do not belong to it.
So, for example, a WordListItem (renamed from WordItem) aggregates Word instances that were created in the current and prior sessions (I think we probably need to revise the structure of WordListItem too, but that's more detail than I'm trying to show here).
Anytime a user looks up a word, whether by clicking on it on the page, from an Alpheios display, or in the lookup box, a Word object is created if it doesn't already exist. A Word may be identified as unique by the combination of its text and its context.
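To make that uniqueness rule concrete, here is a minimal sketch of deriving a Word's identity from its text plus its context selector. All names (Word, TextQuoteSelector, key) are illustrative stand-ins, not the actual Alpheios classes:

```javascript
// Hypothetical sketch: a Word's identity is the combination of its text
// and its context (a TextQuoteSelector, or null for lookup-box words).
class TextQuoteSelector {
  constructor (prefix, exact, suffix) {
    this.prefix = prefix
    this.exact = exact
    this.suffix = suffix
  }
}

class Word {
  constructor (text, context = null) {
    this.text = text
    this.context = context
  }

  // Two Word instances are the same logical word iff text and context match.
  get key () {
    const ctx = this.context
      ? `${this.context.prefix}|${this.context.exact}|${this.context.suffix}`
      : ''
    return `${this.text}::${ctx}`
  }
}

const inContext = new Word('lupus', new TextQuoteSelector('magnus ', 'lupus', ' currit'))
const noContext = new Word('lupus') // looked up without context, e.g. in the lookup box
console.log(inContext.key === noContext.key) // false: same text, different context
```

Under this rule, the same surface form looked up in two different contexts would produce two distinct Word objects.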
The SessionHistory contains all instances of Word objects created during a session. I suppose it's possible that a Word is persisted outside of the current session in addition to and separate from its aggregation in the WordListItem, but I think this might be more complexity than we really need or want, so I'm using contained-by here.
I'm showing a general concept of a ComponentState with a reference to 0 or more Word items to indicate that Words move in and out of components separately from their lifetime in the session. The component (e.g. popup, treebank diagram, inflection table, etc.) may reference one or more Word objects when a word is selected for viewing, but the Word persists outside of its temporary presence in a component. (I also don't necessarily mean that there is a model object called ComponentState; I think we would still use Vuex for this, and each component should have its own state, etc. These details are outside the scope of what I'm trying to show here.)
A Word contains up to 1 TextQuoteSelector (context) and up to 1 HomonymGroup. If there is no TextQuoteSelector, then it was a word looked up without context (e.g. in the lookup box). And if there is no HomonymGroup, then we have no information about it yet other than the user selection.
Both TextQuoteSelector and HomonymGroup are specific to a single instance of a Word, so these are in a contained-by relationship with a word.
A HomonymGroup is populated with Homonyms for a word by the execution of the Lexical Query (i.e. use cases 8-10 at https://github.com/alpheios-project/documentation/issues/33#issuecomment-670098953). I have reflected the relationship between a HomonymGroup and its Homonym objects as a directed relationship, and not containment or aggregation, because I think a Homonym could be persisted (whether in memory or in IndexedDB) separately from a Word, as it's the result of a query that could be executed in different contexts. It's really the properties of the query that determine the uniqueness of a Homonym, and different words in different contexts could produce identical queries. So, with a GraphQL implementation, it would be up to the query API to determine whether or not it needed to create the Homonym object from scratch (and it might need to create parts, but not all, of it from scratch).
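The idea that query properties, not the word, determine a Homonym's identity could look something like the following sketch of a query-keyed cache (all names are hypothetical; the real caching would live behind the GraphQL layer):

```javascript
// Hypothetical sketch: Homonym results cached by the properties of the
// query, so identical queries issued for different words in different
// contexts share a single Homonym object.
const homonymCache = new Map()

function queryKey ({ word, languageId }) {
  return `${languageId}:${word}`
}

let remoteCalls = 0
function fetchHomonym (query) {
  // Stand-in for the real remote lexical query.
  remoteCalls += 1
  return { lexemes: [], query }
}

function getHomonym (query) {
  const key = queryKey(query)
  if (!homonymCache.has(key)) {
    homonymCache.set(key, fetchHomonym(query))
  }
  return homonymCache.get(key)
}

const h1 = getHomonym({ word: 'lupus', languageId: 'lat' })
const h2 = getHomonym({ word: 'lupus', languageId: 'lat' }) // cache hit
console.log(h1 === h2, remoteCalls) // true 1
```

A second lookup of the same word in a different page context would hit this cache rather than re-running the query, which is the motivation for the directed (non-containment) relationship.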
Everything shown in the containment hierarchy with a Homonym would be populated by the result of the Lexical Query. The main difference here from the current model is that I have removed the Full Definitions from the Lexeme.
Full Definitions, Usage Examples, Word Translations, InflectionViewSets (and other resources such as Linked Resources) are distinct from the context of a specific Homonym and are retrieved by reference using properties of the Lexeme or the Word, but are not contained directly by either.
@kirlat and @irina060981 please let me know your thoughts on this.
What if we try to break complexity down into smaller, more controllable pieces?
I think we can identify the following types of resources relatively independent from each other:
All those data items could be stored and retrieved independently. Each piece of resource data is relatively simple by itself. A data item, or several data items, can be obtained with GraphQL queries from a service that is responsible for obtaining and keeping this data for a certain period of time. Let's call it the Resource Service. It could be a GraphQL server or something else, I want to keep implementation details out of scope of this discussion.
There could be several Resource Services, each responsible for a single type of resource. We could also have a single service that would be able to retrieve all types of resources; in that case a GraphQL query would specify what types of resource have to be obtained. The latter approach, however, may have a hard-to-solve problem with different retrieval times for different types of data (i.e. the whole request may wait for the slowest resource data to arrive, making the user wait for results in limbo for too long).
Let's call instances of individual resource type items the Resource Objects.
There are also grouping entities. Let's call them the Grouping Objects:
And we also have two lists (the Lists):
So we have three groups of objects (the Resource Objects, the Grouping Objects, and the Lists), each with a slightly different role. The purpose of Resource Objects is to keep data. They would most likely have almost no business logic, nor would they have any knowledge of each other.
The Grouping Objects hold references to Resource Objects or other Grouping Objects. They keep almost no data of their own other than the lists of resources or other grouping objects that comprise them. They may, however, have business logic. For example, the Homonym Object may have a method to return all the Inflections kept within the Lexemes it holds.
The interesting thing about the Grouping Objects is that they need not hold all their Resource Objects at all times. The Grouping Objects can be useful (and be used) even in such an "incomplete" state. For example, a homonym may have lexemes with lemmas only, and those lemmas may have only short definitions. Yet a Homonym object in a state like that can still be shown in a popup. It may retrieve full definitions later. It may obtain translations later too, if the user has enabled the lemma translation option in the settings. Because the data inside such objects is not in one constant state but may change over time, we can call them the Live Objects.
When the user selects a word on a page, a Lexical Query decides what types of resources are needed initially. This may depend on the context, options, user preferences, and other factors. Upon a word selection, the Lexical Query creates a Grouping Object (a Word, most likely) and issues GraphQL requests to the Resource Services that correspond to the types of data to be obtained (Lemmas and Short Definitions, for example). Each Resource Service checks if the data is available locally and, if not, retrieves it remotely. It then returns the data back to the Lexical Query. Once the data arrives at the Lexical Query, the Grouping Object (the Word) is populated and put into use (i.e. displayed to the user).
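The flow just described could be sketched roughly as follows. All names (lemmaService, shortDefService, WordGroup, lexicalQuery) are hypothetical placeholders for the real services:

```javascript
// Hypothetical sketch: the Lexical Query creates a Word grouping object,
// asks only the Resource Services it needs initially, and populates the
// object as data arrives.
const lemmaService = async (word) => [{ lemma: word }]           // Lemma Resource Service
const shortDefService = async (word) => [`short def of ${word}`] // Short Definitions Service

class WordGroup {
  constructor (text) {
    this.text = text
    this.lemmas = null
    this.shortDefs = null
  }
}

async function lexicalQuery (text) {
  const word = new WordGroup(text)
  // Initial request: lemmas and short definitions only; full definitions,
  // translations, etc. are deferred until the user asks for them.
  const [lemmas, shortDefs] = await Promise.all([
    lemmaService(text),
    shortDefService(text)
  ])
  word.lemmas = lemmas
  word.shortDefs = shortDefs
  return word // ready to be displayed, even though it is still "incomplete"
}

lexicalQuery('lupus').then(w => console.log(w.lemmas[0].lemma, w.shortDefs[0]))
```

Issuing the two requests in parallel also illustrates the earlier point about per-resource services: the popup never has to wait on the slowest resource type.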
The Word object will not have the full data set at that stage. It will not have Full Definitions data, for example. This data can be obtained in two ways:
One is a getFullDefinitions() method that will be called by the UI once a user opens the full definitions tab (let's say we won't be hiding it if full definitions are not retrieved yet). Once called, this method will trigger a request to the Full Definitions Resource Service (or will use a method of the Lexical Query to do so). When the data arrives, the Word object will populate itself and notify the UI that the data retrieval is complete by returning a designated code (like RETRIEVAL_COMPLETE or something similar). The UI will then pull data from the Word object and display it to the user. While data retrieval is in progress (i.e. while the success code has not been received), the UI will display a progress indicator.

So the Live Objects would change their state depending on user requests for data (like when the user requests full definitions). The Live Objects may also react to other changes. If the user chooses to disable lemma translations in options, the Lexeme may drop lemma translations. If the user later decides to enable translations again, the Live Object may request them from the Translations Service in order to re-populate itself. If the Translations Service has kept this data in its cache all this time, it will return it immediately. Otherwise, the data will be deferred until received from the remote server.
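A minimal sketch of this on-demand "Live Object" behavior might look like the following (LiveWord, the injected service, and the status codes are all illustrative, not an agreed API):

```javascript
// Hypothetical sketch: a Word populates itself on demand and returns a
// status code so the UI knows when to stop showing the progress indicator.
const RETRIEVAL_COMPLETE = 'RETRIEVAL_COMPLETE'
const RETRIEVAL_FAILED = 'RETRIEVAL_FAILED'

class LiveWord {
  constructor (text, fullDefService) {
    this.text = text
    this.fullDefs = null
    this._fullDefService = fullDefService // Full Definitions Resource Service
  }

  async getFullDefinitions () {
    try {
      // The service would check its cache first and go remote only on a miss.
      this.fullDefs = await this._fullDefService(this.text)
      return RETRIEVAL_COMPLETE
    } catch (e) {
      return RETRIEVAL_FAILED
    }
  }
}

const word = new LiveWord('lupus', async (t) => [`full definition of ${t}`])
word.getFullDefinitions().then(status => {
  // The UI pulls data from the Word object after RETRIEVAL_COMPLETE.
  console.log(status, word.fullDefs)
})
```

Returning a status code (rather than the data itself) keeps the UI's contract simple: show a spinner until the code arrives, then read the populated fields off the object.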
So that's the concept I suggest we explore and discuss. I think it can be beneficial because it simplifies things by reducing inter-dependencies. I also think that it offers an expandable model, because we can add new types of resources easily, with minimal change to the model. All pieces of the model are pretty well isolated from each other, which allows us to update and test them easily.
@balmas, @irina060981, what do you think about a model like that?
I think this is a very helpful way to model things. Having trouble finding a flaw in it. I wonder if there is a way we could try it out in a small way before refactoring everything to see if it works?
> I wonder if there is a way we could try it out in a small way before refactoring everything to see if it works?
I think the safest way to verify feasibility is to start introducing the new workflow in parallel with the existing one. We can create a new version of a lexical request object (or add new methods to the existing one) and, initially, run the request through both old and new objects in parallel. With this we can verify that the data retrieved via the new workflow is the same as obtained through the old one. Then we can start switching UI components one by one to get data from the new workflow. After this change is complete, we can start to remove pieces of the old workflow. What do you think?
I was wondering if we could start even smaller than that, to test out the concept of the model on a very discrete piece of functionality. I'm not sure if it's feasible.
The other change that the history would require is a creation of a history object that will keep a history of all lexical requests.
Right now, if the homonym is retrieved by the lexical query, it is saved to a prop of the app controller inside the onHomonymReady() callback. The UI components use the app controller's methods to retrieve the homonym or its various parts and then display this data to the user. For example, the morph Vue component uses getHomonymLexemes() to retrieve the lexemes to display.
In order to satisfy the requirements from this issue, we could create a History object that would hold the results of all lexical queries we would be keeping. We could create WordQueryData objects for that. The History object would essentially be a list of WordQueryData instances.
Once the lexical data for a word is retrieved, a new WordQueryData is added to the History list. The UI components would have access to the History object and would be able to retrieve the lexical data of the latest, or of any previous, request. In light of #40 requirements, we have to create Word/Homonym objects differently depending on user preferences. So WordQueryData will have methods that take user options as a parameter and return lexical objects constructed according to user preferences.
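A sketch of what History and WordQueryData could look like, with an options-aware accessor (all names and the options shape are hypothetical):

```javascript
// Hypothetical sketch: WordQueryData keeps the raw query response and
// builds lexical objects on demand, shaped by user preferences.
class WordQueryData {
  constructor (targetWord, rawResponse) {
    this.targetWord = targetWord
    this.rawResponse = rawResponse // raw GraphQL response data, kept as-is
  }

  // Construct a Homonym-like object according to user options, e.g.
  // whether lemma translations should be included.
  getHomonym (options = {}) {
    const lexemes = this.rawResponse.lexemes.map(lex => ({
      lemma: lex.lemma,
      ...(options.lemmaTranslations ? { translations: lex.translations } : {})
    }))
    return { targetWord: this.targetWord, lexemes }
  }
}

class History {
  constructor () { this.items = [] }
  add (item) { this.items.push(item) }
  get latest () { return this.items[this.items.length - 1] }
}

const history = new History()
history.add(new WordQueryData('lupus', {
  lexemes: [{ lemma: 'lupus', translations: ['wolf'] }]
}))

// The same stored query data yields differently shaped objects
// depending on user preferences.
const plain = history.latest.getHomonym({ lemmaTranslations: false })
console.log('translations' in plain.lexemes[0]) // false
```

Keeping the raw response inside WordQueryData is what lets the same history entry be re-shaped later when the user changes their preferences.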
Thus, via the History object, UI components will have access to any lexical query results that are stored in history.
Items inside the WordList would be populated from the History queue, according to the settings (as not all words queried may make it to the WordList).
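That filtering step could be as simple as the following sketch (the setting name and entry shape are invented for illustration):

```javascript
// Hypothetical sketch: populate the WordList from the History queue,
// filtering by a user setting that decides which lookups qualify.
function buildWordList (history, settings) {
  return history
    .filter(entry => settings.trackLookupBoxWords || entry.hasContext)
    .map(entry => entry.word)
}

const historyQueue = [
  { word: 'lupus', hasContext: true },  // clicked on a page
  { word: 'arbor', hasContext: false }  // typed into the lookup box
]
console.log(buildWordList(historyQueue, { trackLookupBoxWords: false }))
// logs [ 'lupus' ]: lookup-box words are excluded by this setting
```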
What do you think about changes like that?
Above I proposed:

> The SessionHistory contains all instances of Word objects created during a session. I suppose it's possible that a Word is persisted outside of the current session in addition to and separate from its aggregation in the WordListItem, but I think this might be more complexity than we really need or want, so I'm using contained-by here.
What do you see as the difference between a Word object (as discussed above and also at https://github.com/alpheios-project/documentation/issues/40#issuecomment-754801509) and a WordQueryData object? Are these the same thing?
> What do you see as the difference between a Word object (as discussed above and also at #40 (comment)) and a WordQueryData object? Are these the same thing?
I am not completely sure, but I was thinking about @irina060981's words about data model objects being reused in other applications. I'm also thinking about the separation of concerns.
Whatever object represents the lexical query results saved in a history list, it should be able to return the query results in different forms. It must have various methods for that. In order to do so, it probably has to keep some GraphQL response data in "raw" form inside.
If we decide to introduce the Word as a top-level lexical object in the hierarchy, we would probably want it to be reused. The best place for it then would be the data models repository. However, in that case it should contain no business logic related to data interpretation and the construction of a specific shape of lexical data from the response data. This logic probably belongs in components, or maybe even in a separate package where we could split out all logic related to lexical data processing (not sure if we need it now, but it might make sense). Then we need something on top of the Word that contains the query data and is able to produce different word forms.
But if we don't want the Word object to reside in data models, then we can put that logic into the Word class and use it instead.
That's my reasoning behind having a separate WordData object. What do you think about it?
To be honest, I'm not sure. Overall I agree that we need a place to store and manage the session history and a place to store and manage accumulated word data. Whether we need both a Word model and a Word data object remains to be seen I think.
I suggest we proceed and see how it goes.
This issue is to discuss the design for the following requirements:
(moved from #33)