alpheios-project / documentation

Alpheios Developer Documentation

App Architecture for User Data #9

Closed balmas closed 4 years ago

balmas commented 5 years ago

The following is a proposal for the application architecture design for managing user data. The need is to have a way to work with data sources efficiently locally, while keeping data in sync across multiple application instances.

The requirements for the user word-in-context lists are used as the example use case here, but the idea is to develop an architecture which is flexible enough to handle various data types and data sources, and which works across applications (Webextension, Embedded Library, etc.).

For example, a user might do lookups on both a mobile device and on the desktop, and each should update the user's wordlist as words are looked up. Similar requirements will apply to user preferences and other sorts of user data.

I'm proposing a design which uses:

dbsync-controller-architecture

In the above diagram, some of the steps are represented as synchronous when they will need to be asynchronous but the basic flow is this:

[001]-[002] Upon application initialization, controllers subscribe to events which interact with data
[003] User requests a word list display by clicking a button on the word list tab
[004] Wordlist Vue component requests Wordlist data from the UIController
[004] UIController delegates the request to the WordListController
[005] WordListController requests data from a UserDataQuery object
[006] UserDataQuery object requests data from the DBSyncController
[008]-[020] DBSyncController interacts with remote and local data sources to retrieve and merge data. (The assumption here is that we might have a ProtectedClientAdapter which knows how to interact with data sources that require authentication. The exact details of that still need to be worked out, but the idea is to isolate the business logic around authentication/authorization from that of managing and merging data sources; in other scenarios the DbSyncAdapter could use the regular ClientAdapter to retrieve data from non-protected sources.)
[021] DBSyncController returns the fully merged data set to the WordListController
[022]-[023] WordListController instantiates the WordList data model objects and supplies them to the UIController
[024] UIController updates the data sent to the WordList view
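
To make the delegation chain concrete, here is a minimal sketch in JS; all class and method names beyond those in the diagram are assumptions, and the data-merging internals are stubbed out:

```js
// Hypothetical sketch of the delegation chain in steps [003]-[024].
// All names are illustrative, and the merge logic is stubbed out.

class DBSyncController {
  async retrieve (query) {
    // [008]-[020]: interact with remote and local data sources and
    // merge the results; stubbed for the sketch
    return [{ targetWord: 'mare', languageCode: 'lat' }]
  }
}

class UserDataQuery {
  constructor (params) { this.params = params }

  async getData (dbSync) {
    // [006]: the query object delegates retrieval to the DBSyncController
    return dbSync.retrieve(this)
  }
}

class WordListController {
  constructor (dbSync) { this.dbSync = dbSync }

  async getWordList (userId, languageCode) {
    // [005]: describe the user data we need
    const query = new UserDataQuery({ dataType: 'WordList', userId, languageCode })
    // [006]-[021]: run the query and receive the fully merged data set
    const records = await query.getData(this.dbSync)
    // [022]-[023]: records would become WordList data model objects here,
    // to be handed to the UIController for display ([024])
    return records
  }
}

// [003]-[004]: e.g. invoked when the user opens the word list tab
new WordListController(new DBSyncController())
  .getWordList('user-1', 'lat')
  .then(list => console.log(list))
```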

The WordList can also be updated by events which are not specific requests to the WordList view component. For example, the requirements call for every word being looked up to be added to the user's word list. In [001] the WordListController subscribes to the MORPH_DATA_READY event which happens upon word lookup. The UIController might also subscribe to a WORDLIST_DATA_READY event which happens whenever WordList data is updated.

[026] User initiates a word lookup
[027] UIController requests data from the LexicalQuery
[028] LexicalQuery publishes its MORPH_DATA_READY event
[029] WordListController receives the MORPH_DATA_READY event, updates the WordList data model object, and then initiates a request to the DBSyncController to store the updated data
[030]-[043] The DBSyncController interacts with the remote and local data stores to update the data (in reality the update events would probably be asynchronous, but they are shown synchronously in the diagram)
[044] WordListController publishes a [WORDLIST_DATA_UPDATE] event
[045] UIController receives the [WORDLIST_DATA_UPDATE] event and updates the WordList view accordingly, so that it is up to date when the user next accesses it
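
A minimal sketch of this event wiring, using a toy publish/subscribe helper; only the event names come from the proposal:

```js
// Sketch of the event wiring in [001] and [026]-[045], with a toy
// publish/subscribe helper; only the event names are from the proposal.
const events = {
  handlers: {},
  sub (name, fn) { (this.handlers[name] = this.handlers[name] || []).push(fn) },
  pub (name, payload) { (this.handlers[name] || []).forEach(fn => fn(payload)) }
}

// [001]: WordListController subscribes to lookup results
events.sub('MORPH_DATA_READY', homonym => {
  // [029]: update the WordList model and persist via the DBSyncController
  // (both stubbed here), then announce the change ([044])
  events.pub('WORDLIST_DATA_UPDATE', { added: homonym.targetWord })
})

// [045]: UIController refreshes the WordList view on changes
events.sub('WORDLIST_DATA_UPDATE', update => console.log('refresh view:', update))

// [026]-[028]: a word lookup completes and publishes its result
events.pub('MORPH_DATA_READY', { targetWord: 'fecit' })
```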

The DBSyncController could implement different approaches to synchronizing with the remote data store depending upon where the code is running. In a PWA, for example, it could use Service Workers and Background Sync to queue up requests when the user is offline, options which are not currently available to the Webextension.
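
For the PWA case, a minimal sketch of how such queuing might look; the sync tag and the queue-flushing helper are assumptions, and the two snippets live in the page and the service worker respectively:

```js
// Sketch of the PWA path with Background Sync; the tag name and the
// queue-flushing helper are assumptions.

// Page: register a sync so queued wordlist updates are sent when online
async function queueWordlistSync () {
  const registration = await navigator.serviceWorker.ready
  await registration.sync.register('wordlist-sync')
}

// Service worker: replay the queued updates when the sync event fires
self.addEventListener('sync', event => {
  if (event.tag === 'wordlist-sync') {
    event.waitUntil(pushQueuedUpdatesToRemote()) // hypothetical helper
  }
})
```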

balmas commented 5 years ago

@kirlat and @irina060981 please take a look at this and let me know your thoughts and questions. There are a fair number of assumptions in here about the different pieces of the architecture for working with user data and authenticated services, etc which likely also need some discussion.

irina060981 commented 5 years ago

Hello, Bridget and Kirill! I have several questions and thoughts on the described workflow:

1) Where should the WordListComponent be placed? It could be placed inside the components repo or in an external library (similar to what is done for the inflection games). If separated, it could be united in one repo with the WordListController and work by subscribing to and publishing different events.

2) Would UserDataQuery be a part of the UIController, or will it be a separate library with its own UserDataController? If it is separated, it could easily be used in any other repo without the need to import the components repo.

3) And the same question about DBSyncController.

4) About synchronizing data between the local IndexedDB and the remote server, I have the following thoughts. From the documentation, IndexedDB adheres to a same-origin policy, and each browser has its own. (https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API/Basic_Concepts_Behind_IndexedDB)

So we will have several instances of IndexedDB: one for each page (text) it is used on, for each browser, for each environment. And the only thing that could connect all of them is the user identification data. Also, as IndexedDB has storage limits, we should consider the remote server's data the higher-priority source. And the same text could be opened in different tabs of the same browser, so IndexedDB would have to queue requests from different tabs too.

On the other hand, we should use the advantages of having locally saved data. So I suggest using two ways of updating data on the screen (chosen by the user).

5) About the protected ClientAdapter: I could be wrong here, but maybe it is enough to add an additional fetch variant with additional security keys to the ClientAdapters library? (See the sketch after this list.)

6) And about the RemoteDataStore: what would it be? As IndexedDB is an object-like database and has no SQL, maybe it is better to have some NoSQL solution on the remote server, like MongoDB?
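
A minimal sketch of the fetch variant from point 5, assuming a bearer-token scheme; header names and error handling are illustrative:

```js
// Minimal sketch of a protected fetch variant; the bearer-token header
// scheme is an assumption, as is the error handling.
async function fetchProtected (url, accessToken, options = {}) {
  const response = await fetch(url, {
    ...options,
    headers: { ...options.headers, Authorization: `Bearer ${accessToken}` }
  })
  if (!response.ok) {
    throw new Error(`Protected request failed: ${response.status}`)
  }
  return response.json()
}
```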

kirlat commented 5 years ago

Thanks for laying this down! There are several very important decisions we have to make that, I believe, will define how successful our development will be in the future. Because of this, I would like to offer a discussion of several architectural issues that we face. Our architecture will depend on what decisions we make on these (and other) issues.

A. Working with both the webextension and the Safari app extension code, I've got a good sense of how difficult and time-consuming it might be to support two codebases that do pretty much the same thing but use different technology stacks, even if the difference is not so significant (a different background code). So I think our goal should be to minimize re-implementation of similar code. We are using different architectural solutions: the embedded lib is client-side code; the webextension is client-side code in an isolated environment plus a protected background script; the Safari app extension is the same as the webextension, but instead of a background script we have an app extension written in Swift; the PWA is similar to the webextension in a way. We should try to have one piece of authentication/authorization code that will work for all clients (if possible, because there are some challenges here). JS seems best for this, as it's a common denominator for all our clients.

B. If we want to be successful on mobile, we need to think about how to minimize data throughput. Mobile data is still slow, I believe, even in developed countries. So we should:

B1. Minimize the number of outside requests by combining several into one, if possible. This lets us avoid the concurrent request limit (https://stackoverflow.com/questions/7456325/get-number-of-concurrent-requests-by-browser). It will not be an issue with SPDY/HTTP 2.0, but it is still worth considering.

B2. Minimize the amount of data passed over the network. If we issue a request to a resource that produces a large response and then use just 10% of the info returned, we will be slowing ourselves down a lot. Ideally, we should allow the user to specify in the request exactly what information he or she needs and receive only that data back (a la GraphQL, or REST with parameters).

B3. Once we start to record user data, that will multiply the amount of information we store and transfer tremendously. We should use compact data representations whenever possible, maybe in the form of gzip or a binary format such as BSON.

B4. The amount of user data stored can become huge over time. Retrieving it all on the first sync could be very slow. We should split it into chunks, retrieving the most recent data first and showing it to the client, then obtaining the missing chunks quietly in the background (see the sketch below). We should consider mechanisms for splitting and combining data in our apps and doing partial data updates.
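
A minimal sketch of the B4 idea: fetch the newest chunk first, then backfill older chunks in the background. The endpoint shape and UI hooks are assumptions:

```js
// Sketch of B4: newest chunk first, older chunks backfilled quietly.
// The paging parameters and helper functions are assumptions.
async function loadUserData (baseUrl, pageSize = 50) {
  // first page: most recent data, shown to the user right away
  const first = await fetch(`${baseUrl}?limit=${pageSize}&page=1`)
    .then(r => r.json())
  showInUI(first.items) // hypothetical UI hook

  // remaining pages: retrieved quietly afterwards
  for (let page = 2; page <= first.totalPages; page++) {
    const chunk = await fetch(`${baseUrl}?limit=${pageSize}&page=${page}`)
      .then(r => r.json())
    mergeIntoLocalStore(chunk.items) // hypothetical merge helper
  }
}
```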

C. Different security environments. The webextension, Safari app extension, and PWA can be considered trusted (in a way), but the embedded lib is not. The library runs in the same environment as other scripts loaded by the page, and any malicious script (loaded by the page unintentionally or injected into it) can get access to all data of the embedded library (please correct me if I'm wrong here). This means we cannot trust the embedded lib to store any secrets. This also means a different security architecture for the library than for the rest of the apps. Unfortunately, this contradicts (A). We should really think about possible solutions here.

Considering all that, it would be tempting to shift some of our current logic to the server side. Currently, during a lexical query request, we retrieve the lemma from one source and lemma translations from another, and then execute several definition requests. If we shift this to the server, we could:

  1. Minimize the number of requests between a client and a server. All resource information would be gathered on the server via its fast connection routes and then served to the client with unnecessary data eliminated, so the response would be small.
  2. Cache data on the server to speed things up. We could cache the data used most often, like lemma translations or definitions. That would be more effective than caching data on mobile (how much data would we be able to store on a mobile device with limited memory?).
  3. Shift authorization logic to the server, where secret data can be protected. Once the user is authenticated and an authentication token is sent to the server, the server can do authorization within its own protected premises. That's very safe.
  4. Avoid code fragmentation between different implementations for the webextension, embedded lib, etc. All business logic would be implemented in one language (preferably JS) and would exist in one instance, on the server.
  5. If we want to change the business logic in some way (let's say we want to replace the Whitaker source with something else), we currently have to change every client (webextension, embedded lib, PWA, Safari), retest it, and redistribute it to users. But with business logic on a server, we would just need to update the server side, as long as the API signature remains the same.

Of course, there are several obvious drawbacks to that solution.

  1. Server is a single point of failure. So we have to provide redundancy. That's more effort and more server resources needed.
  2. Server resources cost money (I see this as a major drawback, but maybe I'm wrong). The more users we have, the more we need to pay.
  3. Complexity of implementing a server-side solution (but maybe it's not that complex, considering that with business logic on the client we have to implement it several times with variations for different platforms). We could also probably use some existing service solutions where pieces of business logic can be inserted as modules, to avoid creating our own server from scratch.

I'm not sure what would be the best solution, but it's very tempting, I think, to move the business logic somewhere where it can be implemented once and work for (be shared by) all clients (preferably in a protected environment), and where it can also be updated without the need to update each client implementation. Maybe there are solutions other than server-side ones that could help us implement this? Maybe we can use a service worker to host all our business and authorization logic? It seems to be supported by Chrome, FF, and Safari, each to a different extent. It's JS, and we can share it between all clients. But I am not 100% sure what limitations we would have there.

What are your thoughts on this? I think we should define our general strategy before going into details. Thanks!

kirlat commented 5 years ago

I agree with @irina060981 that we could probably add an authorization token as a parameter of client adapter queries (or add authorization info to queries in some other way). That should be relatively simple. If we want to have it optional, we can probably use a mixin for the authorization logic.
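
To illustrate the mixin idea, a quick sketch; all names are illustrative:

```js
// Sketch of the mixin idea: authorization bolted onto a client adapter
// only where needed. All names are assumptions.
const AuthMixin = Base => class extends Base {
  setToken (token) { this.token = token; return this }
  authHeaders () {
    return this.token ? { Authorization: `Bearer ${this.token}` } : {}
  }
}

class ClientAdapter {
  async get (url, headers = {}) {
    return fetch(url, { headers }).then(r => r.json())
  }
}

// Only the protected adapter carries the authorization logic
class ProtectedClientAdapter extends AuthMixin(ClientAdapter) {
  async get (url) { return super.get(url, this.authHeaders()) }
}
```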

Regarding IndexedDB and the same-origin policy: we could probably put the database into a service worker or a background script (if service worker functionality is adequate for us, then it's the best option, because it seems we can use it in Safari). So in a content script or the embedded lib, when we need data, we send a message (probably a DOM event, as a universal solution) to the SW (i.e. service worker); the SW will query IndexedDB, and the remote server if necessary, and respond with the data in a response message. This way all data across different tabs will be shared and in sync.
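
A minimal sketch of that message-based access; the message shapes are assumptions, and the two snippets live in the content script and the service worker respectively:

```js
// Sketch of message-based data access: the content script asks the
// service worker for data over a MessageChannel. Message shapes are
// assumptions.

// Page / content script side
function requestWordList () {
  return new Promise(resolve => {
    const channel = new MessageChannel()
    channel.port1.onmessage = event => resolve(event.data)
    navigator.serviceWorker.controller
      .postMessage({ type: 'GET_WORDLIST' }, [channel.port2])
  })
}

// Service worker side
self.addEventListener('message', event => {
  if (event.data.type === 'GET_WORDLIST') {
    // would consult IndexedDB and, if necessary, the remote server here
    event.ports[0].postMessage({ items: [] })
  }
})
```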

I'm for moving as much business logic out of the UIController as possible. I think the role of a UIController should be to coordinate UI elements only and provide their interactions. The business logic should live somewhere else. I like the concept of Queries, so we can probably use data queries the same way we use lexical queries. If that is not enough, we can introduce a specialized data controller.

kirlat commented 5 years ago

Service workers could be ideal for providing a first level of caching for the apps. I can't find any clear info about whether we can use them in webextensions or not. But there also seems to be no info about it being prohibited either...

balmas commented 5 years ago

Regarding IndexedDB and the same-origin policy: we could probably put the database into a service worker or a background script (if service worker functionality is adequate for us, then it's the best option, because it seems we can use it in Safari). So in a content script or the embedded lib, when we need data, we send a message (probably a DOM event, as a universal solution) to the SW (i.e. service worker); the SW will query IndexedDB, and the remote server if necessary, and respond with the data in a response message.

Yes, I think we would be able to work with the IndexedDB in the background script, so that all content stored by the background script is in the same db, regardless of what page the user was on when they were using the extension. And for the PWA we would use a service worker. (We may be able to use service workers in webextensions, but it's not really clear to me. I do know that Google has stated one of the goals for the webextension manifest v3 as "Modernizing to align with new web capabilities, such as supporting Service Workers as a new type of background process".)

balmas commented 5 years ago

I need to think about the authentication issues with the embedded library vs the PWA. As I articulated in the release scope comments in Slack, for user data storage I am leaning towards using the AWS Serverless Stack, which includes AWS API Gateway, AWS Lambda, and either AWS DynamoDB or S3 or both. We would use OAuth2 and Auth0's API authorization flows with JWT to protect access to the AWS API Gateway for user data storage/retrieval.

I believe I want to stick with a microservices approach and client-side authentication. Some links that might be helpful here:

https://yos.io/2017/09/03/serverless-authentication-with-jwt/
https://auth0.com/blog/building-serverless-apps-with-aws-lambda/
https://blog.codecentric.de/en/2018/04/aws-lambda-authorizer/
https://serverless.com/blog/strategies-implementing-user-authentication-serverless-applications/
https://medium.com/@gauravve/service-to-service-authentication-using-auth0-and-serverless-framework-825c45852dbe

balmas commented 5 years ago

Where should the WordListComponent be placed? It could be placed inside the components repo or in an external library (similar to what is done for the inflection games). If separated, it could be united in one repo with the WordListController and work by subscribing to and publishing different events.

I was thinking that the WordListController and WordListComponent (as well as a WordListItem Component) would go in the components repository. WordList and WordListItem would be data model objects in the data-models repository. We could start development with them in a separate repository, but per our refactoring goals, we are trying to reduce the number of dependencies. Plus I think for any Alpheios application the wordlist is a core component.

balmas commented 5 years ago

Would UserDataQuery be a part of the UIController, or will it be a separate library with its own UserDataController? If it is separated, it could easily be used in any other repo without the need to import the components repo. And the same question about DBSyncController.

Certainly UserDataQuery and DBSyncController are separate from the UIController. Whether they belong in core components is a little less clear to me. It depends in part, I think, on whether we can make this functionality available to the embedded library in a secure way or not.

balmas commented 5 years ago

The questions about combining server requests and optimizing data syncing all require a little more thought. Will try to respond further on these soon.

irina060981 commented 5 years ago

Hello, Bridget and Kirill! My thoughts:

1) I am not very experienced with Service Workers, but as far as I know they can be used only in an https environment, and Safari doesn't support Service Workers yet (source). So it seems to me that they couldn't be used in the webextension (as pages could be both http and https), and the same goes for the embedded lib. But for a PWA (not Safari) they could easily be used. So maybe using Service Workers is not yet a solution that allows us to reduce similar code across the various platforms.

2) About mobile support: I think there is one more complex problem here (beyond heavy network traffic and large amounts of cached data). Mobile browsers support fewer capabilities, and there are many more variations among them. And it seems to me that the effort of creating a client-server solution could be similar to that of creating mobile applications for working with it.

The client-side solution (as we have now) has some advantages for desktop usage: 1) it is less dependent on the number of simultaneous users (because calculations are made in the browser, not on a server); 2) it has the possibility of an offline mode.

Maybe it could be useful to create a light version for mobile (with a special flag for mobile), because if someone tries to use it over a poor connection, they could choose to get only morph data (for example) and use it normally?

To be honest, in my practice I have had a lot of experience with the classic client-server architecture (like Kirill suggested) and thought that it was the only good way. And I have had experience with server overload problems and constant growth of upgrade costs. So at first, when I saw this client-side implementation, I was surprised. And now I can see the advantages of this approach. It seems to me that if the Alpheios extension were used in the study process and, for example, a whole class started to use it at the same time, it wouldn't be easy on the server. But maybe I just have bad experience with poor servers :)

kirlat commented 5 years ago

Irina, agree with everything you said!

There is no ideal solution here, and every approach will probably have its advantages and drawbacks. I like the client-based architecture better for what we do (as I understand you do 🙂), but I see some potential issues with it that we may face later. So I thought that if we discuss them now, we can probably find some approaches to make them more bearable. Even if there is no solution, we would still keep those issues in mind while writing our code, and that will help us create better code, I believe. Once we are fully aware of the problems, we can try to minimize their consequences.

From what I've learned so far, we would probably still have to manage at least three versions of the authentication code (webextension+PWA / Safari / embedded lib) (sigh 😞).

irina060981 commented 5 years ago

I love our discussions and consider them to be a very important part of our workflow :) About the authentication process: I don't have much experience here with the webextension.

But I think that the approach with JWT tokens (thank you, Bridget, for the links; I have used tokens before but had never read such a clear description as in the first link) could be very helpful.

About the security issues here: there is a not-very-fresh article about security questions, but it could be helpful. It suggests using the Chrome Identity API. If I understood the basics of this technology correctly, it could be helpful both in the webextension (not sure about Safari) and the embedded lib.

From the article:

Chrome API provides a chrome.identity service, which provides a secure way for an extension to authenticate, fetch and refresh tokens. This API enables a user to perform authentication against a third-party service. Chrome can interactively display a popup UI, which:

- Can store cookies and session information
- Is protected against any script injection, even by other Chrome extensions

Each Chrome extension has its own chrome.identity instance, which is only accessible by the Chrome extension owning that instance. This makes the token private, even from other malicious Chrome extensions.

Do you have experience with it? If it works well, it seems to me that we could create a library for the authentication workflow that won't be very different across the webextension+PWA / Safari / embedded lib. What do you think?

kirlat commented 5 years ago

I am using the chrome.identity API in the prototype of the authentication functionality. It's working well so far, and it's probably the only browser-native API solution available. It has a browser namespace counterpart, which I hope will work well in FF.
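
For reference, a minimal sketch of what that flow looks like with chrome.identity.launchWebAuthFlow; the Auth0 domain and client id are placeholders:

```js
// Minimal sketch of an auth flow via chrome.identity; domain and
// client id are placeholders, the response handling is simplified.
const authUrl = 'https://YOUR_DOMAIN.auth0.com/authorize' +
  '?response_type=token' +
  '&client_id=YOUR_CLIENT_ID' +
  '&redirect_uri=' + encodeURIComponent(chrome.identity.getRedirectURL())

chrome.identity.launchWebAuthFlow(
  { url: authUrl, interactive: true },
  redirectUrl => {
    // the access token comes back in the redirect URL fragment
    const token = new URLSearchParams(new URL(redirectUrl).hash.slice(1))
      .get('access_token')
    console.log('token received:', Boolean(token))
  }
)
```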

But the Identity API works only in background-related pages, not in client-side scripts. So it's not an option for the embedded lib (which has to use a different authentication workflow anyway, more on that later). And for Safari it's a no-go too 🙁.

However, there is more to it: the encryption libraries that we use to generate pieces of our requests (like random byte array generators and hash functions). Those libraries tend to be environment-specific too 😢.

kirlat commented 5 years ago

Some refs: since we'll be using Auth0, here are some pieces of Auth0 documentation.

For webextension (all browsers) and PWA we'll probably use what is called "Authorization Code Grant Flow with PKCE": https://auth0.com/docs/api-auth/grant/authorization-code-pkce
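
As an illustration, here is a minimal sketch of the PKCE pieces (code verifier plus S256 challenge) using the Web Crypto API; the request wiring is omitted:

```js
// Sketch of the PKCE pieces: generate a code verifier and derive its
// S256 challenge with the Web Crypto API. Request wiring is omitted.
function base64UrlEncode (bytes) {
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '')
}

async function createPkcePair () {
  const verifier = base64UrlEncode(crypto.getRandomValues(new Uint8Array(32)))
  const digest = await crypto.subtle.digest(
    'SHA-256', new TextEncoder().encode(verifier))
  const challenge = base64UrlEncode(new Uint8Array(digest))
  // the challenge goes into the /authorize request; the verifier is kept
  // and sent later with the token exchange
  return { verifier, challenge }
}
```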

For embedded lib (since it cannot be trusted and we cannot store secrets in a client-side script) the best choice is "Implicit Grant Flow": https://auth0.com/docs/api-auth/grant/implicit

And the authentication/authorization code in Safari probably has to be within the app extension, which means a different codebase.

irina060981 commented 5 years ago

Thank you, Kirill, for the explanations! I think this is the JavaScript world, with all its advantages and disadvantages :-)

irina060981 commented 5 years ago

For the macOS application, I think it needs this: https://github.com/auth0/Auth0.swift

And it is a new challenge in bringing the Safari App Extension to the next state of the art, I think. 🙂

balmas commented 5 years ago

Some additional thoughts based upon our discussion at today's check-in:

Whether or not it ends up being possible to avoid cross-domain restrictions on IndexedDB for the webextension, we know we will have cross-domain restrictions for the embedded library and reader applications. So the design has to take that into account.

Since we want to support a single user account across multiple applications (webextension, mobile reader, etc.), the remote user data store is the location which will be the authoritative source of the user data.

The IndexedDB can be used as a local cache to support fast and offline access, but it will always need to be updated from the remote user data store in order to provide a fully up-to-date view of the user's data.

We must have an API that protects us from the need to duplicate the business logic around retrieving remote data and merging it with the local IndexedDB. Any client-side feature, such as a word list, should not need to know the details of where the data is coming from. This is the point of the DBSyncController in the proposed design above.

While we can store complete Homonym (or other Alpheios data-model) objects in the user data stores (both remote and local), and may decide to do so in some cases for performance reasons or to support offline access, the main purpose of the user data store is to store information that is unique to an individual user's experience with the Alpheios applications. We probably do not want to duplicate data that comes from our remote services across each and every user data store, of which, in the case of the local IndexedDB, there could be one for each domain the user visits.

Storing the data in structures that can be directly serialized to/from the Alpheios Data Model objects is appealing, but if we do this we need a way to easily identify the state of such a data model object and whether or not it can or needs to be filled in with data from remote services.

balmas commented 5 years ago

It might also be that the persistent structure of a user data object is a subset of what is stored in the local IndexedDB. The DBSyncController might be responsible for deciding which properties of an Alpheios Data Model object to populate from where.

The DBSyncController could then also implement ClientAdapter interfaces so that it can be used as a source for LexicalQuery data.

For example, I could see a scenario like the following:

With a fresh start:

  1. WordListController requests the WordList from the DBSyncController
  2. DBSyncController retrieves the WordList from the RemoteDB
  3. DBSyncController stores the WordList to the LocalDB

At this point, the WordList data in the LocalDB is identical to that in the RemoteDB.

Then the user clicks on a word on the WordList, and the UI initiates a LexicalQuery:

  1. LexicalQuery asks the DBSyncController for the word
  2. DBSyncController finds the word in the LocalDB WordList data and returns the Homonym
  3. LexicalQuery checks the Homonym isComplete flag. It is false, so LexicalQuery continues to proceed with the query as normal

Upon completion of the LexicalQuery

  1. WordListController updates the WordListItem with the full Homonym
  2. WordListController calls DBSyncController.updateData(WordListItem)
  3. DBSyncController sends the WordListItem with the partial Homonym to the RemoteDB
  4. DBSyncController sends the WordListItem with the full Homonym to the LocalDB

Later, the user clicks on a word which is in the WordList and which already has a full Homonym stored for it in the LocalDB, and the UI initiates a LexicalQuery:

  1. LexicalQuery asks the DBSyncController for the word
  2. DBSyncController finds the word in the LocalDB WordList data and returns the Homonym
  3. LexicalQuery checks the Homonym isComplete flag. It is true, so LexicalQuery issues the various events to indicate that the Homonym is available
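
A minimal sketch of that isComplete decision; the DBSyncController method name and the fallback helper are assumptions:

```js
// Minimal sketch of the isComplete decision in the scenarios above.
async function lookupWord (word, dbSync) {
  const homonym = await dbSync.getHomonym(word) // hypothetical lookup
  if (homonym && homonym.isComplete) {
    // full Homonym cached in the LocalDB: announce it, no remote queries
    console.log('HOMONYM_READY from LocalDB for', word)
    return homonym
  }
  // missing or partial: proceed with the normal remote lexical query,
  // after which the full result is stored back (see the steps above)
  return runFullLexicalQuery(word) // hypothetical helper
}
```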

Although the need to support versioning of service results is probably a lower priority, we could add additional business logic into both the DBSyncController and the LexicalQuery to check version flags on a Homonym's component parts against service output, to find out if the local store needs to be updated. But if the local IndexedDB is understood to be temporary, incomplete storage, and the RemoteDB doesn't store full Homonym data, then this is maybe less of a concern.

@kirlat and @irina060981 does this make sense to both of you? What potential pitfalls do you see in it?

balmas commented 5 years ago

as an alternative/addendum to this statement:

The DBSyncController might be responsible for deciding which properties of an Alpheios Data Model object to populate from where.

I could see the code getting messy if the DBSyncController has to know too much about the individual data model objects. So an alternative might be to have objects which are candidates for remote storage implement a toPersistentJSON method, or the like, which could be used to create the minimal version for remote storage and set an inComplete flag on those components of it which are not full representations.
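
A minimal sketch of the toPersistentJSON idea; which fields count as "minimal" is illustrative:

```js
// Sketch of toPersistentJSON: the object produces a reduced
// representation for remote storage and flags the reduced parts.
class WordListItem {
  constructor (targetWord, homonym) {
    this.targetWord = targetWord
    this.homonym = homonym // possibly a full Homonym
  }

  toPersistentJSON () {
    return {
      targetWord: this.targetWord,
      homonym: {
        // keep only minimal lemma info for remote storage and flag the
        // component as not being a full representation
        lemmas: this.homonym.lemmas,
        inComplete: true
      }
    }
  }
}
```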

balmas commented 5 years ago

Another thing to think about:

All user data objects should probably be versioned themselves, so that we can deal gracefully with future data structure changes. E.g., if need be, we could quickly differentiate between a WordListItem version 1.0 and a WordListItem version 2.0 without having to examine the data structure.
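
For example, a minimal sketch of what such versioning could look like when (de)serializing; the version values and migration helper are hypothetical:

```js
// Sketch of per-object versioning: every persisted record carries its
// schema version, so readers can branch without inspecting the structure.
const CURRENT_VERSION = '2.0'

function serializeWordListItem (item) {
  return { version: CURRENT_VERSION, ...item }
}

function deserializeWordListItem (record) {
  if (record.version === '1.0') {
    return migrateV1toV2(record) // hypothetical migration to the v2 shape
  }
  return record
}
```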

kirlat commented 5 years ago

All user data objects should probably be versioned themselves, so that we can deal gracefully with future data structure changes. E.g., if need be, we could quickly differentiate between a WordListItem version 1.0 and a WordListItem version 2.0 without having to examine the data structure.

👍 for data versioning. It might also be beneficial to version the REST API of remote services. If we use GraphQL we won't need this, as they suggest introducing new fields as the preferred way of versioning: https://graphql.org/learn/best-practices/#versioning

For versioning of JS objects such as WordListItem, it would probably be cleaner to integrate version info into their class names (i.e. have a separate class for each new version) rather than having a version field inside a class and some conditional logic in methods that rely on the version field value. The latter can become convoluted easily. What do you think?

For comparing data objects, there is an object-hash library (https://github.com/puleos/object-hash) that computes hashes of JS objects. I have not used it personally, but maybe this approach could have some benefits in some situations.

irina060981 commented 5 years ago

I have some thoughts here too.

We have different data to arrange locally and remotely:

1) homonym data (obtained from different remote requests; the user cannot change it)
2) context and user data applied to the word (created locally when the user assigns/removes the important flag, adds notes, or selects the word in some context; this should be saved)
3) usage examples of the words (like the previous item, some context for the word, but it is loaded from remote and cannot be changed)
4) user data for authentication - userID (loaded from remote)

We have several storages that should be synchronized somehow:
1) Remote User Data Storage (it should store the whole of the user's wordlist data across the webextension, embedded lib, PWA, and whatever else), if I understood Bridget correctly
2) Local IndexedDB (it stores some part of the data or the full data, according to the user's preferences)
3) Vuex storage (it stores only the data that should be placed in the UI)
4) UI data (the data that is actually visible in the UI)
5) Remote services (morph, lexical, usage examples, translations, authentication, and maybe something else)

All five items could change data in the first four items (from the previous list).

I think we need some central data controller here (maybe DBSync or maybe simply DataSync) that will have rules to sync data based on the following conditions:
1) user preferences
2) online/offline mode
3) data part status (correct or outdated)
4) mobile/desktop mode

And it should have access to the Remote UserDatabase API, to IndexedDB methods, and to Vuex data update events, in both directions (read/write).

And it should be able to be imported into the content part or the background part.

And we cannot define an obvious priority for remote vs. local data, because some data originates remotely and some originates locally.

I agree, such a controller could become a really large, heavily coded file/environment.

irina060981 commented 5 years ago

And I think such a DataController should be used inside the LexicalQuery.

Similar to Bridget's proposal:

Each updated part of the data creates/updates a wordItem instance inside the wordList instance (with the current context data for the current lexical request); it is uploaded to Vuex (and from Vuex to the UI components) and re-saved with the updated data both locally and remotely. Or maybe it returns data to the lexical request, which sends it to Vuex/UIController and the WordListController.

And on first page load we need to load the current wordlist (after authorization). It could follow this workflow:
1) Send a request to DataSync
2) It checks the Remote User DB (for example, by a last-update timestamp) and compares it to the local IndexedDB
3) Load the data locally if they are the same, or remotely if not
4) If loaded remotely, then request merging with the local data
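
A minimal sketch of that first-load comparison; all store method names are assumptions:

```js
// Sketch of the first-load sync: compare last-update timestamps and
// load locally or remotely. All store method names are assumptions.
async function initialWordlistSync (remote, local) {
  const [remoteStamp, localStamp] = await Promise.all([
    remote.lastUpdated(), local.lastUpdated()
  ])
  if (remoteStamp <= localStamp) {
    return local.loadAll() // local copy is current: load locally
  }
  // remote is newer: load it and merge into the local IndexedDB
  const remoteData = await remote.loadAll()
  return local.mergeAndSave(remoteData)
}
```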

And when a user changes the data (places the important flag, for example, or adds a new context usage) or deletes it:
1) Send a request to DataSync
2) It writes the changes to Remote and to Local

Also, a user could remove some wordItem, similar to the previous scenario.

I think there could be very different scenarios, and they could be implemented one by one.

irina060981 commented 5 years ago

I think we should divide all sync procedures by type of data (similar to the lexical request):
1) user data
2) morph
3) lexical
4) translation
5) usage example
6) context data
7) additional data - important flag, session flag, some notices

irina060981 commented 5 years ago

And create sync rules for each one according to:

1) user preferences
2) online/offline mode
3) data part status (correct or outdated)
4) mobile/desktop mode

inside some Controller.

balmas commented 5 years ago

For versioning of JS objects such as WordListItem, it would probably be cleaner to integrate version info into their class names (i.e. have a separate class for each new version) rather than having a version field inside a class and some conditional logic in methods that rely on the version field value. The latter can become convoluted easily. What do you think?

I am not sure how I feel about that. I guess another option here is to use Protocol Buffers (https://codeclimate.com/blog/choose-protocol-buffers/). It sounds like the solution they provide to data versioning issues in service interactions is similar to that of the GraphQL approach, i.e. by relying on only adding, never removing or changing, fields. As this came up for me while thinking about the exchange of data to/from the CRUD microservice for the RemoteDB for the wordlists, it seems that maybe that is exactly the problem Protocol Buffers were designed to address. Do either of you have experience with them?

balmas commented 5 years ago

I have some thoughts here too.

We have different data to arrange locally and remotely:

  1. homonym data (obtained from different remote requests; the user cannot change it)
  2. context and user data applied to the word (created locally when the user assigns/removes the important flag, adds notes, or selects the word in some context; this should be saved)
  3. usage examples of the words (like the previous item, some context for the word, but it is loaded from remote and cannot be changed)
  4. user data for authentication - userID (loaded from remote)

We have several storages that should be synchronized somehow:

  1. Remote User Data Storage (it should store the whole of the user's wordlist data across the webextension, embedded lib, PWA, and whatever else), if I understood Bridget correctly
  2. Local IndexedDB (it stores some part of the data or the full data, according to the user's preferences)
  3. Vuex storage (it stores only the data that should be placed in the UI)
  4. UI data (the data that is actually visible in the UI)
  5. Remote services (morph, lexical, usage examples, translations, authentication, and maybe something else)

All five items could change data in the first four items (from the previous list).

I think we need some central data controller here (maybe DBSync or maybe simply DataSync) that will have rules to sync data based on the following conditions:

  1. user preferences
  2. online/offline mode
  3. data part status (correct or outdated)
  4. mobile/desktop mode

And it should have access to the Remote UserDatabase API, to IndexedDB methods, and to Vuex data update events, in both directions (read/write).

And it should be able to be imported into the content part or the background part.

And we cannot define an obvious priority for remote vs. local data, because some data originates remotely and some originates locally.

I agree, such a controller could become a really large, heavily coded file/environment.

There are some very good points here, and I think we need to be careful about the scope of this data controller: limit it to persistent data access and do not involve it in application state data. If we assume that all persistent storage (including the local IndexedDB, even if calling that "persistent" is debatable) will require user authentication, then we could call it UserDataSyncController or something like that.

To the point about being able to be imported into content or background, I will copy what I just put in the Slack discussion here:

For the different interfaces to IndexedDB in the Webextension and the EmbedLib, ideally I think this should work similarly to what we have already discussed needing for the Auth object. I.e. we need an abstraction that allows the rest of the application to not care whether this is happening on the background or the content side, and then an implementation of that abstraction that gets handed to the UIController's constructor.

kirlat commented 5 years ago

I am not sure how I feel about that. I guess another option here is to use Protocol Buffers (https://codeclimate.com/blog/choose-protocol-buffers/). It sounds like the solution they provide to data versioning issues in service interactions is similar to that of the GraphQL approach, i.e. by relying on only adding, never removing or changing, fields. As this came up for me while thinking about the exchange of data to/from the CRUD microservice for the RemoteDB for the wordlists, it seems that maybe that is exactly the problem Protocol Buffers were designed to address. Do either of you have experience with them?

I have not worked with Protobuf, but I've heard good things about it. I think it should be nearly ideal for inter-service communications.

I probably misunderstood your point about WordListItem versioning. I was thinking you were talking about versioning it for use within an application (i.e. that we might have some modules/components that were using both V1 and V2 of it at the same time), not about transferring it over the network. 🙂

I think protobuf might be beneficial for storing data too, in some situations.

kirlat commented 5 years ago

There are some very good points here, and I think we need to be careful about the scope of this data controller: limit it to persistent data access and do not involve it in application state data. If we assume that all persistent storage (including the local IndexedDB, even if calling that "persistent" is debatable) will require user authentication, then we could call it UserDataSyncController or something like that.

It is important, in my opinion, that we don't end up with a huge do-it-all data controller, as it might grow into something that is hard to maintain. To avoid this we probably should:

  1. Clearly define the responsibilities of the data controller(s) and be vigilant not to expand beyond those boundaries.
  2. If we end up with those responsibilities having a wide span, we should separate the controller into several modules. For example, we could have a persistence module that will be responsible for storing data, and it could have IndexedDB and remote storage sub-modules. Merging data can have some non-trivial logic and can probably be separated into its own module too. So the whole thing would be a combination of small, specialized modules. Such modules are easier to upgrade and test.
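
To illustrate point 2, a minimal sketch of such a composition; all module names are assumptions, and the storage logic is stubbed out:

```js
// Sketch of a thin controller composed of small, specialized modules.
// All names are assumptions and the storage logic is stubbed.
class IndexedDBModule {
  async load () { return [] }
  async save (data) { /* write to IndexedDB */ }
}
class RemoteStorageModule {
  async load () { return [] }
  async save (data) { /* write to the remote store */ }
}
class MergeModule {
  merge (localData, remoteData) { return [...localData, ...remoteData] }
}

class UserDataSyncController {
  constructor () {
    this.local = new IndexedDBModule()
    this.remote = new RemoteStorageModule()
    this.merger = new MergeModule()
  }

  async retrieve () {
    const [localData, remoteData] = await Promise.all([
      this.local.load(), this.remote.load()
    ])
    return this.merger.merge(localData, remoteData)
  }
}
```
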
balmas commented 5 years ago

I probably misunderstood your point about WordListItem versioning. I was thinking you were talking about versioning it for use within an application (i.e. that we might have some modules/components that were using both V1 and V2 of it at the same time), not about transferring it over the network.

Ah yes, sorry I wasn't clear about that. I don't think that a single version of the application should be actively trying to save multiple versions at the same time, but it might need to be able to read older versions. That is, a newer version of the application shouldn't break if it encounters data that was saved by an older version.

balmas commented 5 years ago

It is important, in my opinion, that we don't end up with a huge do-it-all data controller, as it might grow into something that is hard to maintain. To avoid this we probably should:

  1. Clearly define the responsibilities of the data controller(s) and be vigilant not to expand beyond those boundaries.
  2. If we end up with those responsibilities having a wide span, we should separate the controller into several modules. For example, we could have a persistence module that will be responsible for storing data, and it could have IndexedDB and remote storage sub-modules. Merging data can have some non-trivial logic and can probably be separated into its own module too. So the whole thing would be a combination of small, specialized modules. Such modules are easier to upgrade and test.

Agree with these points.

balmas commented 4 years ago

This was implemented in the 3.0 release. Future work on user data management will be discussed separately.