CatimaLoyalty / Android

Catima, a Loyalty Card & Ticket Manager for Android
https://catima.app
GNU General Public License v3.0
870 stars 151 forks source link

Multi-device / sync support #791

Open TheLastProject opened 2 years ago

TheLastProject commented 2 years ago

We need to build and integrate Catima Sync.

I am closing all other issues related to multi-device or syncing support for the following reasons:

  1. To keep a clear overview
  2. Because Google is increasing limitations on filesystem access from apps, so local options with things like Syncthing and Nextcloud are less and less useful for most users
  3. WebDAV does not have sufficient conflict resolution for Catima's file format and it is important conflicts can be solved in-app easily.
IllusiveMan196 commented 2 years ago

Syncthing can't copy files in Android/data and Android/obb. The rest seems to be accessible. Even then it is not or rather should not be an issue on Android versions lower than 11. If Catima saves things into common directory or outside Android folder (as it should and does for me) this isn't really an issue. But if we can get data off the phone somehow I'm not against it as I don't really like indirect vendor lock-ins. Wanna have your data - use our services" style.

tobast commented 2 years ago

Hello, Has any work started on this? I can see a bit of discussion in the issues of the Sync repo, but no code at all. If I can find time, I might be able to work on this.

TheLastProject commented 2 years ago

No work has started on this, I haven't gotten to it and I don't think I will get to this any time soon. This is also quite low on the priority list for myself as I only have one phone.

I basically have the following concerns:

  1. Internet permission: as most users don't need this feature, giving every single Catima install internet access for this feels not a good start for privacy
  2. Hosting: I'm already paying money for the Catima project, and that's fine for a hobby project (not everything needs to make money), but I'm not looking forward to the disaster of dealing with user data without getting paid. So we need a self-hostable solution (also for privacy concerns). I could possibly set up a paid host under my control later if things seem to work fine, but I want to make sure it's easily self-hostable first (privacy-friendly option first, "mass consumer" later).

For WearOS, I have a similar "tainting" issue as number 1 and my plan is to split the syncing into a separate app and communicate with the main Catima app using Android intents (see https://github.com/CatimaLoyalty/Android/issues/25#issuecomment-1175296481). I haven't started on this yet so I'm not sure how viable it is, but a secondary sync app which uses LIST_CARDS, READ_CARD, EDIT_CARD, CREATE_CARD permissions to read and update the cards Catima has seems like a very clean way to go about this. See also https://developer.android.com/guide/topics/permissions/defining.

A secondary concern is that card IDs are currently sequential, so adding from a second device if the first device has been offline for a while could cause conflicts: https://github.com/CatimaLoyalty/Sync/issues/8.

For the Sync repo I was thinking of using something like Django because I personally have experience with it and it has a nice built-in admin control panel and other convenience features, but I don't have very strong opinions for this. The landscape might have moved on and given nice new options since I last built a web app myself.

So basically, this is a very huge task, and help is definitely very welcome! Do you have a plan on how you want to tackle this and where you want to start? I'm thinking a fake app that just reads/edits/injects cards offline at first to communicate with Catima and deal with conflict resolution might be the cleanest start, but I'm quite interested in what you think too.

tobast commented 2 years ago

I also only have one phone, but I see this as a great way to have backups, and maybe even enable cards sharing between multiple users of the same server.

My experience with phone apps development is extremely limited, and last time I used Java was probably around 10 years ago, but I'm experienced with Django and thought I could mostly help there, especially as the expected endpoints are already documented in the issues there.

I completely agree on the self-hosting requirement; actually, I wouldn't be interested in this myself if it was not meant to be self-hostable :) If, in the future, there are plans to host this on an "official" server, some serious privacy concerns would be involved, and I expect this would require some form of encryption. For now, I was considering only the self-hosted model, where the server can usually be considered trustworthy.

As for the ID issue, is the ID actually used extensively? It feels to me like this should not be a problem, and using either a (randomly-generated) uuid4 or, say, some hash (only to map it to an integer) of the card data instead would be enough. I feel like in most cases, the card data is a good identifier, but this would mean that changing the card data (eg if it expires yearly and the user rescans it) would delete and recreate the card behind the scenes, instead of changing the data.

If the goal is to minimize the changes required on the mobile app, I even believe that matching a card by (a hash of) its data could be fine, and different phones could have different IDs in their local DB for the same card. This looks messy and could create weird issues in the future, though.

If it seems good to you, my plan would be to start with a Django app without any interface apart from the django-admin, with :

The conflicts handling looks like the biggest issue, but it seems to me that most of the testing can be done in unit tests instead of creating a mock client -- this would come with the benefits of specifying clearly where user intervention is needed and where the merge can be automatic.

TheLastProject commented 2 years ago

I'm experienced with Django and thought I could mostly help there, especially as the expected endpoints are already documented in the issues there

Help with the Django part sounds great. Please do note that the endpoints I made issues for were my ideas based on (most likely outdated) best practice knowledge. If you feel I'm wrong anywhere, feel free to suggest improvements :)

As for the ID issue, is the ID actually used extensively

The ID is the single most important element. It's the "primary key" in the SQLite database on the Android app. It is used when opening cards from the main screen, when referring to which cards are in a group, when setting and opening a shortcut widget, etc. The ID should never change. However, currently the ID is a numeric autoincrement. I think it may be better to change it to an UNIX timestamp so that the only reason 2 newly created cards on different devices that are both offline could cause a conflict would be if they were created at the exact same second.

If it seems good to you, my plan would be to start with a Django app without any interface apart from the django-admin, with :

  • models,
  • authentication (I believe tokens would be more fitting, to be able to revoke a single phone if eg. it gets stolen)
  • API endpoints

That sounds great :)

The conflicts handling looks like the biggest issue, but it seems to me that most of the testing can be done in unit tests instead of creating a mock client -- this would come with the benefits of specifying clearly where user intervention is needed and where the merge can be automatic.

Having a way of dealing with this will be a requirement before going live but probably doesn't need to be part of an initial prototype. With unit tests, I suppose you mean on the client side? I wonder how to best detect conflicts. My naive method would be to send the old data together with the new data to see if a change was made since starting making this change, but I expect there to be much better solutions.

tobast commented 2 years ago

The ID is the single most important element. It's the "primary key" in the SQLite database on the Android app. It is used when opening cards from the main screen, when referring to which cards are in a group, when setting and opening a shortcut widget, etc. The ID should never change. However, currently the ID is a numeric autoincrement. I think it may be better to change it to an UNIX timestamp so that the only reason 2 newly created cards on different devices that are both offline could cause a conflict would be if they were created at the exact same second.

I think I still prefer the idea of a uuid4. It seems to be exactly the kind of job it was designed to handle, and it also seems to me that it is the most common solution for this kind of problem — distributed IDs. I believe databases handle those seamlessly, too — postgresql has a type for it, SQLite would handle those as BLOBs.

As I see it,

Did you have anything else in mind? What would be your preferred solution?

Having a way of dealing with this will be a requirement before going live but probably doesn't need to be part of an initial prototype. With unit tests, I suppose you mean on the client side? I wonder how to best detect conflicts. My naive method would be to send the old data together with the new data to see if a change was made since starting making this change, but I expect there to be much better solutions.

Yes, that would be client side, I believe. I was wondering whether there is a need to handle merging both on the client and server side, but it seems to me that the good solution to avoid this is to behave somehow like Git, ensuring that the user has pulled the latest version of the database before they try to push any data — otherwise it might just erase a modification that happened in between without knowing.

A way to do this might be to carry an incremental revision id for each card. When updating the card on the server, the client sends along the revision ID of the card before modification, which should be the revision the server is aware of. If not, the update fails, forcing the client to fetch the latest version before. This way, a client can also easily check if it has the latest version of a given card.

TheLastProject commented 2 years ago

I think I still prefer the idea of a uuid4. It seems to be exactly the kind of job it was designed to handle, and it also seems to me that it is the most common solution for this kind of problem — distributed IDs.

You raise good points in favour of UUIDs. Sadly, Android itself doesn't seem to handle those well.

Looking at https://stackoverflow.com/a/14184946/8378787, it seems like Android wants the primary key to be an integer to be able to use a Cursor. And a Cursor is the preferred way of loading data from the database in Android as far as I can tell (and used all throughout Catima, it would be an enormous refactor). So it seems like we sadly need to stick to an integer format for this. I think, however, this is not something that is required to be fixed before working on the server part. Throwing a CONFLICT for now when 2 different devices have been offline and both create a card is reasonable until the ID generation in the Android app gets improved.

A way to do this might be to carry an incremental revision id for each card. When updating the card on the server, the client sends along the revision ID of the card before modification, which should be the revision the server is aware of. If not, the update fails, forcing the client to fetch the latest version before. This way, a client can also easily check if it has the latest version of a given card.

That sounds like a simple and logical way to deal with it, yeah. I even see this work if both devices were offline if the server was just configured to only deal with one change per card at a time.

tobast commented 2 years ago

Looking at https://stackoverflow.com/a/14184946/8378787, it seems like Android wants the primary key to be an integer to be able to use a Cursor. And a Cursor is the preferred way of loading data from the database in Android as far as I can tell (and used all throughout Catima, it would be an enormous refactor). So it seems like we sadly need to stick to an integer format for this. I think, however, this is not something that is required to be fixed before working on the server part. Throwing a CONFLICT for now when 2 different devices have been offline and both create a card is reasonable until the ID generation in the Android app gets improved.

The documentation is actually extremely vague on the requirements for this _ID column, and the stackoverflow post only points out that if you store your uuid as a string (instead of a 16B blob), it will break CursorAdapter. I went through the source code to convince myself, and indeed, a field that can be used as a long is required, as seen here in the Android source code (mRowIdColumn is the id of the column _ID in the schema). I believe CursorAdapter could be inherited to override getItemId, but this starts to get seriously complicated for small benefits.

I think that the UNIX timestamp is a good way to go, then!

tobast commented 2 years ago

Where do you prefer me to start the development? I cannot fork CatimaSync because the repository is empty.

TheLastProject commented 2 years ago

Oops, I didn't notice forking an empty repo isn't possible. I've created a simple README file now so you can fork the repo :)

tobast commented 2 years ago

I'm having second thoughts on the ID being a UNIX timestamp. Sure, there is a very small chance of a user creating two cards in the same second (at least manually); but if the ID is supposed to be unique also on the sync server, it means that there will be a conflict if two users create a card in the same second. This might start to look plausible if there is a big sync server with many users at some point. Even worse: there will be no conflict on the phone directly, but there will be a conflict when trying, later, to push the card to the server.

I doubt that there will ever be a server with one card creation per second on average, so sure, this would work:

while True:
    try:
        sync.push(card)
        break
    except IdNotUnique:
        card.id += 1

but it starts to look very inelegant, and if this happens 1h after the card was created (eg. no internet), this means that the phone app cannot hold for true that a card ID never changes (I don't know if this is something expected, but it sounds like something that would be implicitly assumed).

This, plus the fact that it might be necessary to migrate all sequential IDs to globally-unique IDs at some point when upgrading the app, makes me reconsider. I actually think that the best option here to be sure not to regret later is to keep sequential IDs on the phone, and add a new UUID column on the phone that is assumed to be globally-unique. The sequential ID remains only on the phone and is never sent to the server, and the UUID (DB index) is used whenever synchronizing. As a side effect, two phones synchronized to the same account will certainly have different sequential IDs for the same card, but I guess that's not a problem.

Does this look sound to you?

TheLastProject commented 2 years ago

but if the ID is supposed to be unique also on the sync server

I don't think it needs to be. The ID is unique local on the device but I think on the server side it would be just as sensible to make the ID on the server be the account ID + card ID (so, a Composite Primary Key).

this means that the phone app cannot hold for true that a card ID never changes (I don't know if this is something expected, but it sounds like something that would be implicitly assumed).

If the card ID ever changes on the phone, it means all shortcut widgets are broken. Those could perhaps be fixable, but it would still add a lot of complexity I want to avoid. The ID should be considered unmutable on the client side.

the fact that it might be necessary to migrate all sequential IDs to globally-unique IDs at some point when upgrading the app

I don't understand why this is a "fact". I want to prevent migration ever being a thing because widgets. So it seems more likely to be a fact that the ID will never change on the device.

keep sequential IDs on the phone, and add a new UUID column on the phone that is assumed to be globally-unique. The sequential ID remains only on the phone and is never sent to the server, and the UUID (DB index) is used whenever synchronizing. As a side effect, two phones synchronized to the same account will certainly have different sequential IDs for the same card, but I guess that's not a problem.

That seems reasonable. As an added benefit, it would in the future make it easier to implement a card being shared by multiple accounts and update on all accounts it's added on.

Does this look sound to you?

It sounds okay. I suppose when a card doesn't have an UUID, it's considered new and the server will return the UUID the client should assign it locally?

There is one case I do worry about and that is users who will have been keeping their backup in sync between multiple devices manually and then start using this. But I guess your system will create duplicates which to me seems like the least problematic way to resolve such a conflict (as there's no data loss, only data duplication).

So, yeah, a new UUID column in the client and making the server talk using UUIDs seems like the cleanest possible solution. It also gives us added flexibility for the future. One thing I do find important is that we plan the format really well. Android uses SQLite, the server may use... not sure what, you named PostgreSQL before, that seems fine to me. If we're sure SQLite BLOB and PostgreSQL uuid are completely compatible... yeah, great :)

tobast commented 2 years ago

It sounds okay. I suppose when a card doesn't have an UUID, it's considered new and the server will return the UUID the client should assign it locally?

I was actually considering generating the UUID on client-side at card creation (or in the DB migration for already existing cards), but both options are looking fine to me. I think that we need to assume anyway that there will never be a global UUID collision globally: if not, it would be necessary to handle the case where a user migrates from one server to another, and suddenly there is a UUID clash. Assuming this, the server is in no better position to generate the UUID anyway.

From a pragmatic point of view, I don't believe that there is any added complexity in generating the UUID on the server, and in the extremely unlikely case where there is a clash, it is easier to handle, so yeah, sounds good.

There is one case I do worry about and that is users who will have been keeping their backup in sync between multiple devices manually and then start using this. But I guess your system will create duplicates which to me seems like the least problematic way to resolve such a conflict (as there's no data loss, only data duplication).

I see multiple ways to deal with this:

The last solution looks like the cleanest to me, but would break if a user somehow wants to strictly duplicate a card (up to its last-used date, which I don't believe is possible anyway using the app in legit ways).

So, yeah, a new UUID column in the client and making the server talk using UUIDs seems like the cleanest possible solution. It also gives us added flexibility for the future. One thing I do find important is that we plan the format really well. Android uses SQLite, the server may use... not sure what, you named PostgreSQL before, that seems fine to me. If we're sure SQLite BLOB and PostgreSQL uuid are completely compatible... yeah, great :)

I don't believe that this is actually necessary: I'd rather make sure the API calls and arguments are fool-proof, and converted on the device/server. It would be possible to transmit the binary blob corresponding to the UUID, but yeah, things could happen, endianness could be wrong at some point, we might not catch the bug, … I believe it's just safer to read the UUID blob from DB on the device and load it as a Java UUID on the phone, and send it over as a well-defined string-formatted UUID in the API (and conversely on the Python side). This way, all the storage backend is abstracted, and nothing weird should happen — and the overhead should be negligible.

TheLastProject commented 2 years ago

automatic approach: on the server, detect at card creation time that a card is an exact copy of one we already have, by the same user account. When assigning a new UUID to it, actually make it an alias and return the pre-existing UUID. The last solution looks like the cleanest to me, but would break if a user somehow wants to strictly duplicate a card (up to its last-used date, which I don't believe is possible anyway using the app in legit ways)

I think that's actually exactly what the duplicate feature does as it just copies the old card. But maybe a new duplicate should always have lastUsed set to the current time. It kinda sounds like a bug that a duplicate would have an used time equal to the card it's duplicated from. I've created #1006 for it.

I think the automatic approach would be better for most users, but it still worries me to have "magic" like this. I think we should at least give a reason in the creation endpoint if we go this route so the client can know this happened. Something like:

{"uuid": "abcdef", "reason": "alreadyExists"}

and send it over as a well-defined string-formatted UUID in the API (and conversely on the Python side)

Ah yeah, that's what I meant. I more meant that we need to be careful about the formats. But a SQLite BLOB should store any value so that should be fine (although I'm not completely sure why it'd need a blob on the SQLite Android app side instead of a string, I'm not sure how to store a blob in a .csv output file anyway but a string will work fine).

tobast commented 2 years ago

I think the automatic approach would be better for most users, but it still worries me to have "magic" like this. I think we should at least give a reason in the creation endpoint if we go this route so the client can know this happened. Something like:

{"uuid": "abcdef", "reason": "alreadyExists"}

Looks fair.

Ah yeah, that's what I meant. I more meant that we need to be careful about the formats. But a SQLite BLOB should store any value so that should be fine (although I'm not completely sure why it'd need a blob on the SQLite Android app side instead of a string, I'm not sure how to store a blob in a .csv output file anyway but a string will work fine).

I was still thinking about BLOBs as I once thought this could be an ID, but maybe you're right and a string is good enough. A BLOB is less bytes in DB (maybe faster search, then? I'd need to benchmark it to check this), but arguably it's negligible; on the other side, load to/from strings is trivial from both Java and Python built-in UUID class, although I expect BLOB to be not much harder.

m0byn commented 10 months ago

Risking the fact that this might be a more technical discussion and that this post is not as sophisticated as others I want to support this issue highly and highlight the usefulness and increased UX if it is implemented. Please implement it, if possible!

Other than that the app is amazing - keep up the good work!

Korb commented 4 months ago

How is this issue fundamentally different from “Simultaneous usage on two or more devices: thoughts” (#20)?