Ideas for auto-updating Mapeo with sync

gmaclennan commented 4 years ago

This is a discussion of how to distribute Mapeo updates peer-to-peer within the app, so that users do not need to update from the Play Store or download an installer from the internet.

Mobile

There are two ways of updating on mobile:

1. Updating the JavaScript bundles

Most of the code in Mapeo is JavaScript, in two parts: a "client-side" bundle which runs the front-end React Native code and a "server-side" bundle which runs the nodejs-mobile code (e.g. mapeo-core). It is possible to update the JavaScript without any special permissions, and this can be done seamlessly for the user. For the client-side code, Expo (which we use for several app components) has good support for OTA updates and allows you to host them on your own server, i.e. we could host these on a local http server run from mapeo-core. Currently the Expo OTA module only works for "managed apps", i.e. apps which use the whole Expo toolchain, but they are about to release it for "bare" apps, i.e. as an independent module which can be installed in any app (e.g. Mapeo).

I am not aware of any existing modules for updating the server-side bundle that runs in nodejs-mobile. However, it would not be too hard to do. At install-time, nodejs-mobile extracts the JavaScript bundle from the APK and writes it to a user folder, and runs it from there. It would not be hard to replace those files on disk with updated code, but it would require writing a Native (Kotlin/Java) module to do that before nodejs-mobile starts, with an option to rollback if it causes a crash.
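The swap-and-rollback steps themselves are small. The sketch below shows them in JavaScript purely to illustrate the logic; as noted above, the real thing would have to run in a Native module before nodejs-mobile starts. The directory layout (`liveDir`, `newDir`, `backupDir`) and both function names are hypothetical.

```javascript
const fs = require('fs')

// Replace the bundle in `liveDir` with the one in `newDir`, keeping the
// previous bundle in `backupDir` so that it can be restored later.
// All three directories are assumed to be on the same filesystem.
function installBundle (liveDir, newDir, backupDir) {
  fs.rmSync(backupDir, { recursive: true, force: true })
  if (fs.existsSync(liveDir)) fs.renameSync(liveDir, backupDir)
  fs.renameSync(newDir, liveDir)
}

// Restore the previous bundle, e.g. after the new one crashed on startup.
// Returns false if there is nothing to roll back to.
function rollbackBundle (liveDir, backupDir) {
  if (!fs.existsSync(backupDir)) return false
  fs.rmSync(liveDir, { recursive: true, force: true })
  fs.renameSync(backupDir, liveDir)
  return true
}
```

How a crash would be detected (for example a "started ok" marker written by the new bundle) is left open here.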

TL;DR it should be fairly easy to update the front-end JS bundle once the Expo OTA module is ready, as long as we have a way to serve an updated bundle over a local http server. This would work on both Android and iOS (although OTA updates of nodejs-mobile code that includes Node Native modules may not be permitted by the Apple App Store).
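As a rough illustration of the "local http server" part, here is a minimal static file server using only Node's built-in modules. It is a sketch under assumptions: the function names and the idea of serving bundles straight from a directory are mine, and it says nothing about the manifest format the Expo OTA module expects.

```javascript
const http = require('http')
const fs = require('fs')
const path = require('path')

// Map a request URL onto a file inside `root`. Returns null if the URL is
// malformed or tries to escape the directory (e.g. "/../secret").
function resolveBundlePath (root, urlPath) {
  const rootAbs = path.resolve(root)
  let pathname
  try {
    pathname = decodeURIComponent(urlPath.split('?')[0])
  } catch (err) {
    return null // malformed percent-encoding
  }
  const target = path.resolve(rootAbs, '.' + pathname)
  const inside = target === rootAbs || target.startsWith(rootAbs + path.sep)
  return inside ? target : null
}

// Create (but do not start) a server that streams files from `root`.
function createUpdateServer (root) {
  return http.createServer((req, res) => {
    const file = resolveBundlePath(root, req.url)
    if (!file) {
      res.statusCode = 400
      return res.end()
    }
    const stream = fs.createReadStream(file)
    stream.on('error', () => {
      res.statusCode = 404
      res.end()
    })
    stream.pipe(res)
  })
}
```

Something like `createUpdateServer(dir).listen(0, '127.0.0.1')` would start it on a free port; binding to localhost only is an assumption about how the app would consume it.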

2. Updating Native modules

In the context of mobile, "Native" means Java/Kotlin code, not Native (C++) Node modules, which can be updated along with the JavaScript bundle as described above. We use Native code everywhere that Mapeo interacts with the device APIs, e.g. GPS, File Storage, Camera, Nodejs-Mobile. Updating any of these modules would require updating the whole app with a new APK.

Fortunately, on Android (not iOS) the app is able to launch an APK to update itself, i.e. if we have a way of getting a new APK into local storage, the app can launch it and the user will be prompted to update the app. F-Droid updates itself in this way, and it doesn't require much code, just requesting the correct permissions from the user. This is a riskier process than updating the JS bundle, since there is no way to downgrade to a previous version other than uninstalling and re-installing.

TL;DR If we can get an updated APK onto the disk, Mapeo can update itself from it with a few lines of Java/Kotlin code.

Desktop

We use Electron Builder which supports auto updates which can be downloaded from a user-configured URL (e.g. a local http server) or, potentially, be read directly from disk (the API allows the update to happen in steps, e.g. download to disk first, then update, so it should be possible to directly run the update from a file already downloaded, and remove the need to download from a local http server).
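A sketch of what the stepwise flow could look like with electron-updater's `autoUpdater` (the updater that ships with Electron Builder). The `autoUpdater` object is passed in rather than required so the snippet stands alone; the function name, the `onDownloaded` callback and the local URL are assumptions for illustration.

```javascript
// Point the updater at a local feed and download only when an update is
// announced. `autoUpdater` is expected to be require('electron-updater').autoUpdater.
function useLocalFeed (autoUpdater, localUrl, onDownloaded) {
  // Do the update in steps: check first, then download explicitly.
  autoUpdater.autoDownload = false
  autoUpdater.setFeedURL({ provider: 'generic', url: localUrl })

  autoUpdater.on('update-available', () => {
    autoUpdater.downloadUpdate()
  })
  autoUpdater.on('update-downloaded', () => {
    // Let the caller decide when to call autoUpdater.quitAndInstall(),
    // e.g. after asking the user.
    if (onDownloaded) onDownloaded()
  })

  return autoUpdater.checkForUpdates()
}
```

Whether the download step can be skipped entirely for a file that is already on disk is the open question raised above; this sketch does not answer it.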

TL;DR if we can get the new installers locally, we should be able to update from them without too much difficulty.

How to distribute updates?

It's important that users can trust that updates come from Mapeo developers (e.g. Digital Democracy) and do not include viruses or security threats. For APK / Desktop installs we sign the installers and on mobile it is not possible to update with an APK signed by anybody else. The OTA updates don't have the same protection.

Hyperdrive probably solves this problem for us. It could be signed with a Dd key-pair, and we could build that into the app. It's probably useful to have both mobile and desktop installers on desktop, but because of space limitations on mobile it's probably best to only have mobile installers on mobile.

Mapeo devices could all join a common topic with a discovery key based on the hyperdrive public key, and replicate to get the latest versions, using path prefixes to only replicate what's needed, e.g. /mobile/ or /desktop/. We could include both JS bundles (more frequent updates, but a smoother upgrade process, and they take up much less space: <10 MB vs. 80 MB for an APK) and APKs. For desktop it would be quite large because we would want to distribute Linux, Mac and Windows installers, which total about 350 MB currently, but we could optimize that and get it down to about 60 MB per installer.
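To make the path-prefix idea concrete, here is a small sketch that picks out which entries a device would want, given a list of paths in the drive. The layout `/<platform>/<version>/<file>` is an assumption for illustration, as are the function names; versions are compared as plain dotted numbers (no pre-release tags).

```javascript
// Compare dotted numeric versions, e.g. "1.9.0" < "1.10.0".
function compareVersions (a, b) {
  const pa = a.split('.').map(Number)
  const pb = b.split('.').map(Number)
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0)
    if (diff !== 0) return diff
  }
  return 0
}

// Return only the paths under `prefix` that belong to the newest version.
function latestFor (paths, prefix) {
  const mine = paths.filter((p) => p.startsWith(prefix))
  if (mine.length === 0) return []
  const versions = mine.map((p) => p.slice(prefix.length).split('/')[0])
  const latest = versions.sort(compareVersions).pop()
  return mine.filter((p) => p.startsWith(prefix + latest + '/'))
}
```

A mobile device would call `latestFor(allPaths, '/mobile/')` and request just those files; everything else under the prefix is a candidate for clearing.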

Qn: Is it easy in Hyperdrive to "delete" data from the local cache, e.g. each device would only need the most recent versions of the installers, so they would want to free up disk space by removing all previous versions once they get a new one.

Once the updated APK / JS bundle / Desktop installer is on the disk, we can use the update mechanisms described above to update.

Future proof?

If we locked the update discovery / sync to a particular version of hyperdrive we could just keep using that. The rest of mapeo discovery could update to newer protocols, as long as errors are ignored and the app can get updates and update itself.

okdistribute commented 4 years ago

Qn: Is it easy in Hyperdrive to "delete" data from the local cache, e.g. each device would only need the most recent versions of the installers, so they would want to free up disk space by removing all previous versions once they get a new one.

Yes, Hyperdrive has an unlink command which removes the file, but it is also pretty dumb about downloads. If you delete it from disk it doesn't notice, because it has been marked downloaded in the metadata. If you want to clear the metadata, you can use the metadata.clear(start, end, opts) function (which comes from hypercore, see those docs for more detail).

gmaclennan commented 4 years ago

@karissa checked on this today, only the archive creator could do unlink - what is missing is a hyperdrive version of hypercore.clear() or "undownload"

okdistribute commented 4 years ago

@gmaclennan I think we can do drive.metadata.clear() and drive.content.clear(), or maybe I can open a PR to add that, should be pretty straightforward

okdistribute commented 4 years ago

@gmaclennan actually with the new corestore module this has become a bit different in the latest hyperdrive so content.clear() won't work; however, it would be worth seeing if you delete files on the filesystem if hyperdrive tries to refetch them or not

gmaclennan commented 4 years ago

@karissa I looked into this a bit more and did some local testing with the latest hyperdrive + corestore, and latest hypercore. All files in hyperdrive are still stored in a single hypercore, and using random-access-file they are all appended in a single data file. If you sync in sparse mode, only the files you request are stored in this file, the rest is 0s, and if the OS has sparse support then the file only takes up the disk space of the actual files. However, if you already have the files downloaded you need a way to "zero-out" chunks of the data file.

In hyperdrive there is not currently a way to call hypercore.clear() for the underlying data store, although it could be added in a PR. However, hypercore.clear() calls raf.del(), which is currently a no-op if you clear a chunk other than the last. E.g. if the latest version of a file is appended to the end of the storage feed, then clear() does not actually remove anything from earlier in the underlying data file; it only changes the bitmask and marks the chunk as not available.

One option is to update raf so that del() actually writes zeros to a file. That should be fairly straightforward, although we need to test it on different devices to see if it actually frees up disk space. On my Mac (MacOS has very poor support for sparse files, and only supports them on APFS which was released last year) writing zeros to a sparse file does not free up disk space.
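One way to run that test on each device is to compare a file's logical size with the space actually allocated for it. The sketch below uses `fs.statSync().blocks`, which is typically counted in 512-byte units on Linux and macOS and may be unavailable on some platforms; the helper names are hypothetical.

```javascript
const fs = require('fs')

// Report logical size vs. allocated size for `file`. `allocated` is null
// when the platform does not report a block count.
function diskUsage (file) {
  const stat = fs.statSync(file)
  const allocated = typeof stat.blocks === 'number' ? stat.blocks * 512 : null
  return { logical: stat.size, allocated }
}

// Create a file with a hole: a single byte written at position `holeSize`.
// On a filesystem with sparse support the hole should take (almost) no space.
function makeHoleyFile (file, holeSize) {
  const fd = fs.openSync(file, 'w')
  try {
    fs.writeSync(fd, Buffer.from([1]), 0, 1, holeSize)
  } finally {
    fs.closeSync(fd)
  }
}
```

If `allocated` comes back far smaller than `logical` for a freshly made holey file, the filesystem stores holes sparsely. Running the same comparison before and after zeroing a region would show whether zeros written later are also reclaimed.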

The other option is to write a custom random-access-file implementation that, rather than writing to a single file, writes to multiple files. hypercore very rarely (never?) seems to call random-access-storage for ranges other than chunk ranges, so a custom raf could create a file for each range, with the filename being the offset. If hypercore did request a read with an offset that does not align to a chunk, it would require a slightly slower op to readdir() and then do a partial read of the relevant chunk. This way clear() (which calls raf.del()) could actually remove files and free up space. This would still need a clear() method exposed on hyperdrive though.
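A toy version of the file-per-range idea, to show how del() could reclaim space. It is synchronous and does not implement the real random-access-storage interface (which is callback-based), and it only handles reads that start at a stored offset; the class and its method names are illustrative only.

```javascript
const fs = require('fs')
const path = require('path')

class FilePerRange {
  constructor (dir) {
    this.dir = dir
    fs.mkdirSync(dir, { recursive: true })
  }

  // Each written range lives in its own file, named after its offset.
  _file (offset) {
    return path.join(this.dir, String(offset))
  }

  write (offset, data) {
    fs.writeFileSync(this._file(offset), data)
  }

  // Aligned reads only: `offset` must be the start of a stored range.
  read (offset, length) {
    const data = fs.readFileSync(this._file(offset))
    if (data.length < length) throw new Error('Could not satisfy length')
    return data.subarray(0, length)
  }

  // Remove every stored range lying entirely inside [offset, offset + length).
  del (offset, length) {
    for (const name of fs.readdirSync(this.dir)) {
      const start = Number(name)
      const file = path.join(this.dir, name)
      const size = fs.statSync(file).size
      if (start >= offset && start + size <= offset + length) fs.unlinkSync(file)
    }
  }
}
```

Because deleting a range unlinks real files, the space comes back regardless of whether the filesystem supports sparse files.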

okdistribute commented 4 years ago

Ah, we might want this -- although this was written for hyperdrive 9 and i'm not sure how compatible it will be with 10: https://github.com/datproject/dat-storage

This storage module copies files from the internal content hypercore to the filesystem as files. IIRC, when you delete the files from the filesystem (the representations from this module) using rm, hyperdrive will have marked it 'downloaded' and won't send another want for it later in another sync.

gmaclennan commented 4 years ago

Ah! That's really clever :) It would be interesting to test with new hyperdrive. Hyperdrive does actually call raf.del(range) when you call clear, depending on how this is hooked up it might delete files based on that.

I've been testing and thinking about replication and I think it may be better for us to:

Create archive with sparse: true
Replicate with live: true (no data downloaded)
Call download('/file_i_want')

It will only download the most recent version. You could then delete previous versions (if hyperdrive exposes contentFeed.clear()).

okdistribute commented 4 years ago

Looks like @noffle has a plan for this, going to track in #92

hackergrrl commented 4 years ago

I think this path would still be good to pursue & research out. The implementation I'm thinking of is pretty rudimentary and lacks versioning and many of the crypto checks hyperdrive provides. It should be possible to introduce a new hyperdrive-based system in the future without breaking compatibility.

hackergrrl commented 4 years ago

Summary of tasks & blockers if we wanted to use hyperdrive:

NTFS sparse support needed: https://github.com/random-access-storage/random-access-file/issues/23
macOS HFS+ does not support sparse files. (new Apple FS does)
Check if Linux sparse files work (on init + when cleared)
Check if Android sparse files work (on init + when cleared)
- Also, would we need to check across all major phones, or can we rely on this being available Android-wide?
Modify random-access-file to write zeros on del()
Add a clear() API to hyperdrive
Implement some kind of chunking mechanism into random-access-file to split entries into physical files so they can be safely deleted & their space reclaimed
- This holds the risk of breaking us if hyper{core,drive} decides to change the offsets/alignment it reads/writes on if we make assumptions about that

hackergrrl commented 4 years ago

We could also write our own little signing/verification mechanism. Here are some tasks & blockers for that:

Download remote files into staging/ directory
Verify hash and signature against known key once downloaded
If ok, move to 'upgrades/' directory for sharing with others
Clearing an old upgrade binary deletes the file
- Open Q: how would the metadata for each file be stored & synced?
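The verify-then-move steps could be sketched with Node's built-in crypto module. This assumes a detached Ed25519 signature over the whole file (which makes a separate hash check redundant) and reads the file into memory, which is fine for a sketch but not ideal for an 80 MB APK. The function name and directory arguments are hypothetical.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Verify a staged download against a detached Ed25519 signature. If it is
// valid, move it into `upgradesDir` (where it can be shared with others)
// and return the new path; otherwise delete it and return null.
function acceptUpgrade (stagedFile, signature, publicKey, upgradesDir) {
  const data = fs.readFileSync(stagedFile)
  // For Ed25519 keys the algorithm argument must be null.
  const ok = crypto.verify(null, data, publicKey, signature)
  if (!ok) {
    fs.unlinkSync(stagedFile)
    return null
  }
  fs.mkdirSync(upgradesDir, { recursive: true })
  const dest = path.join(upgradesDir, path.basename(stagedFile))
  fs.renameSync(stagedFile, dest)
  return dest
}
```

Here `publicKey` would be the known key built into the app; how the signature and other per-file metadata travel between peers is the open question above.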

gmaclennan commented 4 years ago

I was thinking about this again today.

Summary of tasks & blockers if we wanted to use hyperdrive:

NTFS sparse support needed: random-access-storage/random-access-file#23

macOS HFS+ does not support sparse files. (new Apple FS does)

I think not having MacOS HFS+ support is an ok compromise, as long as Android, Windows and Linux work.

Check if Linux sparse files work (on init + when cleared)

Check if Android sparse files work (on init + when cleared)

Also, would we need to check across all major phones, or can we rely on this being available Android-wide?

Modify random-access-file to write zeros on del()

Add a clear() API to hyperdrive

Implement some kind of chunking mechanism into random-access-file to split entries into physical files so they can be safely deleted & their space reclaimed

This holds the risk of breaking us if hyper{core,drive} decides to change the offsets/alignment it reads/writes on if we make assumptions about that

I think (7), the chunking mechanism, could be implemented in a way that remains compatible with hyper{core,drive} by writing fixed chunks to files in the same way that random-access-idb works. E.g. rather than writing to a single file, write to file chunks of a fixed size, e.g. 1 MB. A .del(offset, length) operation could delete all chunks that are within offset -> offset + length. This would not be efficient at reclaiming space with lots of small files, but with a small number of large files it would only leave the partially overlapping chunks at each end on the disk.
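The arithmetic for which fixed-size chunk files a del() could remove is short enough to sketch. The function name is hypothetical; chunk `i` is taken to cover bytes `[i * chunkSize, (i + 1) * chunkSize)`.

```javascript
// Return the indexes of the chunks that lie entirely inside
// [offset, offset + length) and can therefore be deleted outright.
function chunksToDelete (offset, length, chunkSize) {
  const first = Math.ceil(offset / chunkSize) // first fully covered chunk
  const end = Math.floor((offset + length) / chunkSize) // exclusive
  const indexes = []
  for (let i = first; i < end; i++) indexes.push(i)
  return indexes
}
```

With 1 MB chunks, clearing 3 MB starting at the 1.5 MB mark removes chunks 2 and 3 and leaves chunks 1 and 4 (each half covered) on disk, while a clear smaller than one chunk removes nothing.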
