digidem / comapeo-core

A local-first library for collaborating on mapping projects
MIT License

Send "want hints" to peers based on sync download setting #686

Open EvanHahn opened 4 months ago

EvanHahn commented 4 months ago

In order to know whether a connected peer has finished syncing, a device needs to know what the peer wants to download. Currently we assume peers want everything, i.e. sync is not considered complete until a connected peer has everything. However, with selective sync of media, a connected peer might consider sync "complete" from its side when it has downloaded only some of the media.

Sending bitfields as want hints is one option, but it has quite a high overhead, and it requires the connected peer to sync the blobIndex first (so that it can calculate what it wants). It also requires the connected peer to be able to read the blobIndex, which is incompatible with a "blind" server (e.g. one with an encrypted blobIndex). Since a "blind" server could not do selective sync anyway, it is acceptable that a blind server cannot send a want hint.
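To make the overhead point concrete, here is a minimal sketch assuming a naive, uncompressed bitfield with one bit per hyperdrive block. This encoding and the names `encodeBitfieldWantHint` and `wantsBlock` are hypothetical and not Hypercore's actual wire format. The size of such a hint grows with the number of blocks in the drive, not with how much the peer actually wants.

```typescript
// Hypothetical want-hint encoding: bit i is set when the sender wants block i.
// Uncompressed, so its size depends only on the total block count.
function encodeBitfieldWantHint(
  totalBlocks: number,
  wantedBlocks: Iterable<number>,
): Uint8Array {
  // One bit per block, rounded up to whole bytes.
  const bitfield = new Uint8Array(Math.ceil(totalBlocks / 8));
  for (const index of wantedBlocks) {
    if (index < 0 || index >= totalBlocks) {
      throw new RangeError(`block index ${index} out of range`);
    }
    bitfield[index >> 3] |= 1 << (index & 7);
  }
  return bitfield;
}

// Check whether the hint marks a given block as wanted.
function wantsBlock(bitfield: Uint8Array, index: number): boolean {
  return (bitfield[index >> 3] & (1 << (index & 7))) !== 0;
}

// Usage: even if only three blocks are wanted, a 1,000,000-block drive
// still needs a 125,000-byte hint in this naive encoding.
const hint = encodeBitfieldWantHint(1_000_000, [0, 7, 8]);
console.log(hint.byteLength); // 125000
console.log(wantsBlock(hint, 7), wantsBlock(hint, 9)); // true false
```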

The other option is to send "path filters" for the hyperdrives, i.e. patterns matching the paths that determine what to sync. Because attachment types and variants have a defined file layout, this should be all that is needed.

A device receiving a want hint can read its own hyperdrive and determine which blocks the connected peer wants.
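A minimal sketch of that receiving side, assuming the want hint has already been turned into a path predicate and that each local hyperdrive entry can be described by its path plus a contiguous range of content blocks. `DriveEntry`, `PathFilter`, `wantedBlocks` and `isPeerSyncComplete` are hypothetical stand-ins, not the Hyperdrive or comapeo-core API. It also shows the sync-complete check from the first paragraph: the peer is done once it has every block it wants, rather than every block we have.

```typescript
// Hypothetical model of a local hyperdrive entry: its path and the range of
// content blocks that hold its data. Not the real Hyperdrive entry format.
interface DriveEntry {
  path: string;
  blockOffset: number; // index of the first content block
  blockLength: number; // number of content blocks
}

// A want hint reduced to a predicate over paths, however it is encoded.
type PathFilter = (path: string) => boolean;

// Collect the block indices the peer wants, using only our own entries.
function wantedBlocks(entries: DriveEntry[], wants: PathFilter): Set<number> {
  const blocks = new Set<number>();
  for (const entry of entries) {
    if (!wants(entry.path)) continue;
    for (let i = 0; i < entry.blockLength; i++) {
      blocks.add(entry.blockOffset + i);
    }
  }
  return blocks;
}

// Sync is complete from the peer's side once it has every wanted block.
function isPeerSyncComplete(
  entries: DriveEntry[],
  wants: PathFilter,
  peerHas: (blockIndex: number) => boolean,
): boolean {
  for (const index of wantedBlocks(entries, wants)) {
    if (!peerHas(index)) return false;
  }
  return true;
}

// Usage: the peer has blocks 0 and 1, i.e. only the photo preview.
const entries: DriveEntry[] = [
  { path: "/photo/preview/a.jpg", blockOffset: 0, blockLength: 2 },
  { path: "/photo/original/a.jpg", blockOffset: 2, blockLength: 10 },
];
const onlyPreviews: PathFilter = (p) => p.startsWith("/photo/preview/");
const peerBlocks = new Set([0, 1]);
console.log(isPeerSyncComplete(entries, onlyPreviews, (i) => peerBlocks.has(i))); // true
console.log(isPeerSyncComplete(entries, () => true, (i) => peerBlocks.has(i))); // false
```

With the current "peers want everything" assumption the second call is the one that applies, so this peer would never be considered synced even though it has everything it asked for.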

gmaclennan commented 20 hours ago

I wrote an initial version of a "want hint" protobuf message as:

message WantExtension {
  // Not using _unspecified for default enum, because this enum is always used
  // as a repeated field, so we don't need to handle the case where it is
  // not specified.
  enum BlobVariant {
    original = 0;
    thumbnail = 1;
    preview = 2;
  }
  repeated BlobVariant photo = 1;
  repeated BlobVariant audio = 2;
  repeated BlobVariant video = 3;
}

However, this doesn't have great forwards compatibility. A want hint is meant to indicate "this is what I intend to download from you". A peer on a newer version of CoMapeo might download media types and variants that are unknown to an older peer, which would cause the older peer to think sync was complete before it actually is.
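A small sketch of that failure mode, assuming a `/<type>/<variant>/<name>` path layout and inventing an `hd` variant purely for illustration. The older peer can only map variants it knows to paths, so anything else in the hint has no effect on what it counts as wanted. Suppose the older peer holds both blobs below; it will report sync as complete while the newer peer is still missing one.

```typescript
// Variants an *older* peer knows how to map to hyperdrive paths.
const OLDER_PEER_VARIANTS = new Set(["original", "thumbnail", "preview"]);

// What the newer peer actually intends to download. "hd" is a hypothetical
// variant from a later release that the older peer has no mapping for.
const newerPeerWants = { photo: ["preview", "hd"] };

// Path prefixes the older peer believes are wanted: variants it cannot
// interpret are silently left out.
function pathsUnderstoodByOlderPeer(wants: Record<string, string[]>): string[] {
  const prefixes: string[] = [];
  for (const [type, variants] of Object.entries(wants)) {
    for (const variant of variants) {
      if (OLDER_PEER_VARIANTS.has(variant)) prefixes.push(`/${type}/${variant}/`);
    }
  }
  return prefixes;
}

// Blobs the older peer holds, and what the newer peer has downloaded so far.
const olderPeerBlobs = ["/photo/preview/a.jpg", "/photo/hd/a.jpg"];
const newerPeerHas = new Set(["/photo/preview/a.jpg"]);

const prefixes = pathsUnderstoodByOlderPeer(newerPeerWants);
const looksComplete = olderPeerBlobs
  .filter((p) => prefixes.some((prefix) => p.startsWith(prefix)))
  .every((p) => newerPeerHas.has(p));

console.log(looksComplete); // true, even though /photo/hd/a.jpg is still missing
```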

The alternative I can think of is to pass, as the want hint, a path filter that matches hyperdrive paths. This ensures that a newer client can always communicate to an older client what it is going to download.

I'm not sure what the best way to define a path filter is:

  1. A single glob pattern, using something like minimatch or micromatch, e.g. pathFilter: '/(photo|video)/(original|preview)/**'
  2. A list of folder names, e.g. folders: ['/photo/original', '/video/original', '/photo/preview'], then const matched = !!folders.find(p => entryPath.startsWith(p.replace(/\/$/, '') + '/'))
  3. A list of paths with wildcards, e.g. paths: ['/photo/*', '/video/original/*'], then some kind of fast & safe matcher.
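Options 2 and 3 can be sketched without any dependency (option 1 would rely on minimatch or micromatch, so it is not shown). The wildcard semantics below are one hypothetical interpretation, not a decided format: patterns are compared segment by segment, `*` matches exactly one segment, and a pattern matches any path it is a prefix of. Because no regular expression is built from peer-supplied input, matching stays linear in the number of segments, which is one way to get the "fast & safe" property. `matchesFolders` and `matchesWildcardPaths` are illustrative names.

```typescript
// Option 2: folder prefixes. An entry matches if it lies inside one of the
// folders; a trailing slash on the folder is normalised away first.
function matchesFolders(entryPath: string, folders: string[]): boolean {
  return folders.some((folder) =>
    entryPath.startsWith(folder.replace(/\/$/, "") + "/"),
  );
}

// Option 3 (hypothetical semantics): "*" matches exactly one path segment,
// and a pattern matches any path whose leading segments match it.
// Empty segments are ignored, so a missing leading slash is harmless.
function matchesWildcardPaths(entryPath: string, patterns: string[]): boolean {
  const pathSegments = entryPath.split("/").filter(Boolean);
  return patterns.some((pattern) => {
    const patternSegments = pattern.split("/").filter(Boolean);
    if (patternSegments.length > pathSegments.length) return false;
    return patternSegments.every(
      (segment, i) => segment === "*" || segment === pathSegments[i],
    );
  });
}

// Usage
console.log(matchesFolders("/photo/original/a.jpg", ["/photo/original", "/video/original/"])); // true
console.log(matchesFolders("/photo/originals/a.jpg", ["/photo/original"])); // false
console.log(matchesWildcardPaths("/photo/preview/a.jpg", ["/photo/*", "/video/original/*"])); // true
console.log(matchesWildcardPaths("/video/preview/b.mp4", ["/photo/*", "/video/original/*"])); // false
```

The appended `/` in option 2 is what stops `/photo/original` from also matching a sibling folder such as `/photo/originals`.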

None of these solutions is great, because I had intended blob paths to be an implementation detail, with code outside the BlobStore dealing only in blob IDs. That said, this approach only leaks the implementation detail into extension messages; it stays hidden from the "public" API.

gmaclennan commented 11 hours ago

Following up on what we discussed in a huddle:

message DownloadIntentExtension {
  message DownloadIntent {
    repeated string variants = 1;
  }
  map<string, DownloadIntent> downloadIntents = 1;
}

The goal is a structured message (a map of blob type to blob variants) that can handle new blob types and new variants in future, and that could have other criteria added to DownloadIntent later, e.g. size or other metadata.
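As a rough sketch of how a receiving device might use this message, assuming the decoded `downloadIntents` map arrives as a plain object and that blobs live at `/<blobType>/<variant>/<name>` in the hyperdrive (both assumptions; `DownloadIntents` and `isWanted` are hypothetical names). Because types and variants are plain strings, an older peer can still match paths for a type it has never heard of, such as the made-up `document` type in the last line, which addresses the forwards-compatibility problem with the enum-based message.

```typescript
// Assumed TypeScript shape of a decoded DownloadIntentExtension:
// blob type (e.g. "photo") mapped to the variants the peer intends to download.
interface DownloadIntent {
  variants: string[];
}
type DownloadIntents = Record<string, DownloadIntent>;

// Assumed hyperdrive layout: /<blobType>/<variant>/<name>.
function isWanted(path: string, intents: DownloadIntents): boolean {
  const [blobType, variant] = path.split("/").filter(Boolean);
  if (blobType === undefined || variant === undefined) return false;
  // Own-property check so keys like "constructor" are not matched by accident.
  if (!Object.prototype.hasOwnProperty.call(intents, blobType)) return false;
  return intents[blobType].variants.includes(variant);
}

// Usage
const intents: DownloadIntents = {
  photo: { variants: ["thumbnail", "preview"] },
  audio: { variants: ["original"] },
};
console.log(isWanted("/photo/preview/a.jpg", intents)); // true
console.log(isWanted("/photo/original/a.jpg", intents)); // false
console.log(isWanted("/video/original/b.mp4", intents)); // false
// A blob type unknown to this code still matches, because it is just a string.
console.log(isWanted("/document/original/x.pdf", { document: { variants: ["original"] } })); // true
```

Extra criteria such as size would become additional fields on `DownloadIntent`; an older peer that ignores them would still know at least which types and variants are wanted.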