digidem / comapeo-core

A local-first library for collaborating on mapping projects
MIT License

Question: Where to store attachment metadata? #903

Open gmaclennan opened 1 month ago

gmaclennan commented 1 month ago

We currently have two places where we can store information about an attachment: in an attachment record, which is part of an observation, or in the arbitrary JSON file metadata supported by hyperdrive.

Currently in the attachment record we store:

Previously we stored just the mimeType in hyperdrive metadata, but now we are storing some photo metadata in there too.

I think there is a difference between "an attachment" and "a file/blob". We generate multiple versions of some attachments (e.g. photos), so there is more than one file per attachment.

It feels like the "correct" thing to do is put information about the file in the hyperdrive metadata, and information about the attachment in the attachment record, although I'm not sure there is a clear logical distinction between these two.

The advantage of attachment records is that they are stored with protobuf, so we have some guarantees about the structure / type of the data. The metadata from hyperdrive could be arbitrary JSON, so we effectively need to treat it as unknown and validate it before we can use it.
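To illustrate the "treat it as unknown and validate" point, here is a minimal TypeScript sketch. The `BlobMetadata` shape and the `parseBlobMetadata` helper are hypothetical names made up for this example, not part of comapeo-core or hyperdrive; the only field assumed is the `mimeType` mentioned above.

```typescript
// Hypothetical shape for illustration only; not the project's actual schema.
interface BlobMetadata {
  mimeType: string;
}

// Narrow an untrusted value (e.g. JSON metadata read back from storage)
// to BlobMetadata, or return undefined if it does not match.
function parseBlobMetadata(value: unknown): BlobMetadata | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const mimeType = (value as Record<string, unknown>).mimeType;
  if (typeof mimeType !== "string" || mimeType.length === 0) return undefined;
  // Copy only the validated field so unexpected keys are dropped.
  return { mimeType };
}
```

Every consumer of the metadata would need to go through a check like this, whereas a protobuf-encoded attachment record already arrives with a known shape.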

Another advantage of attachment records is that the information is available with the observation; it does not require additional requests.

Keeping information in hyperdrive metadata also requires a separate approach for accessing the history of that information and for validating signatures.

For me it feels like most additional information should be in the attachment record, although I don't feel able to make a strong argument for that. Plus we are currently putting additional metadata into the hyperdrive metadata records... so...

I would welcome feedback and opinions on this! I think it's early enough that we could move what we currently have in hyperdrive metadata into attachments and create a basic fallback.

EvanHahn commented 1 month ago

I agree.

Advantages of putting metadata on the attachments property of observations:

Disadvantages:

I would personally opt to put all blob metadata onto attachments, even the MIME type, because (1) it's a bit simpler to have all the data in one place and (2) you could infer the attachment type from the MIME type. But I don't feel strongly about this detail.
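As a rough sketch of what "infer the attachment type from the MIME type" could look like: the `AttachmentType` values and the `inferAttachmentType` function below are assumptions for illustration, not the project's actual types.

```typescript
// Hypothetical set of attachment types, assumed for this example.
type AttachmentType = "photo" | "video" | "audio" | "unknown";

// Map the top-level part of a MIME type (the text before "/") to a type.
function inferAttachmentType(mimeType: string): AttachmentType {
  const topLevel = (mimeType.split("/")[0] ?? "").trim().toLowerCase();
  switch (topLevel) {
    case "image":
      return "photo";
    case "video":
      return "video";
    case "audio":
      return "audio";
    default:
      return "unknown";
  }
}
```

This only gives a sensible answer when the MIME type describes the original file; a derived file in a different format would be classified by its own format.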

I'm not sure we have time to implement this, but if we decide it's a priority, I think we should put metadata on attachments.

In any case, I think #901 is a step in the right direction.

gmaclennan commented 1 month ago

> I would personally opt to put all blob metadata onto attachments, even the MIME type

As discussed, MIME type is better in hyperdrive metadata, not the attachment, because different variants could have different MIME types. For example, an audio file preview could be in a more compressed format like .ogg or .3gp, and a thumbnail could be a waveform image. Anything that could differ by variant should be in the hyperdrive metadata, since it's per blob, and there's a one-to-many relationship between an attachment and blobs.
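The one-to-many relationship could be modelled roughly like this. All names here (`AttachmentRecord`, `VariantMetadata`, `BlobsByVariant`, the variant names, and the example MIME types) are hypothetical and chosen only to illustrate the split; they are not the project's schema.

```typescript
// Hypothetical variant names, assumed for this example.
type Variant = "original" | "preview" | "thumbnail";

// Facts shared by every blob of the attachment live on the record.
interface AttachmentRecord {
  name: string;
  type: "photo" | "audio";
}

// Facts that can differ per blob live with each blob's own metadata.
interface VariantMetadata {
  mimeType: string;
}

// One attachment maps to many blobs, keyed by variant.
type BlobsByVariant = Partial<Record<Variant, VariantMetadata>>;

const interview: AttachmentRecord = { name: "interview", type: "audio" };

const interviewBlobs: BlobsByVariant = {
  original: { mimeType: "audio/mp4" },
  preview: { mimeType: "audio/ogg" }, // more compressed preview
  thumbnail: { mimeType: "image/png" }, // e.g. a waveform image
};

// Combine record-level and blob-level information for one variant.
function describeVariant(
  record: AttachmentRecord,
  blobs: BlobsByVariant,
  variant: Variant
): string {
  const mimeType = blobs[variant]?.mimeType ?? "missing";
  return `${record.name} (${record.type}) ${variant}: ${mimeType}`;
}
```

Here a single `mimeType` field on `AttachmentRecord` could not describe all three blobs, which is the argument for keeping it per blob.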