bluesky-social / indigo

Go source code for Bluesky's atproto services.
https://atproto.com
Apache License 2.0

How to store blobs from another struct type? #686

Closed jghiloni closed 3 months ago

jghiloni commented 3 months ago

I have a situation where I am getting data from an API in JSON and want to convert it to a lexicon and store it in the PDS as records. One optional field can hold up to 1 MiB of JSON, so I want to use a blob, since accessing that JSON from the PDS directly is unlikely to be useful (the JSON in question is GeoJSON coordinates).

How do I take the GeoJSON from my API struct foo and convert it to a blob to be stored in lexiconFoo, so that I can then call uploadBlob and createRecord?

Thanks for looking!

bnewbold commented 3 months ago

Great question!

I think that roughly what you want to do is:

I think that should all work? You may be the first developer to put JSON in a blob. The PDS also does internal tracking of which records reference which blobs, and that might not yet work with new Lexicons. All of this definitely should work; it is designed and intended to work this way. If you run into any bugs, please do let us know so we can ensure this works.
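For illustration, here is a minimal sketch of one way the upload-then-reference flow could look using only the Go standard library against the XRPC endpoints `com.atproto.repo.uploadBlob` and `com.atproto.repo.createRecord`. It is not indigo's API. The record type `LexiconFoo`, its `geojson` field, the NSID `com.example.foo`, and the helper names are all hypothetical, and the `application/geo+json` MIME type is an assumption about what the Lexicon would accept.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// CIDLink and BlobRef mirror the JSON shape of an atproto blob reference.
type CIDLink struct {
	Link string `json:"$link"`
}

type BlobRef struct {
	Type     string  `json:"$type"` // "blob"
	Ref      CIDLink `json:"ref"`
	MimeType string  `json:"mimeType"`
	Size     int64   `json:"size"`
}

// LexiconFoo is a hypothetical record type with an optional blob field.
type LexiconFoo struct {
	Type    string   `json:"$type"`
	Name    string   `json:"name"`
	GeoJSON *BlobRef `json:"geojson,omitempty"`
}

// uploadBlob POSTs the raw bytes to the PDS and returns the blob reference
// found in the response. token is assumed to be a valid access JWT.
func uploadBlob(c *http.Client, host, token, mime string, data []byte) (*BlobRef, error) {
	req, err := http.NewRequest(http.MethodPost,
		host+"/xrpc/com.atproto.repo.uploadBlob", bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", mime)
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return nil, fmt.Errorf("uploadBlob: %s: %s", resp.Status, body)
	}
	var out struct {
		Blob BlobRef `json:"blob"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return &out.Blob, nil
}

// createRecordBody builds the JSON body for com.atproto.repo.createRecord.
// The record carries only the blob reference, never the GeoJSON bytes.
func createRecordBody(repo, collection string, rec LexiconFoo) ([]byte, error) {
	return json.Marshal(struct {
		Repo       string     `json:"repo"`
		Collection string     `json:"collection"`
		Record     LexiconFoo `json:"record"`
	}{repo, collection, rec})
}
```

The reference returned by `uploadBlob` would be placed in the record's `GeoJSON` field, and the output of `createRecordBody` would then be POSTed to `/xrpc/com.atproto.repo.createRecord` with the same bearer token.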

Also, yay! Sharing location data of any type is something I have hoped folks would start doing with atproto, cool to see it happening!

jghiloni commented 3 months ago

OK, so ... this is very useful. It implies that I should have a third, interstitial struct that holds the bytes themselves, and a method to marshal "from" the API and "to" the atproto model. When pulling the record from the PDS, I'd need to make a second call to pull the blob. Does that sound right?

jghiloni commented 3 months ago

That previous statement is because there will be some period of compute between when I pull from the API and when I push to the PDS, if that wasn't clear.
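A rough sketch of what such an interstitial struct might look like, under assumptions: the API field names (`name`, `geometry`), the type names `APIFoo`, `StagedFoo`, and `LexiconFoo`, the NSID `com.example.foo`, and the 1 MiB cap are all hypothetical and would need to match the real API and Lexicon.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxGeoJSONBytes is a hypothetical cap matching the 1 MiB mentioned above.
const maxGeoJSONBytes = 1 << 20

// APIFoo is a hypothetical shape for the upstream API response.
type APIFoo struct {
	Name     string          `json:"name"`
	Geometry json.RawMessage `json:"geometry"`
}

// StagedFoo is the interstitial form: it keeps the GeoJSON as raw bytes
// between fetching from the API and pushing to the PDS.
type StagedFoo struct {
	Name    string
	GeoJSON []byte // nil when the optional field is absent
}

// BlobRef mirrors the JSON shape of an atproto blob reference.
type BlobRef struct {
	Type string `json:"$type"`
	Ref  struct {
		Link string `json:"$link"`
	} `json:"ref"`
	MimeType string `json:"mimeType"`
	Size     int64  `json:"size"`
}

// LexiconFoo is the hypothetical atproto record; it holds only a reference.
type LexiconFoo struct {
	Type    string   `json:"$type"`
	Name    string   `json:"name"`
	GeoJSON *BlobRef `json:"geojson,omitempty"`
}

// FromAPI decodes the API response and keeps the GeoJSON bytes untouched.
func FromAPI(raw []byte) (StagedFoo, error) {
	var in APIFoo
	if err := json.Unmarshal(raw, &in); err != nil {
		return StagedFoo{}, fmt.Errorf("decode API response: %w", err)
	}
	s := StagedFoo{Name: in.Name}
	// Treat a missing field and an explicit JSON null the same way.
	if len(in.Geometry) > 0 && string(in.Geometry) != "null" {
		if len(in.Geometry) > maxGeoJSONBytes {
			return StagedFoo{}, fmt.Errorf("GeoJSON is %d bytes, over the %d byte cap",
				len(in.Geometry), maxGeoJSONBytes)
		}
		s.GeoJSON = in.Geometry
	}
	return s, nil
}

// ToRecord builds the record once the blob (if any) has been uploaded.
// Pass nil when there was no GeoJSON to upload.
func (s StagedFoo) ToRecord(blob *BlobRef) LexiconFoo {
	return LexiconFoo{Type: "com.example.foo", Name: s.Name, GeoJSON: blob}
}
```

Keeping the bytes in `StagedFoo` means any intermediate processing can happen before the upload, and the record itself never needs to carry the payload.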

bnewbold commented 3 months ago

As an analogy, blobs are designed for images. You upload and download images entirely separately from records. A record can reference an image blob, but when you do an API fetch to "get" the record, you don't get the image included; that is a separate HTTP request.

In your case, you should treat the GeoJSON as a file, separate from the API structure. When you request a record from the API, that JSON won't be embedded or included in the record JSON; it will just be a reference, and the GeoJSON gets fetched in a separate HTTP request.

This means more HTTP requests, but at least to me this makes total sense because the GeoJSON is large.
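The two-request read path described above could be sketched roughly as follows, using the standard library against `com.atproto.repo.getRecord` and `com.atproto.sync.getBlob`. The `geojson` field name and the helper names are hypothetical, and the sketch assumes both endpoints can be read without authentication on the PDS in question.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// recordURL builds the getRecord URL for one record.
func recordURL(host, repo, collection, rkey string) string {
	q := url.Values{"repo": {repo}, "collection": {collection}, "rkey": {rkey}}
	return host + "/xrpc/com.atproto.repo.getRecord?" + q.Encode()
}

// blobURL builds the getBlob URL for a blob owned by the given DID.
func blobURL(host, did, cid string) string {
	q := url.Values{"did": {did}, "cid": {cid}}
	return host + "/xrpc/com.atproto.sync.getBlob?" + q.Encode()
}

// blobCID pulls the blob CID out of a getRecord response. It returns ""
// when the record has no (hypothetical) "geojson" blob field.
func blobCID(getRecordResp []byte) (string, error) {
	var out struct {
		Value struct {
			GeoJSON *struct {
				Ref struct {
					Link string `json:"$link"`
				} `json:"ref"`
			} `json:"geojson"`
		} `json:"value"`
	}
	if err := json.Unmarshal(getRecordResp, &out); err != nil {
		return "", err
	}
	if out.Value.GeoJSON == nil {
		return "", nil
	}
	return out.Value.GeoJSON.Ref.Link, nil
}

// get performs a GET and returns the body, failing on non-200 responses.
func get(c *http.Client, u string) ([]byte, error) {
	resp, err := c.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: %s: %s", u, resp.Status, body)
	}
	return body, nil
}

// fetchGeoJSON does the two requests: first the record, then the blob it
// references. It returns nil bytes when the record has no blob.
func fetchGeoJSON(c *http.Client, host, did, collection, rkey string) ([]byte, error) {
	rec, err := get(c, recordURL(host, did, collection, rkey))
	if err != nil {
		return nil, err
	}
	cid, err := blobCID(rec)
	if err != nil || cid == "" {
		return nil, err
	}
	return get(c, blobURL(host, did, cid))
}
```

A caller would use something like `fetchGeoJSON(http.DefaultClient, host, did, "com.example.foo", rkey)`, where the collection NSID is again a placeholder.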

jghiloni commented 3 months ago

Yep, same page. Thanks!