adobe / helix-data-embed

Turn data into embed-friendly JSON arrays
Apache License 2.0

Add option to save data to s3 #373

Open tripodsan opened 3 years ago

tripodsan commented 3 years ago

@trieloff's idea of storing data directly in the underlying storage would make data processing faster, especially for larger datasets.

I suggest the following:

  1. content-bus generates a presigned url for the respective destination object
  2. send request to content-proxy, including a presignedStorageUrl parameter
  3. content-proxy passes the presignedStorageUrl along to data-embed, ideally using a PUT [1]
  4. data-embed stores the .json directly in the storage using the presignedStorageUrl
  5. data-embed responds with a 307 [2], including a Location header pointing to the stored object (if possible)
  6. content-proxy returns the same response

[1] I'm not sure about using PUT or GET, but writing content on a GET feels wrong.
[2] I'm not sure about the redirect response. Maybe a 200 is better when using PUT.
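A minimal sketch of the data-embed side of steps 3 to 5, in TypeScript. Only `presignedStorageUrl` comes from the proposal; `handleEmbed`, the request/response shapes and the injected `put` function are hypothetical. It refuses to write on GET and, following footnote 2, answers with a 200 plus a Location header rather than a 307.

```typescript
// Sketch of data-embed's handling of steps 3-5. Everything except the
// presignedStorageUrl parameter name is hypothetical.

interface EmbedRequest {
  method: string;
  params: { presignedStorageUrl?: string };
}

interface EmbedResponse {
  status: number;
  headers: Record<string, string>;
  body?: string;
}

// Injected HTTP client so the sketch stays testable; a fetch-based
// implementation can satisfy this shape.
type StoragePut = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

async function handleEmbed(
  req: EmbedRequest,
  rows: unknown[],
  put: StoragePut,
): Promise<EmbedResponse> {
  const json = JSON.stringify(rows);
  const target = req.params.presignedStorageUrl;

  // No presigned URL: keep the existing behaviour and return the JSON inline.
  if (!target) {
    return { status: 200, headers: { 'content-type': 'application/json' }, body: json };
  }

  // Writing to storage is a side effect, so refuse to do it on GET.
  if (req.method !== 'PUT' && req.method !== 'POST') {
    return { status: 405, headers: { allow: 'PUT, POST' } };
  }

  // Step 4: upload the JSON straight to the storage via the presigned URL.
  const res = await put(target, {
    method: 'PUT',
    headers: { 'content-type': 'application/json' },
    body: json,
  });
  if (!res.ok) {
    return { status: 502, headers: {}, body: `storage write failed with ${res.status}` };
  }

  // Step 5: point the caller at the stored object. The query string carries
  // the presign signature, so it is stripped from the Location header.
  const location = new URL(target);
  location.search = '';
  return { status: 200, headers: { location: location.toString() } };
}
```

Returning the unsigned URL as Location assumes the caller can read the object with its own credentials. With S3, if a content type was included when the URL was signed, the upload's content-type header has to match it.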

trieloff commented 3 years ago
Re footnote 1: definitely not GET, but POST would be acceptable.

I'd use the universal storage abstraction to generate the presigned URLs.
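The content-bus side (steps 1, 2 and 6) could look roughly like this, using POST rather than GET. `PresigningStorage` is a hypothetical stand-in for the universal storage abstraction, `ProxyCall` is a hypothetical transport, and sending the URL in a JSON body (rather than as a query parameter) is an assumption.

```typescript
// Sketch of the content-bus side. PresigningStorage stands in for the
// universal storage abstraction; its shape is hypothetical.
interface PresigningStorage {
  presignPut(key: string, expiresInSeconds: number): Promise<string>;
}

interface ProxyResponse {
  status: number;
  location?: string;
}

// Hypothetical transport for calling content-proxy.
type ProxyCall = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<ProxyResponse>;

async function storeViaDataEmbed(
  storage: PresigningStorage,
  contentProxyUrl: string,
  key: string,
  call: ProxyCall,
): Promise<ProxyResponse> {
  // Step 1: presign the destination object; short expiry, it is used right away.
  const presignedStorageUrl = await storage.presignPut(key, 300);

  // Step 2: POST (never GET) to content-proxy, which forwards the URL to data-embed.
  const res = await call(contentProxyUrl, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ presignedStorageUrl }),
  });

  // Step 6: content-proxy relays data-embed's answer unchanged.
  if (res.status !== 200) {
    throw new Error(`content-proxy returned ${res.status} for ${key}`);
  }
  return res;
}
```

Keeping the storage behind an interface means the same content-bus code can presign for whichever storage backs the cloud it runs in.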

How do we handle multi-cloud? Run the trigger twice, in both clouds or have it generate multiple URLs?

tripodsan commented 3 years ago

> How do we handle multi-cloud? Run the trigger twice, in both clouds or have it generate multiple URLs?

For helix3, the content-bus needs to store the JSON in s3/cs when requested (e.g. on preview). Since the content-bus is the initiator, each cloud's content-bus invokes content-proxy/data-embed in the same cloud, which then stores the JSON in that cloud's storage.

So whoever triggers the content-bus needs to trigger it in all clouds, probably helix-admin or admin.hlx3.page.
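A rough sketch of that fan-out, as helix-admin might do it: trigger the content-bus once per cloud, in parallel, and collect a per-cloud result so a failure in one cloud is reported instead of silently leaving the clouds out of sync. `CloudTarget`, `TriggerFn` and `triggerInAllClouds` are hypothetical names, and the per-cloud URLs are placeholders.

```typescript
// Sketch of triggering the content-bus in every cloud. All names and the
// per-cloud endpoints are hypothetical.
interface CloudTarget {
  name: string;          // illustrative cloud identifier
  contentBusUrl: string; // hypothetical per-cloud content-bus endpoint
}

// Resolves with the HTTP status of the content-bus call.
type TriggerFn = (target: CloudTarget, path: string) => Promise<number>;

interface CloudResult {
  cloud: string;
  ok: boolean;
  detail: string;
}

async function triggerInAllClouds(
  targets: CloudTarget[],
  path: string,
  trigger: TriggerFn,
): Promise<CloudResult[]> {
  // Run all clouds in parallel; a rejection in one cloud is caught per target
  // so it cannot hide the results of the others.
  return Promise.all(
    targets.map((t) =>
      trigger(t, path).then(
        (status) => ({ cloud: t.name, ok: status >= 200 && status < 300, detail: `HTTP ${status}` }),
        (err: unknown) => ({ cloud: t.name, ok: false, detail: String(err) }),
      ),
    ),
  );
}
```

The caller can then retry or flag only the clouds whose result has `ok: false`.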