airyhq / airy

💬 Open Source App Framework to build streaming apps with real-time data - 💎 Build real-time data pipelines and make real-time data universally accessible - 🤖 Join historical and real-time data in the stream to create smarter ML and AI applications. - ⚡ Standardize complex data ingestion and stream data to apps with pre-built connectors
https://airy.co/docs/core
Apache License 2.0

Cloud Storage Uploads for Contact Avatar Images & Message Attachments #483

Open steffh opened 3 years ago

steffh commented 3 years ago

## User Story

As an Airy developer who wants to keep access to image and media files that are served from expiring URLs, I want to specify the cloud storage provider of my choice (e.g. AWS S3, Google Cloud Storage, Azure Storage, or a local folder on my server), so that my Airy instance asynchronously uploads the relevant files it receives to the configured provider and replaces the relevant URLs. That way I stay in control of these files and keep access to them.


## Acceptance Criteria

## Dev Notes

Sources that currently support contact / user profile pictures by way of ingestion of new messages: facebook, google

Suggested cloud storage providers to support initially for this feature:

AWS S3, Google Cloud Storage, Azure Storage, local folder on server, none (=keep expiring urls)

Please feel free to choose a sub selection of these cloud storage providers for the initial version of this feature, depending on the complexity of the integration.

## Definition of Done

chrismatix commented 3 years ago

@steffh While reviewing and updating our current solution I found this bit in the Facebook Instagram messenger documentation.

> In addition, you must comply all other technical requirements set forth in the technical documentation when using the Instagram Messaging API. In particular, Instagram Messaging API leverages CDN URLs which allow you to retrieve rich media content shared by users. The CDN URL returned via webhooks, and the Conversation API, is privacy-aware. This means, the CDN URL will not return the media when the content is deleted or expired. You must not download, retain, or otherwise store on your system the media content sent or made accessible by any user via the API (or enable any third party to do so) and you, or any third party, must not do anything to circumvent expiration and/or removal of any link to such media content, without our prior permission. Instead, if your app requires continued access to the media made available via the IG Messaging API, you must only store the privacy aware CDN URL in your system and use that to render the media made accessible via the API.

It seems that we have been doing this wrong so far, and I was wondering whether we should review this ticket for compliance reasons. What do you think?

AitorAlgorta commented 3 years ago

Just to publicly answer this: the CDN restriction applies only to IG, so we shouldn't store anything from them, but we can store media from other sources. So if our solution is source-specific and we don't store IG data, we are fine.
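A minimal sketch of what such a source-specific rule could look like, assuming a plain string source identifier. The names `NON_PERSISTABLE_SOURCES`, `should_persist_media`, `resolve_media_url` and the `"instagram"` identifier are hypothetical and not taken from the Airy codebase:

```python
from typing import Callable

# Hypothetical source identifiers; the real values used by Airy may differ.
NON_PERSISTABLE_SOURCES = frozenset({"instagram"})


def should_persist_media(source: str) -> bool:
    """Return True if media from this source may be copied to our own storage."""
    return source.strip().lower() not in NON_PERSISTABLE_SOURCES


def resolve_media_url(source: str, cdn_url: str, upload: Callable[[str], str]) -> str:
    """Keep the CDN URL for restricted sources; otherwise upload and return the new URL.

    `upload` is an assumed callback that stores the media and returns its new URL.
    """
    if not should_persist_media(source):
        return cdn_url
    return upload(cdn_url)
```

With this shape, the Instagram rule lives in one place, and the upload path is never invoked for restricted sources.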

chrismatix commented 2 years ago

Revisiting this, since it has been quite some time. @steffh we already have this functionality, but only for AWS. Do you consider this ticket done if we also implement "Google Cloud Storage, Azure Storage, local folder on my server"?

The actual data manipulation is source-specific (and already implemented) as we discussed way back then.

steffh commented 2 years ago

Ancient ticket here.

Yes, the idea was to offer other cloud storage providers next to AWS as well.

Whether it is more efficient to have one media-resolver component with configuration options for AWS S3, Google Cloud Storage, Azure Storage, local folder, etc., or separate connectors (such as an AWS S3 connector) that are fully independent of each other, is for you to decide.
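For illustration, the single-component option could be sketched roughly as below: one resolver that picks a storage backend from configuration. This is only a sketch under assumptions, not how Airy's media-resolver is implemented. All class names, the `provider`, `root` and `public_base_url` config keys, and the choice of hashing the original URL for the file name are hypothetical. Cloud backends (S3, GCS, Azure) are left out and would be registered in the same table.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Protocol


class StorageBackend(Protocol):
    def store(self, original_url: str, data: bytes) -> str:
        """Persist the data and return the URL that should replace original_url."""
        ...


class KeepOriginalUrl:
    """The "none" option: nothing is uploaded and the expiring URL is kept."""

    def store(self, original_url: str, data: bytes) -> str:
        return original_url


class LocalFolder:
    """Writes files to a folder on the server (hypothetical layout)."""

    def __init__(self, root: str, public_base_url: str) -> None:
        self.root = Path(root)
        self.public_base_url = public_base_url.rstrip("/")

    def store(self, original_url: str, data: bytes) -> str:
        # Derive a stable file name from the original URL.
        name = hashlib.sha256(original_url.encode("utf-8")).hexdigest()
        self.root.mkdir(parents=True, exist_ok=True)
        (self.root / name).write_bytes(data)
        return f"{self.public_base_url}/{name}"


# Registry of provider name -> factory; further providers would be added here.
BACKENDS: Dict[str, Callable[[dict], StorageBackend]] = {
    "none": lambda cfg: KeepOriginalUrl(),
    "local": lambda cfg: LocalFolder(cfg["root"], cfg["public_base_url"]),
}


def backend_from_config(cfg: dict) -> StorageBackend:
    """Select a backend by the assumed 'provider' key, defaulting to 'none'."""
    provider = cfg.get("provider", "none")
    try:
        factory = BACKENDS[provider]
    except KeyError:
        raise ValueError(f"unsupported storage provider: {provider!r}") from None
    return factory(cfg)
```

The trade-off against fully independent connectors is that the URL-replacement logic is shared here, while each provider only contributes a `store` implementation.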