doomspork / artifact

File upload and on-the-fly processing for Elixir
Apache License 2.0

Use riak core and send_file #11

Open aphillipo opened 7 years ago

aphillipo commented 7 years ago

I'm still looking at this, and was trying to change this to use send_file rather than send_resp.

send_file should be a lot faster than send_resp and use less memory, since the file doesn't have to be read into a binary first.
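For reference, a minimal sketch of the difference between the two calls in Plug. The `SendDemo` module, the temp file and the `~> 1.14` version constraint are assumptions made for illustration and are not taken from Artifact; the script also assumes network access so `Mix.install` can fetch Plug.

```elixir
# Assumption: run as a standalone script; Mix.install fetches Plug on first run.
Mix.install([{:plug, "~> 1.14"}])

defmodule SendDemo do
  import Plug.Conn

  # Reads the whole file into a binary, then hands that binary to the adapter.
  def via_send_resp(conn, path) do
    body = File.read!(path)

    conn
    |> put_resp_content_type("image/jpeg")
    |> send_resp(200, body)
  end

  # Hands the adapter a path; the file body is never read by this function.
  def via_send_file(conn, path) do
    conn
    |> put_resp_content_type("image/jpeg")
    |> send_file(200, path)
  end
end

# Hypothetical stand-in for a generated image.
path = Path.join(System.tmp_dir!(), "send_demo.jpg")
File.write!(path, "not really a jpeg")

conn_a = SendDemo.via_send_resp(Plug.Test.conn(:get, "/"), path)
conn_b = SendDemo.via_send_file(Plug.Test.conn(:get, "/"), path)

IO.inspect({conn_a.state, conn_b.state})
```

Whether the file path version actually avoids copying depends on the web server adapter in use, so treat the speed and memory benefit as something to measure rather than a given.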

It doesn't seem Artifact was written with this in mind, though; it passes everything around as strings.

It'd be really great if generated images could be distributed using riak_core, with cross-data-centre distribution possible in the design.

doomspork commented 7 years ago

Howdy @aphillipo! Thank you for creating this issue. I'd love to help you with this to continue improving Artifact.

My personal life and work are settling down, so I'll be resuming my work on this project and some others.

aphillipo commented 7 years ago

Great! Should we try abstracting away Paths/sending files with a behaviour or does it become too abstract at that point?

I'm thinking about this to support the following:

1) /some/file/path.jpg (send() -> use send_file())

2) http://some/source/file/over-http.jpg (send() -> http_client_read() streams into send_resp()?)

3) some key in haystack/seaweedfs/hdfs [https://github.com/chrislusf/seaweedfs] (send() -> streams data into send_resp(), or maybe directly from seaweedfs - might have the equivalent of sendfile...)

4) Our own riak_core ring?

This is more complex so maybe it works as follows:

a) Store files on the file system of various servers.

b) Look up by riak_id in the ring where the file is (we want to avoid rebalancing the actual file data, so maybe don't store it in riak)

c) riak-id -> (riak_image_server, path) -> redirect to http://img_server/path.jpg or stream via send_file from that specific server?

d) This is much more complex and probably reinventing the wheel...


5) CephFS/Rados or S3 etc. etc.
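A rough sketch of how the cases above could map onto delivery strategies. Everything here is hypothetical: `DeliverySketch`, the `{:ring, id}` tuple and the plain map standing in for a riak_core ring lookup are illustration only, and cases 3 and 5 are left out.

```elixir
defmodule DeliverySketch do
  @type strategy ::
          {:send_file, Path.t()}
          | {:stream, String.t()}
          | {:redirect, String.t()}
          | :not_found

  # `index` is a plain map standing in for the ring lookup: id => {server, path}.
  @spec strategy(String.t() | {:ring, String.t()}, map()) :: strategy()
  def strategy(source, index \\ %{})

  # Case 2: remote source, stream it through to the response.
  def strategy("http://" <> _ = url, _index), do: {:stream, url}
  def strategy("https://" <> _ = url, _index), do: {:stream, url}

  # Case 1: absolute local path, can be handed straight to send_file.
  def strategy("/" <> _ = path, _index), do: {:send_file, path}

  # Case 4: resolve the id to (server, path) and redirect to that server.
  def strategy({:ring, id}, index) do
    case Map.fetch(index, id) do
      {:ok, {server, path}} -> {:redirect, "http://" <> server <> path}
      :error -> :not_found
    end
  end
end

index = %{"img-42" => {"img_server", "/path.jpg"}}

IO.inspect(DeliverySketch.strategy("/some/file/path.jpg"))
IO.inspect(DeliverySketch.strategy("http://some/source/file/over-http.jpg"))
IO.inspect(DeliverySketch.strategy({:ring, "img-42"}, index))
```

Returning a tagged tuple keeps the decision separate from the Plug call that acts on it, so the same lookup could drive either a redirect or a send_file on the server that holds the file.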

I think that's it for the output layer. I'm fine with the current "Storage" behaviour but I'd probably prefer these two behaviours were called InputStorage and OutputStorage.

I'd like to think of OutputStorage as a kind of cached output, and we should be able to set the rules of that caching separately from the mechanism of what it is and how to send it to people's browsers.
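One possible shape for that split, as a sketch only. The callback names, `CachePolicy` and `LocalOutput` are assumptions and do not reflect Artifact's current "Storage" behaviour; the point is that the freshness rule lives apart from the module that stores and delivers bytes.

```elixir
defmodule InputStorage do
  # Where originals come from (local disk, HTTP, S3, ...).
  @callback fetch(key :: String.t()) :: {:ok, binary()} | {:error, term()}
end

defmodule OutputStorage do
  # Where generated images are kept and how they are handed back for sending.
  @callback put(key :: String.t(), data :: binary()) :: :ok | {:error, term()}
  @callback deliver(key :: String.t()) ::
              {:send_file, Path.t()} | {:stream, Enumerable.t()} | {:error, term()}
end

defmodule CachePolicy do
  # Caching rule only: an entry is fresh while its age is within max_age_seconds.
  def fresh?(stored_at, now, max_age_seconds), do: now - stored_at <= max_age_seconds
end

defmodule LocalOutput do
  @behaviour OutputStorage

  # Hypothetical location for generated files.
  defp root, do: Path.join(System.tmp_dir!(), "artifact_output_sketch")

  @impl true
  def put(key, data) do
    File.mkdir_p!(root())
    File.write(Path.join(root(), key), data)
  end

  @impl true
  def deliver(key) do
    path = Path.join(root(), key)
    if File.regular?(path), do: {:send_file, path}, else: {:error, :enoent}
  end
end

:ok = LocalOutput.put("thumb.jpg", "bytes")
IO.inspect(LocalOutput.deliver("thumb.jpg"))
IO.inspect(CachePolicy.fresh?(100, 150, 60))
```

A backend that cannot expose a local path would return `{:stream, enumerable}` from `deliver/1` instead, and the caller would fall back to a chunked response.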

What do you think? And sorry about the total brain dump!

doomspork commented 7 years ago

This is awesome @aphillipo! I like the idea of abstracting the input source; that was something I wanted to incorporate so I could point to S3 files for some applications.

How do you want to go about tackling this? Are you on Elixir Slack?