cinely / mule-uploader

Stubborn HTML5 Amazon S3 uploader
http://mule-uploader.com/
MIT License

IPFS alternative to AWS? #66

Closed by sesam 8 years ago

sesam commented 8 years ago

Have you considered IPFS as a backend, in addition to AWS?

I'm looking at IPFS or similar backends to enable tracking files by hash and avoiding some uploading and downloading.

My user uploads an image (sometimes over slow and spotty internet) and I create different crops and sizes. The bigger sizes and the most downloaded versions should go on a paid CDN, while the other images are probably more economical to just keep on my server. I only need to back up the original.

Competing ideas: I have tried BitTorrent Sync (painful, kind of works) and rsync (fails, often forcing a server reboot. Yes, we should use "nice 20" or such.)

Server-side it would be clean to just save an IPFS reference (hash+maybe mimetype or such).

If the image is a duplicate of an existing image, I'd detect this and use the reference of the older and bigger image. That avoids redoing the cropping and resizing, and also handles the case where a user reuploads an image that was cropped and resized on our website.
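A minimal sketch of that bookkeeping, in Python and purely as an illustration: `ImageRegistry`, `ImageRecord` and their methods are hypothetical names, SHA-256 is an assumed choice of hash, and a real server would keep the index in a database rather than in memory. The idea is that every generated crop is indexed under the original's record, so a reuploaded crop resolves to the older, bigger image.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ImageRecord:
    content_hash: str  # hex SHA-256 of the original upload
    mimetype: str


class ImageRegistry:
    """Hypothetical in-memory index from content hash to the original image."""

    def __init__(self):
        # Any known hash (original or derivative) -> record of the original.
        self._by_hash = {}

    @staticmethod
    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def register_original(self, data: bytes, mimetype: str):
        """Return (record, is_new). Known content reuses the existing record."""
        h = self.digest(data)
        existing = self._by_hash.get(h)
        if existing is not None:
            return existing, False
        record = ImageRecord(h, mimetype)
        self._by_hash[h] = record
        return record, True

    def register_derivative(self, original: ImageRecord, derived: bytes) -> None:
        """Index a crop/resize so a reupload of it maps back to the original."""
        self._by_hash.setdefault(self.digest(derived), original)
```

When `register_original` returns `is_new == False`, the server can skip cropping and resizing and reuse the versions already generated for the returned record.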

gabipurcaru commented 8 years ago

I'm not too familiar with IPFS, but don't you also have to host the files you publish on IPFS? With Mule Uploader, the file hosting itself is handled by S3 (which is the whole point), not by the library. If you just need content addressing, why not do what IPFS does and take a hash of the file and upload it to s3://your-bucket/the_file_hash? This way you can handle the already-uploaded files use case.

It wouldn't be hard to patch Mule Uploader to handle this. You would need the following:

  1. First of all, restrict the uploader to small sizes since you just need images (<10mb?)
  2. Then, on file select, compute a hash of the file client-side (you can do that quite easily)
  3. Check whether the file is already uploaded. If so, then you do nothing. Otherwise, start uploading normally, to that particular S3 key
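The three steps above can be sketched roughly as follows. This is not Mule Uploader code: it is written in Python for brevity (in the browser the hashing would happen in JavaScript), SHA-256 is an assumed hash, and `key_exists` and `upload` are hypothetical callables standing in for an S3 existence check and the normal upload path.

```python
import hashlib

# Step 1: the "<10mb?" limit suggested above (an assumed value).
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def content_key(data: bytes) -> str:
    """Step 2: derive the storage key from the file contents."""
    return hashlib.sha256(data).hexdigest()


def upload_if_missing(data: bytes, key_exists, upload):
    """Step 3: upload only when no object is stored under the hash key.

    key_exists(key) -> bool and upload(key, data) are supplied by the
    caller; they are placeholders for the real storage calls.
    Returns (key, uploaded).
    """
    if len(data) >= MAX_IMAGE_BYTES:
        raise ValueError("file too large for an image-only uploader")
    key = content_key(data)
    if key_exists(key):
        return key, False  # already uploaded, nothing to do
    upload(key, data)
    return key, True
```

For a quick local try, a plain dict can play the role of the bucket: `store = {}` followed by `upload_if_missing(data, store.__contains__, store.__setitem__)`.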

Hope this helps you.

sesam commented 8 years ago

Thanks. Just storing images on S3 based on hash rather than filename is probably good enough to cover 80-95% of the actual need, so thank you for the implementation hint -- I'll close this and keep it saved!

Just to move the discussion a step ahead (while I'm learning more about IPFS), I'll add the rest of my thoughts on this before closing, though it's probably over-engineering it by far. Using IPFS would contribute to its network effect, since it works better the more it gets used, so there's that. Internet archives are making their data accessible over IPFS, so images from popular sites are in or about to come into IPFS.

About the need to host: yes, likely, until IPFS gets more widely used. If it's a photo just snapped on a phone, then it's likely not yet on any faster IPFS host and the whole upload must come from the phone. If it's a meme, or a duplicate, it just may come from some faster IPFS host, making the uploading experience on the mobile client much faster, saving battery and data usage fees.

It works like this: The receiving server gets the link (hash) from the uploader. The server then fetches the linked data from wherever it is available.

As I've learnt so far, updateable links go on IPNS (stored in a DHT). The actual data chunks go on IPFS. Any link (filename, directory structure, etc.) can reuse any data chunk. It seems that images can be intelligently chunked, keeping headers (like EXIF) in one chunk, then the image data in a couple of normal-sized (256 kB) chunks. If the EXIF is changed and the image reuploaded, only the first chunk needs to be uploaded; the rest are just referenced by the same hashes as before, as long as the actual image binary data is unchanged.
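To make the chunk-reuse idea concrete, here is a simplified sketch in Python. It is not how IPFS itself stores data (IPFS links chunks into a Merkle DAG and identifies them with multihash-based content IDs); it only assumes that the header has already been separated from the image data, that the body is cut into fixed 256 KiB pieces, and that SHA-256 is the hash. `chunk_hashes` and `chunks_to_upload` are hypothetical helper names.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed chunk size, matching the 256 kB figure above


def chunk_hashes(header: bytes, body: bytes, chunk_size: int = CHUNK_SIZE):
    """Hash the header as its own chunk, then each fixed-size body chunk."""
    chunks = [header]
    chunks += [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def chunks_to_upload(new_hashes, known_hashes):
    """Return the hashes the receiver does not already hold, in order."""
    known = set(known_hashes)
    return [h for h in new_hashes if h not in known]
```

With `old = chunk_hashes(exif_v1, pixels)` and `new = chunk_hashes(exif_v2, pixels)`, `chunks_to_upload(new, old)` contains only the header chunk's hash, because every body chunk hashes to the same value as before.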

As it seems, there may be quite some overlap between mule and ipfs.js, and using IPFS sure does complicate things. Integration might mean mule adopting or reimplementing some IPFS features (like similar chunking and hashing), which is surely more work than it's worth for a typical mule user. So maybe it's best to keep this discussion academic, for now.