gatsby-uc / gatsby-plugin-s3

Deploy your Gatsby site to an S3 bucket.
https://gatsby-plugin-s3.jari.io/
MIT License

Hashing each file means large sites sync very slowly #459

Open FraserThompson opened 1 year ago

FraserThompson commented 1 year ago

My site is very large and contains some big files (some over 100 MB). Because gatsby-plugin-s3 computes an MD5 hash of every file to determine whether it has changed, syncs can be very slow: hashing a file means reading it in full.
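For context, a hash-based check looks roughly like this (an untested sketch of the general technique, not the plugin's actual code). The whole file has to be streamed through MD5, so the cost grows with total site size:

```typescript
import { createHash } from "crypto";
import { createReadStream } from "fs";

// Stream a file through MD5 so the digest can be compared against the
// object's S3 ETag. Every byte must be read, which is why hashing a
// site full of 100 MB+ files is slow.
function md5OfFile(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("md5");
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))
      .on("error", reject)
      .on("end", () => resolve(hash.digest("hex")));
  });
}
```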

I found a similar issue in gatsby-source-filesystem, which resulted in the addition of a "fast" option that uses a slightly less robust but much faster method instead of hashing. So I'm checking whether a similar feature would be appreciated here.

I've done some experimenting, and I think we can compare the size and mtime between the local filesystem and the S3 object metadata (which is how this s3 sync library does it); see the sketch below. It's less robust than hashing, but probably fine for 99% of use cases.
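Something like this untested sketch is what I have in mind (the `mtime` metadata key is hypothetical; the plugin would have to write it at upload time, since S3's own `LastModified` is the upload time, not the source file's mtime):

```typescript
import { statSync } from "fs";
import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Decide whether a local file needs uploading by comparing its size and
// mtime against the S3 object's metadata, instead of hashing its contents.
async function needsUpload(
  localPath: string,
  bucket: string,
  key: string
): Promise<boolean> {
  const local = statSync(localPath);
  try {
    const head = await s3.send(
      new HeadObjectCommand({ Bucket: bucket, Key: key })
    );
    // Hypothetical custom metadata written at upload time as
    // x-amz-meta-mtime (the SDK strips the prefix).
    const remoteMtime = Number(head.Metadata?.mtime ?? 0);
    return (
      head.ContentLength !== local.size ||
      remoteMtime !== Math.floor(local.mtimeMs)
    );
  } catch {
    // No such object (or HEAD failed): treat as changed and upload.
    return true;
  }
}
```

Both fields come back in a single cheap HEAD request, versus reading and hashing the entire file locally.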

(As a bonus, this would also resolve the issue of large files always being re-uploaded because the ETag for multipart uploads is computed differently: https://github.com/gatsby-uc/gatsby-plugin-s3/issues/59)
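For anyone unfamiliar with the multipart quirk: S3 sets a multipart upload's ETag to the MD5 of the concatenated part digests plus a `-<partCount>` suffix, so it never matches a whole-file MD5 and a hash-based check re-uploads the file every time. A size/mtime check sidesteps this entirely. A minimal shape check (illustrative only, example value made up):

```typescript
// Multipart ETags look like "<32-hex-md5>-<partCount>",
// e.g. "d41d8cd98f00b204e9800998ecf8427e-12", so comparing one
// to a plain whole-file MD5 always fails.
function isMultipartEtag(etag: string): boolean {
  return /^"?[a-f0-9]{32}-\d+"?$/.test(etag);
}
```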

If this sounds like something the community would want, I can throw a pull request together.