elixir-waffle / waffle_ecto

Waffle.Ecto provides an integration with Waffle and Ecto
https://hexdocs.pm/waffle_ecto
112 stars 30 forks source link

Use md5 hash in file name for cache busting instead of the timestamp query parameter #12

Open nickjj opened 4 years ago

nickjj commented 4 years ago

Hi,

Have you given any thoughts on using an md5 hash to uniquely identify versions? This is a best practice when it comes to cache busting vs using query parameters.

I know this introduces a fair bit more complexity but it seems like a good long term solution. Phoenix Digest has all of the code related to pulling this off as well.

achempion commented 4 years ago

Hi, thank you for the question.

Do you have an example when it would be useful and what issues do we have now with the current approach?

I think keeping the original file name is more user friendly approach than just generation some random string.

To address your question, you can define a version name function which allows you to pick the filename by yourself.

nickjj commented 4 years ago

Certain browsers don't take the query string into account. Also certain proxies will ignore them when it comes to caching. In both cases that means the cache hit never happens.

Using query strings for versioning files has been openly not recommended for over 10 years. If you Google around for the topic you'll find dozens or more in depth answers on why not.

That's why Phoenix, Rails, Django and other frameworks all use md5 hashes for their cache busting strategy.

A think a reasonable work flow would be to hash the file on upload and then add the md5 hash as part of an attribute that gets saved to the database for that file so that it can be retrieved at run time. I didn't think the implementation through 100% but I don't know if there's any other way? The md5 hash would need to be stored in the database, otherwise there would be no way to know it later.

It's to my understanding we're doing a DB lookup at run time as is to look at the updated time stamp, so it shouldn't cause any change in run time performance.

Can you give a code example on how defining a version name function will work in practice?

For example, here's my current implementation for handling a user photo.

defmodule Hello.Files.File do
  defmacro __using__(types) do
    quote do
      use Waffle.Definition
      use Waffle.Ecto.Definition

      def __storage, do: Waffle.Storage.Local

      def default_url(_version, _scope), do: "/images/defaults/anonymous.png"

      def expanded_dir(path) do
        Application.fetch_env!(:waffle, :storage_dir) <> path
      end

      def validate({file, _}) do
        Enum.member?(unquote(types), file_type(file))
      end

      def mime_type(file) do
        case file_type(file) do
          :jpg -> "image/jpg"
          :jpeg -> "image/jpg"
          :png -> "image/png"
        end
      end

      defp file_type(file) do
        file.file_name
        |> Path.extname()
        |> String.replace(".", "")
        |> String.downcase()
        |> String.to_atom()
      end

      defp hashed(id), do: Hello.Hashid.encode(id)

      defp source(scope), do: scope.__meta__.source
    end
  end
end
defmodule Hello.Files.Profile do
  use Hello.Files.File, [:jpg, :jpeg, :png]

  @versions [:large, :medium, :small, :tiny]

  def storage_dir(_version, {_file, scope}),
    do: expanded_dir("/photos/#{source(scope)}/#{hashed(scope.id)}")

  def filename(version, _), do: "photo-#{version}"

  def transform(version, _scope) do
    {:convert, "-strip -resize #{dimensions(version)} -format png", :png}
  end

  defp dimensions(:large), do: "600x600"
  defp dimensions(:medium), do: "300x300"
  defp dimensions(:small), do: "150x150"
  defp dimensions(:tiny), do: "40x40"
end

This creates a photos/xxx/photo-tiny.png, photos/xxx/photo-small.png, and so on files. How could this be adjusted to include the md5 hash, and also adjust the Ecto type to support that along with writing that information to the DB on file upload and reading it out when I access the file?

nickjj commented 4 years ago

Is this something still up for discussion?

If it won't be added to the library itself, what would the steps look like to implement something like the above for a specific project? I've kind of stepped away from the project I was working on for ~6 months, but I'm thinking about resuming it and this feature is kind of critical to the app. Not really sure how to move forward.

achempion commented 4 years ago

I think the most simple way to implement this is to define custom filename for uploaded files.

You can generate random string or take md5 hash of file's content.

Then you can completely omit the version attribute in url because the filename will be different on each upload.

Define filename function inside your uploader

def filename(version, {file, scope}) do
  md5_name = get_md5(file.path)

  "#{md5_name}_#{version}"
end

Probably, we should modify waffle_ecto to allow removing version from generated urls.