Open nickjj opened 4 years ago
Hi, thank you for the question.
Do you have an example when it would be useful and what issues do we have now with the current approach?
I think keeping the original file name is more user friendly approach than just generation some random string.
To address your question, you can define a version name function which allows you to pick the filename by yourself.
Certain browsers don't take the query string into account. Also certain proxies will ignore them when it comes to caching. In both cases that means the cache hit never happens.
Using query strings for versioning files has been openly not recommended for over 10 years. If you Google around for the topic you'll find dozens or more in depth answers on why not.
That's why Phoenix, Rails, Django and other frameworks all use md5 hashes for their cache busting strategy.
A think a reasonable work flow would be to hash the file on upload and then add the md5 hash as part of an attribute that gets saved to the database for that file so that it can be retrieved at run time. I didn't think the implementation through 100% but I don't know if there's any other way? The md5 hash would need to be stored in the database, otherwise there would be no way to know it later.
It's to my understanding we're doing a DB lookup at run time as is to look at the updated time stamp, so it shouldn't cause any change in run time performance.
Can you give a code example on how defining a version name function will work in practice?
For example, here's my current implementation for handling a user photo.
defmodule Hello.Files.File do
defmacro __using__(types) do
quote do
use Waffle.Definition
use Waffle.Ecto.Definition
def __storage, do: Waffle.Storage.Local
def default_url(_version, _scope), do: "/images/defaults/anonymous.png"
def expanded_dir(path) do
Application.fetch_env!(:waffle, :storage_dir) <> path
end
def validate({file, _}) do
Enum.member?(unquote(types), file_type(file))
end
def mime_type(file) do
case file_type(file) do
:jpg -> "image/jpg"
:jpeg -> "image/jpg"
:png -> "image/png"
end
end
defp file_type(file) do
file.file_name
|> Path.extname()
|> String.replace(".", "")
|> String.downcase()
|> String.to_atom()
end
defp hashed(id), do: Hello.Hashid.encode(id)
defp source(scope), do: scope.__meta__.source
end
end
end
defmodule Hello.Files.Profile do
use Hello.Files.File, [:jpg, :jpeg, :png]
@versions [:large, :medium, :small, :tiny]
def storage_dir(_version, {_file, scope}),
do: expanded_dir("/photos/#{source(scope)}/#{hashed(scope.id)}")
def filename(version, _), do: "photo-#{version}"
def transform(version, _scope) do
{:convert, "-strip -resize #{dimensions(version)} -format png", :png}
end
defp dimensions(:large), do: "600x600"
defp dimensions(:medium), do: "300x300"
defp dimensions(:small), do: "150x150"
defp dimensions(:tiny), do: "40x40"
end
This creates a photos/xxx/photo-tiny.png
, photos/xxx/photo-small.png
, and so on files. How could this be adjusted to include the md5 hash, and also adjust the Ecto type to support that along with writing that information to the DB on file upload and reading it out when I access the file?
Is this something still up for discussion?
If it won't be added to the library itself, what would the steps look like to implement something like the above for a specific project? I've kind of stepped away from the project I was working on for ~6 months, but I'm thinking about resuming it and this feature is kind of critical to the app. Not really sure how to move forward.
I think the most simple way to implement this is to define custom filename for uploaded files.
You can generate random string or take md5 hash of file's content.
Then you can completely omit the version attribute in url because the filename will be different on each upload.
Define filename
function inside your uploader
def filename(version, {file, scope}) do
md5_name = get_md5(file.path)
"#{md5_name}_#{version}"
end
Probably, we should modify waffle_ecto
to allow removing version from
generated urls.
Hi,
Have you given any thoughts on using an md5 hash to uniquely identify versions? This is a best practice when it comes to cache busting vs using query parameters.
I know this introduces a fair bit more complexity but it seems like a good long term solution. Phoenix Digest has all of the code related to pulling this off as well.