ericmckean / minify

Automatically exported from code.google.com/p/minify
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Invalid cache is served if source file is replaced without updating mtime #115

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Minify version: 2.1.2
PHP version: 5.2.9

What steps will reproduce the problem?
1. Unzip the attached "a.zip" and place the directory "a" in your
DOCUMENT_ROOT.
2. switch ON caching in config.php ($min_cachePath).
3. Browse the http://localhost/a/minify.html
   The new a.js will be loaded.
4. Copy the a/old/a.js(which has older modification date) a/a.js.

5. Reload the page.
   It will return 304 Not Modified. It is not checking for file content.

Expected output:
It should check for the file content hash or something and match ETag.

Actual output:
The new a.js is served instead of the old a.js. Apache works properly with
ETag on. It recognizes that a.js has been changed, and sends new ETag.

Did any unit tests FAIL? (Please do not post the full list) No.

Original issue reported on code.google.com by ruj.sa...@gmail.com on 21 May 2009 at 3:05

Attachments:

GoogleCodeExporter commented 9 years ago
Actually the a/a.js has newer modification date.
And, a/old/a.js has older modification date.
ETag should do a hash matching on the content.
But currently, minify is serving from the cache, without comparing ETag.

NOTE: minify works correctly if I place a newer modified file, so 
"If-Modified-Since"
works properly.

Original comment by ruj.sa...@gmail.com on 21 May 2009 at 3:10

GoogleCodeExporter commented 9 years ago
I appreciate the report, but I probably won't fix this.

Currently Minify's cache invalidates if the mtime(source) > mtime(cache). This 
involves only a few quick filemtime() calls (fast), requires simple logic and 
requires no metadata be stored in the cache file, allowing us to serve it very 
fast 
with readfile().

Your very reasonable request is for the cache to invalidate when 
content(source) != 
content(cache), but there are a few reasons I'm not interested in this (today):

1. It seems like a rare use case (replacing source files with those with old 
mtimes)... source control/archive exports might cause this, but if you have to 
do 
this, you (or your build process) could easily empty the cache directory, or 
touch() 
the replaced files. Perhaps this warrants documentation on the wiki.

2. It would severely hit performance and require drastic changes. At the very 
minimum, on each request PHP would need to run md5_file() on each source file 
(yikes), and at least read several bytes of metadata (either in the cache file 
or a 
separate file) to verify the hashes match.

This cost could be offset somewhat by storing an expiration timestamp for the 
cache 
file, then invalidating it when that time is reached. This approach would force 
source editors to wait to see changes in the minified files unless they 
manually 
remove cache files, but would at least save Minify from having to read the 
contents 
of source files on every request.

In short: I'd prefer to keep the current fast, simple, imperfect caching system.

Original comment by mrclay....@gmail.com on 21 May 2009 at 6:12

GoogleCodeExporter commented 9 years ago
Can't we take different an approach for calculating an ETag?
For example: combination of filemtime() and filesize()? Or, something quick 
like 
that?
This will prevent from calculating expensive md5 of the whole file.

This will be very helpful to simplify the build process.

Original comment by ruj.sa...@gmail.com on 21 May 2009 at 6:50

GoogleCodeExporter commented 9 years ago
I agree, it will not be perfect calculation for ETag.
But it will work *most* of the time (there will be rare chance of failure when 
a file 
has different content but same size and same modification date).

Original comment by ruj.sa...@gmail.com on 21 May 2009 at 6:56

GoogleCodeExporter commented 9 years ago
Hmm. We really don't need an expiration on the cache file, but rather a 
throttle for 
the validation checks. In other words, for cache validation we could do this 
(pseudocode):

Func isCacheFileValid(abc) {
  // if returns false we'd rebuild abc and abc_hash
  If is_file(abc) && is_file(abc_hash):
    If filemtime(abc_hash) over 3 seconds old:
      Check md5_file(sources) vs contents(abc_hash)
      If match:
        Touch(abc_hash). Return true.
      Else:
        Return false.
    Else:
      Return true.
  Else:
    Return false.
}

Now on busy server w/ 100s req/s, the vast majority of cache validations will 
require only 1 filemtime() and 2 is_file()s, which may even be a performance 
increase. On a development server, at most the designer would have to wait 3 
seconds 
to see a change in what Minify serves.

This would still be a big undertaking, so I wouldn't hold your breath, but I'll 
at 
least keep it on the table.

Original comment by mrclay....@gmail.com on 21 May 2009 at 8:59