google-code-export / wro4j

Automatically exported from code.google.com/p/wro4j
1 stars 1 forks source link

ResourceChangeDetector uses contents hash to check for file changes; using last modified date would be more efficient #820

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
ResourceChangeDetector uses the hash of a resource to determine if the resource 
has changed (a changed hash means the file has changed). The implication is 
that every resource must be read completely, which is inefficient. A more 
efficient approach would be to just see if the file modification date has 
changed.

Original issue reported on code.google.com by candrews...@gmail.com on 2 Dec 2013 at 7:09

GoogleCodeExporter commented 9 years ago
I have 3.5MB of CSS and JS that WRO processes in my project (yes, I have a 
bloat problem) which takes the server a non-negligible amount of time to hash 
every time resourceWatcherUpdatePeriod expires.

I looked at ResourceChangeDetector.checkChangeForGroup, thinking I could just 
change it... but locatorFactory.locate(uri) returns an InputStream, and not a 
File, so there's no way to get the modified date.

Original comment by candrews...@gmail.com on 2 Dec 2013 at 7:12

GoogleCodeExporter commented 9 years ago
Don't forget that the resource is not necessarily a file, it can be anything: a 
dynamically generated stream, an external link or a file located in the 
classpath (which is also a file).

Another aspect worth mentioning is that a changed file, doesn't necessarily 
mean that its content is different (example: open a file, add a character and 
remove it, then save the file). As result, you potentially force an expensive 
computation of an unchanged resource. 

The assumption is that it is much more expensive to process a resource, then to 
compute its hash. As result, the current approach might be considered an 
acceptable trade-off. 

Have you measured the time taken by the hash computation of 3.5MB of resources 
on your computer? You could try replace the default hashStrategy with something 
faster. Probably we could come up with some improvement, but we need to measure 
first. 

Original comment by alex.obj...@gmail.com on 3 Dec 2013 at 8:08

GoogleCodeExporter commented 9 years ago
That makes sense... with that in mind, could the hashing be done in another 
thread?

Right now, the first request that comes in after resourceWatcherUpdatePeriod 
expires does the hashing check. In my case, it takes my server ~700ms to do 
that. If there was another thread that did the hashing on a timer, that would 
stop wro4j from blocking the web requests, and solve the problem.

Do you think that could work?

Original comment by candrews...@gmail.com on 3 Dec 2013 at 7:16

GoogleCodeExporter commented 9 years ago
As far as I remember, the check for change is performed asynchronously
using a scheduler, so it should not block the request processing. I will do
some tests to be sure.

Original comment by alex.obj...@gmail.com on 3 Dec 2013 at 9:13

GoogleCodeExporter commented 9 years ago
Actually you are right, the hash checking is not executed in a separate
thread and is blocking the main processing. I have created an
issue<https://code.google.com/p/wro4j/issues/detail?id=822> which
will be addressed for the next release.

Original comment by alex.obj...@gmail.com on 3 Dec 2013 at 10:33

GoogleCodeExporter commented 9 years ago
Thank you for creating that issue!

This is probably more work and perhaps over-optimization once 822 is fixed... 
but could you have a resource-type-specific checker that uses the file 
modification date (if the resource is a file) and uses the hash otherwise?

Original comment by candrews...@gmail.com on 3 Dec 2013 at 10:57

GoogleCodeExporter commented 9 years ago
It could be implemented, but that wouldn't be accurate, since as I
mentioned previously - a changed file doesn't necessarily mean changed
content. Also, once the checksum computation is performed asynchronously,
what would be your concern regarding current approach?

If you still consider that file modification date is important, I think it
would be possible to make ResourceWatcher extensible, so that you could
provide a custom implementation if needed.

Original comment by alex.obj...@gmail.com on 3 Dec 2013 at 11:08

GoogleCodeExporter commented 9 years ago
If #822 is resolved, then my concerns are mostly eliminated. If the file 
modification date is checked, and then the hash is checked, that would be more 
efficient than always hashing (as most of the time, the date check would 
indicating the file didn't change, so the file wouldn't have to be read and 
hashed), but that's a minor, nice to have optimization and not nearly as 
important as #822.

Original comment by candrews...@gmail.com on 12 Dec 2013 at 4:10

GoogleCodeExporter commented 9 years ago
Assuming the issue822 will be eventually fixed, do you agree to close this 
issue with WONTFIX resolution?

Original comment by alex.obj...@gmail.com on 14 Dec 2013 at 10:06

GoogleCodeExporter commented 9 years ago
I'm fine with that, and have just resolved this was WONTFIX. Thanks!

Original comment by candrews...@gmail.com on 16 Dec 2013 at 2:00