Make ResourceWatcher run efficiently

GoogleCodeExporter commented 9 years ago

The ResourceWatcher works asynchronously. This is the reason why it is 
impossible to guarantee that every request will fetch the latest processed 
resources unless the cache is disabled, which is inconvenient during 
development.

Scanning for resource changes each second is too often, given that 
ResourceWatcher is somehow expensive (depends on number of resources from 
model) because of the following reasons:
 - the resources are checked synchronously. If on average fetching 1 resource takes 10ms and you have 100 resources it might take up to 1 second.
 - fetching resources can be also expensive, depending on type of resources. For instance if most of your resources are external (ex: http://jquery.com/script.js), fetching this kind of resource might be slowed down due to network latency and connection speed.

This is the reason why making ResourceWatcher synchronous before the actual 
processing is done might be not a very good idea.

A better solution might be a trade-off between scheduling ResourceWatcher 
periodically and trigger it on demand when the page is requested. More 
specifically, 
using a reasonable default (ex: 3 seconds), the ResourceWatcher will not 
perform the check more often than 3 seconds, but in the same time will be 
triggered only by a page request. Thus, the following assumptions would be 
valid:
- The ResourceWatcher is not triggered unless a request is performed
- When no request is performed for a long period of time, the ResourceWatcher 
won't be invoked
- The first request taken after a long period of time will trigger the 
ResourceWatcher, but will fetch the stale resources from cache (since 
ResourceWatcher is asynchronous). However the next request will bring the 
latest changes (given that the ResourceWatcher managed to complete its job 
before that request).
- When the page is requested in a "hammer mode", the ResourceWatcher will be 
invoked only once after each 3 seconds.

This is a somehow detailed description of the trade-off solution, which might 
be a good starting point for improving the way ResourceWatcher is implemented. 
I'll create a new issue (or a thread on the mailing list) where I would like to 
hear your opinion.

Original issue reported on code.google.com by alex.obj...@gmail.com on 11 Aug 2012 at 8:01

GoogleCodeExporter commented 9 years ago

Thanks Alex for explaining it in more details.

Please consider my case:

- I don't depend on external resources (and I think it is not quite acceptable 
to depend on external resources in wro4j runtime build mode).
- I do have about 15 files which are checked (maybe not optimally because 
ResourceWatcher ends up checking 25 files due to issue #521).

This is not heavy weight (css/js) project I think but still wro4j gives me some 
request time improvement (~25-50%) and the main thing it gives better 
manageability of resources in development:
- grouping js/css files.
- referencing only one (or few) file (group) instead of tens.
- 'console' removal and other code optimizations

So currently the whole ResourceWatcher run with 15 files on MacBook Air 2011 
(i5/ssd/4gb) it takes only about 50 milliseconds. So even if we have 10x more 
files I think the whole run will still fit into < 0.5-1 sec interval (and 
considering future improvements e.g. issue #511 it could be even less).

So I think for such not that unusual case it would probably makes sense:
instead of 
- being forced to run watcher every second with the only reason just to always 
(for sure) get processed/not cached resource on request (developer refreshes 
page to see/check his changes) which continuously takes some cpu time (not much 
as I observed, maybe 1%) even when developer is not refreshing page or even not 
doing any js/css development
+ make an option to run watcher synchronously on the first request to the first 
resource, making other resource requests to wait for this watcher thread to 
complete it's job (developer refreshed a page and is waiting for all the 
resources to be fetched) and most other times the watcher could be idle - 
provided that we keep it running only to have model (wro.xml) parsed in memory 
for optimal access.

Maybe in this case it could even also be improved to check state of only 
currently fetched resources (taking already parsed model) instead of walking 
through whole model (all files).

It could be the most confusing for developer to see stale resources even if you 
refreshed the page, so that developer will need to hit refresh second time to 
really fetch latest changes.

Minimizing the watcher period to =1 second seems do the job considering that 
developer switches between editor and browser and hits refresh > 1 sec. So this 
improvement is not that critical having such workaround. It is just unusual to 
know that something is always scanning my css/js files bringing some noise to 
the idleness of developer thinking time :)

Original comment by lystoc...@gmail.com on 12 Aug 2012 at 7:58

GoogleCodeExporter commented 9 years ago

- using external resources in runtime can be an architectural decision and 
could make sense in some situations (ex: when you are depending on 3rd party 
which evolves independently or when resources are not bundled with the 
application itself). Therefore it is considered an acceptable use-case.
- Having 15 resources is a common use-case, but I cannot generalize this 
pattern. There  are different type of users. Some have very small resources, 
other have hundreds or even thousands. This can change in time when application 
evolves and become more complex. It is hard to find the approach which would be 
the best for all situations. That is why I would like to hear community 
thoughts, in order to find the most acceptable approach. I'm also considering 
making this feature configurable...

Checking the state of only currently fetched resources instead of entire model 
is a good idea. 

Checking if resources are changed for each request synchronously ensure that 
the fetched resource is not stale, but introduce an overhead which is 
acceptable in your case, but might become unacceptable in other situations. 
This is the reason why I'm slightly in favor of keeping the ResourceWatcher 
asynchronous, the downside being that sometimes the fetched result is stale.

Original comment by alex.obj...@gmail.com on 12 Aug 2012 at 9:15

GoogleCodeExporter commented 9 years ago

Thanks for considering the suggestions. Making it configurable option would 
probably satisfy all types of users.

> the downside being that sometimes the fetched result is stale

Sometimes? Isn't it always stale for the first request?

Original comment by lystoc...@gmail.com on 12 Aug 2012 at 9:43

GoogleCodeExporter commented 9 years ago

You're right. The first request after the resource was changed will bring a 
stale result, but any subsequent requests will fetch the fresh result.

Original comment by alex.obj...@gmail.com on 13 Aug 2012 at 7:22

GoogleCodeExporter commented 9 years ago

Fixed in branch 1.4.x. 

Now the resourceWatcher will run lazily on request. Each request after the 
period set by resourceWatcherUpdatePeriod config value, will guarantee to get 
the latest version of resources.

Original comment by alex.obj...@gmail.com on 20 Aug 2012 at 1:04

Changed state: Fixed
Added labels: Milestone-Release-1.4.9

GoogleCodeExporter commented 9 years ago

Sorry but I don't quite understand what has changed? Does you mean that 
resourceWatcher will not run periodically anymore?

Original comment by lystoc...@gmail.com on 20 Aug 2012 at 7:05

GoogleCodeExporter commented 9 years ago

A thread will run periodically which will reset a flag used by ResourceWatcher 
to decide if the check for changes is required or not. The actual check will be 
performed by the first request invoked after X seconds (configured by 
resourceWatcherUpdatePeriod) elapsed.

You can checkout the 1.4.x branch and test it. Appreciate any early feedback.

Original comment by alex.obj...@gmail.com on 20 Aug 2012 at 7:27

changcheng / wro4j

Make ResourceWatcher run efficiently #514