changcheng / wro4j

Automatically exported from code.google.com/p/wro4j
0 stars 0 forks source link

wro4j maven plugin should support incremental build #459

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The purpose of this issue is to add a feature which would allow improve the 
performance of m2e-wro4j maven plugin. 
https://github.com/jbosstools/m2e-wro4j/issues/1

The idea is to make wro4j maven plugin aware about resources being changed 
(provided by BuildContext API) and trigger the processing of the groups 
containing changed resources only. 

1. The new feature should be supported as part of run goal
2. Hashes should be stored at custom location (defaulted to target/wro4j/hash 
or something - maven-war-plugin does something similar)
3. get modified files from buildcontext. (paths are relative to 
${project.basedir})
4. check if found files match defined groups from wro model
5. when no files match defined groups - nothing should happen 
6. when a resource match defined group, the group containing that resource will 
be processed

Useful links:
# 
http://old.nabble.com/-jira--(MOJO-1785)-lesscss-maven-plugin%3A-provide-m2e-lif
ecycle-mapping-td33119534.html

Original issue reported on code.google.com by alex.obj...@gmail.com on 12 Jun 2012 at 11:44

GoogleCodeExporter commented 9 years ago
Supporting hashes pesistance at this point is not that important, this can be 
implemented as part of a different user story.

Original comment by alex.obj...@gmail.com on 12 Jun 2012 at 11:48

GoogleCodeExporter commented 9 years ago
I'm using wro4j on a very large site with multiple isolated groups and this 
feature would help immensly. As of now its approaching 20-25 seconds for any 
change to my js or css files. To make matters worse I'm using a js template 
engine so this delay happens every time I need to change to html as well.

Original comment by FateF...@gmail.com on 4 Aug 2012 at 12:49

GoogleCodeExporter commented 9 years ago
Why don't you use runtime version for development? It should improve your dev 
experience until this issue is available.

Original comment by alex.obj...@gmail.com on 4 Aug 2012 at 12:59

GoogleCodeExporter commented 9 years ago
Is runtime incremental? If I have set:
  debug=true
  disableCache=true
wro rebuilds in runtime js/css files on every access even if it was not 
modified and only doesn't rebuild if cache is disabled, but then server restart 
is needed to see changes.

Original comment by oleksand...@diio.net on 4 Aug 2012 at 7:47

GoogleCodeExporter commented 9 years ago
Runtime is not incremental, but there are workarounds which can smooth the 
development:

1) Use reasonable value for cacheUpdatePeriod property (ex: 30 seconds which 
will clear the cache every 30 seconds)
2) Trigger the cache on demand using JMX or invoking the 
/wro/wroAPI/reloadCache url in development mode when you know the resources has 
been changed.

Original comment by alex.obj...@gmail.com on 5 Aug 2012 at 10:19

GoogleCodeExporter commented 9 years ago
Sorry but I don't find cacheUpdatePeriod useful in the case when I changed one 
js file and want it to be retrieved refreshed from the group it is included in 
immediately after hitting refresh instead of waiting for 30 or 10 or 3 secs. 
But other (not changes caches) should stay.

Invoking the /wro/wroAPI/reloadCache to clear all cache is better, but again it 
is manual and clears everything not only files/groups that were changed.

Original comment by oleksand...@diio.net on 6 Aug 2012 at 6:21

GoogleCodeExporter commented 9 years ago
I'm working on a issue which will detect resource changes for runtime solution. 
Hopefully it will be included in next release.

Original comment by alex.obj...@gmail.com on 6 Aug 2012 at 7:04

GoogleCodeExporter commented 9 years ago
The new 1.4.8 release provides the incremental resource change detection as a 
runtime solution.

Original comment by alex.obj...@gmail.com on 9 Aug 2012 at 10:11

GoogleCodeExporter commented 9 years ago
Thanks Alex, I'm trying 1.4.8 and it is not clear for me if I need to change 
something in configuration as with default option 
(resourceWatcherUpdatePeriod=0) and with disableCache=true, debug=true each 
time I reload page (not modifying anything) I get output like this (e.g. from 
'common' group):

15:25:28,422 DEBUG SchedulerHelper:96 - period: 0 [SECONDS]
15:25:28,423 DEBUG SchedulerHelper:96 - period: 0 [SECONDS]
15:25:28,431 DEBUG AbstractSynchronizedCacheStrategyDecorator:36 - Searching 
cache key: java.lang.String@7ead510[common,JS,true]
15:25:28,431 DEBUG AbstractSynchronizedCacheStrategyDecorator:36 - Searching 
cache key: java.lang.String@7ead510[common,CSS,true]
15:25:28,435 DEBUG AbstractSynchronizedCacheStrategyDecorator:51 - Cache is 
empty. Loading new value...
15:25:28,436 DEBUG AbstractSynchronizedCacheStrategyDecorator:51 - Cache is 
empty. Loading new value...
15:25:28,436 DEBUG DefaultSynchronizedCacheStrategyDecorator:40 - load value in 
cache for key: java.lang.String@7ead510[common,JS,true]
15:25:28,436 DEBUG DefaultSynchronizedCacheStrategyDecorator:40 - load value in 
cache for key: java.lang.String@7ead510[common,CSS,true]
15:25:28,437 DEBUG GroupsProcessor:67 - Starting processing group [common] of 
type [JS] with minimized flag: true
15:25:28,437 DEBUG GroupsProcessor:67 - Starting processing group [common] of 
type [CSS] with minimized flag: true

Original comment by lystoc...@gmail.com on 10 Aug 2012 at 12:34

GoogleCodeExporter commented 9 years ago
I'll create a wiki page describing in more details this feature. 

What you have to do is to set resourceWatcherUpdatePeriod to something greater 
than 0 (example: resourceWatcherUpdatePeriod=5 to make it run each 5 seconds).

Also do not disable cache, otherwise you cannot benefit of this feature. Set 
the disableCache=false.

Original comment by alex.obj...@gmail.com on 10 Aug 2012 at 12:40

GoogleCodeExporter commented 9 years ago
I checked javadoc for resourceWatcherUpdatePeriod, it didn't write what =0 
means so I think it is first place where it should be documented.
But my understanding was that it should be active in when cache is disabled 
(disableCache=true), because with enabled cache it works perfectly anyway.
Or do you mean that now even with enabled cache but with 
resourceWatcherUpdatePeriod=1
the changes are detected and reloaded?

Original comment by lystoc...@gmail.com on 10 Aug 2012 at 1:11

GoogleCodeExporter commented 9 years ago
I at least don't see changes to css are applied when cache is enabled and 
resourceWatcherUpdatePeriod=1

Original comment by lystoc...@gmail.com on 10 Aug 2012 at 1:14

GoogleCodeExporter commented 9 years ago
... but they are applied only after restart.

Original comment by lystoc...@gmail.com on 10 Aug 2012 at 1:15

GoogleCodeExporter commented 9 years ago
The resourceWatcher scheduler is started after the first request.

The disableCache is totally independent of resourceWatcherUpdatePeriod. They 
mean different things:
 - disableCache - is a flag which enables or disables the cache. When this is true, each request will trigger entire processing flow (slow). Otherwise the cached value is served (fast)
 - resourceWatcherUpdatePeriod - period in seconds to run the scheduler which compares the fingerprints of each resource. When it detects the change, the group from cache containing that particular resource is invalidated. Thus, subsequent request for changed group only will require processing. 

Original comment by alex.obj...@gmail.com on 10 Aug 2012 at 1:34

GoogleCodeExporter commented 9 years ago
okay, thanks, now I got it.
So for development instead of
disableCache=true
we can use now:
resourceWatcherUpdatePeriod=1
which will be more optimal as it doesn't rebuild all the requested groups but 
those which changed.

So disableCache could be deprecated now (and disableCache=false could be 
default), otherwise I don't see use for it.

One thing improvement that I thought will be made is that resourceWatcher will 
do his work on request, not on interval basis. Because if developer doesn't 
request anything why should the thread be running and consuming resources?

Original comment by lystoc...@gmail.com on 10 Aug 2012 at 3:42

GoogleCodeExporter commented 9 years ago
Indeed, deprecating disableCache property does make sense. 
I would recommend to set resourceWatcherUpdatePeriod to something greater than 
1 (example 10)... it doesn't worth checking too often for changes. 

Checking for changes on each new request instead of an interval is also a good 
idea. Probably a combination between those two approaches might work. I'll 
create an issue for that, just to not forget about this idea.

Original comment by alex.obj...@gmail.com on 10 Aug 2012 at 9:10

GoogleCodeExporter commented 9 years ago
Well 10 sec is too long, I can usually refresh page in 1-2 sec :)
Even Jetty Maven plugin has 3 secs default for scanning changes, which 
reasonable I think.
But if resourceWatcher will be run on each request that would be great because 
I don't have to guess to wich value to set that period.

Original comment by lystoc...@gmail.com on 10 Aug 2012 at 9:33

GoogleCodeExporter commented 9 years ago
Agree, the only problem with that approach would be that the first response 
will not be refreshed even if a resource has been changed.
In other words, you have the following flow:
- Request the resource
- The resource is missing in cache -> processing is triggered
- After 1 minute you make a new request
- The watcher is triggered, it detects the change but the response is taken 
from cache (it is stale)
- The next request will be up to date

Though it is not perfect, it might be reasonable during development. 

Another concern is this: what happens when you hammer you page with 10 requests 
per second? Should the ResourceWatcher check it for each request or only after 
a reasonable default period of time? 

Your feedback is really important to define the requirement for this feature.

Original comment by alex.obj...@gmail.com on 10 Aug 2012 at 9:45

GoogleCodeExporter commented 9 years ago
> first response will not be refreshed even if a resource has been changed

is it because processing happens on background unsynchronously in different 
thread, e.g. request threads don't wait until processing is fully completed and 
refreshed resource is available? Aren't you doing processing after server start 
on first request to resource? This doesn't require double refresh right? So I 
might don't understand the logic.

But desirable is:

When requested resource is detected to be changed, it is refreshed 
(re-processed) and served to browser as it was first request after server start.

In my case watcher is scanning for changes very fast (e.g. < 50-100ms), so it's 
fine for me if it will be scanning even every 0.5s or how often I will hammer 
the page. The main is to get expected refreshed resource after it is changed on 
request to it.

Maybe on some very large code bases where a lot of files are scanned for 
requested group resource it will take longer time so it might make sense to 
have separate property for refreshing on request. Separation at least could be 
useful in production when we don't want to spend additional resources on each 
user request.

Original comment by lystoc...@gmail.com on 10 Aug 2012 at 10:12

GoogleCodeExporter commented 9 years ago
The ResourceWatcher works asynchronously. This is the reason why it is 
impossible to guarantee that every request will fetch the latest processed 
resources unless the cache is disabled, which is inconvenient during 
development.

Scanning for resource changes each second is too often, given that 
ResourceWatcher is somehow expensive (depends on number of resources from 
model) because of the following reasons:
 - the resources are checked synchronously. If on average fetching 1 resource takes 10ms and you have 100 resources it might take up to 1 second.
 - fetching resources can be also expensive, depending on type of resources. For instance if most of your resources are external (ex: http://jquery.com/script.js), fetching this kind of resource might be slowed down due to network latency and connection speed.

This is the reason why making ResourceWatcher synchronous before the actual 
processing is done might be not a very good idea.

A better solution might be a trade-off between scheduling ResourceWatcher 
periodically and trigger it on demand when the page is requested. More 
specifically, 
using a reasonable default (ex: 3 seconds), the ResourceWatcher will not 
perform the check more often than 3 seconds, but in the same time will be 
triggered only by a page request. Thus, the following assumptions would be 
valid:
- The ResourceWatcher is not triggered unless a request is performed
- When no request is performed for a long period of time, the ResourceWatcher 
won't be invoked
- The first request taken after a long period of time will trigger the 
ResourceWatcher, but will fetch the stale resources from cache (since 
ResourceWatcher is asynchronous). However the next request will bring the 
latest changes (given that the ResourceWatcher managed to complete its job 
before that request).
- When the page is requested in a "hammer mode", the ResourceWatcher will be 
invoked only once after each 3 seconds.

This is a somehow detailed description of the trade-off solution, which might 
be a good starting point for improving the way ResourceWatcher is implemented. 
I'll create a new issue (or a thread on the mailing list) where I would like to 
hear your opinion. 

Original comment by alex.obj...@gmail.com on 11 Aug 2012 at 7:48

GoogleCodeExporter commented 9 years ago
Btw, there is an open issue which aims to improve ResourceWatcher performance 
by checking resources in parallel: 
http://code.google.com/p/wro4j/issues/detail?id=511

Original comment by alex.obj...@gmail.com on 11 Aug 2012 at 7:50

GoogleCodeExporter commented 9 years ago
The issue dedicated to ResourceWatcher changes where we can continue the 
discussion: issue514

Original comment by alex.obj...@gmail.com on 11 Aug 2012 at 8:03

GoogleCodeExporter commented 9 years ago

Original comment by alex.obj...@gmail.com on 22 Sep 2012 at 2:55

GoogleCodeExporter commented 9 years ago
Good news... there is a working version which will be available for next 
release (very soon).

Here is the commit link: 
https://github.com/alexo/wro4j/commit/ccfba2aa285b47eb67e69f3c4a3049aa91e64af6

It wasn't that hard after all :)

Original comment by alex.obj...@gmail.com on 25 Sep 2012 at 10:00

GoogleCodeExporter commented 9 years ago
Fixed in branch 1.4.x. 
The BuildContext is used now to detect incremental change. The changes are 
detected by persisting the fingerprint of each resource between builds using 
BuildContext.

Original comment by alex.obj...@gmail.com on 26 Sep 2012 at 9:03

GoogleCodeExporter commented 9 years ago

Original comment by alex.obj...@gmail.com on 26 Sep 2012 at 1:06

GoogleCodeExporter commented 9 years ago

Original comment by alex.obj...@gmail.com on 27 Sep 2012 at 12:24