anikesh / web-optimizator

Automatically exported from code.google.com/p/web-optimizator

Q: Does wbo check for last-modified headers? #85

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Textpattern supports last-modified header
2. Enable HTML cache
3. Request an unchanged page after the n seconds of the HTML cache lifetime have passed

What is the expected output? What do you see instead?

I get a 200 transfer from wbo.
I expect a 304 because I have activated Last-Modified headers in Textpattern.

What version of the product are you using? On what operating system?

0.5.96

Please provide any additional information below.

I know this issue may be provocative given the pretty stupid (but good) full
page cache mechanism inside wbo, BUT for a CMS which supports Last-Modified
headers, wbo should receive those and pass them through to the browser
without further action.
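
The 200-versus-304 decision behind this request can be sketched as follows. This is a minimal illustration of a conditional GET check in Python, not wbo's code; the function name `conditional_status` and the plain-dict request headers are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(request_headers, last_modified):
    """Return 304 if the client's copy is still current, else 200.

    request_headers: dict of request header names to values (hypothetical shape).
    last_modified:   timezone-aware datetime of the page's last change.
    """
    ims = request_headers.get("If-Modified-Since")
    if ims is None:
        return 200  # unconditional request: send the full page
    try:
        client_time = parsedate_to_datetime(ims)
    except (TypeError, ValueError):
        return 200  # unparseable date: fall back to a full response
    if client_time.tzinfo is None:
        # Treat dates without a usable zone as UTC so the comparison works
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200
```

A caller would pass the `Last-Modified` time reported by the CMS and answer with an empty 304 response whenever the function returns 304.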

So this is more a late night feature request than an issue :)

Original issue reported on code.google.com by merzmar...@gmail.com on 18 Sep 2009 at 9:00

GoogleCodeExporter commented 9 years ago
It's a good idea to save the headers along with the page content. I will think over how to
implement it (I don't want to have 2 files for each page; only one is required).
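
One way to keep headers and content in a single cache file is to write the headers as one JSON line ahead of the raw body. The Python sketch below only illustrates that idea; the names `pack_entry` and `unpack_entry` and the file layout are assumptions, not the format wbo uses.

```python
import json

def pack_entry(headers, body):
    """Serialize headers and body into one blob: a JSON header line, a newline, the raw body."""
    # json.dumps escapes newlines, so the first raw newline is always the separator
    head = json.dumps(headers).encode("utf-8")
    return head + b"\n" + body

def unpack_entry(blob):
    """Split a blob produced by pack_entry back into (headers, body)."""
    head, _, body = blob.partition(b"\n")  # split at the first newline only
    return json.loads(head.decode("utf-8")), body
```

The body is stored untouched after the first newline, so gzipped or multi-line content survives the round trip.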

Original comment by sunny.dr...@gmail.com on 18 Sep 2009 at 9:24

GoogleCodeExporter commented 9 years ago
Did you change Last-Modified behavior or is 'check mtime' broken?

I checked the pretty big page http://sankt-georg.info/tag and the time stamp for the
.gz file in the wbo cache did not change.

Date    Sat, 26 Sep 2009 15:10:32 GMT
Server  Apache
Cache-Control   private, max-age=300
Content-Encoding    gzip
Etag    "d75390ba1f1551f6ab59ca56e4380732-gzip"
Expires Sat, 26 Sep 2009 15:08:11 GMT
X-Powered-By    PHP/5.2.11
Last-Modified   Mon, 21 Sep 2009 21:41:14 GMT
Content-Type    text/html; charset=utf-8

In earlier wbo versions the files were newly created (cache HTML, 900 seconds).

Reproduced with http://sankt-georg.info/fotografie

Date    Sat, 26 Sep 2009 15:15:58 GMT
Server  Apache
Cache-Control   private, max-age=300
Content-Encoding    gzip
Etag    "d5e58d22586693d6be12fc463297da1a-gzip"
Expires Sat, 26 Sep 2009 15:20:58 GMT
X-Powered-By    PHP/5.2.11
Last-Modified   Mon, 21 Sep 2009 21:41:14 GMT
Keep-Alive  timeout=2, max=200
Connection  Keep-Alive
Transfer-Encoding   chunked
Content-Type    text/html; charset=utf-8

Exactly the same Last-Modified header. Hmm, maybe it was set when I updated
Textpattern to 4.2.0.

Original comment by merzmar...@gmail.com on 26 Sep 2009 at 3:20

GoogleCodeExporter commented 9 years ago
The better question is:
Did you change Last-Modified AND/OR Etag logic or is 'check mtime' broken? 

But the homepage mtime is changing when I reload AND I don't get a 
Last-Modified header.

Date    Sat, 26 Sep 2009 15:30:49 GMT
Server  Apache
Content-Encoding    gzip
Etag    "ee0fb9e27f787a7e742b827c955e3855-gzip"
X-Powered-By    PHP/5.2.11
Keep-Alive  timeout=2, max=200
Connection  Keep-Alive
Transfer-Encoding   chunked
Content-Type    text/html

Original comment by merzmar...@gmail.com on 26 Sep 2009 at 3:37

GoogleCodeExporter commented 9 years ago
Markus, Web Optimizer doesn't send a Last-Modified header. It tries to unset this header
via .htaccess.

Original comment by sunny.dr...@gmail.com on 26 Sep 2009 at 3:59

GoogleCodeExporter commented 9 years ago
The initial reason for posting comments 2+3 was that cached files were not refreshed
(which in some cases is totally OK).

No cache file update, no mtime change: /fotografie & /tag
Cache file update with mtime change: Homepage

Can you explain in detail (wiki entry?) in which situations, how, and when cache files
are updated? The documentation should allow better testing & understanding of the logic
and wbo process dependencies.

Example:

Configuration (Full page cache): Cache HTML files on server for 900 seconds. Mtime
will be checked.
Case - Cached file is younger than 900 seconds: Etag sent > Etag found > Existing
file from cache is sent > No cache file update
Case - Cached file is older than 900 seconds: Etag sent > Etag found > Existing file
from cache is sent > No cache file update

This is what I see with 0.5.9.9.

This is what I think I saw in older versions:
Case - Cached file is older than 900 seconds: Etag sent > Mtime checked > File is
older than 900 seconds > Etag ignored > Request sent to CMS > New cache file is generated
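
The flow described for the older versions (mtime check first, Etag honoured only while the file is fresh) can be written down as a small decision function. This Python sketch models what the reporter believes he saw, not confirmed wbo behavior; `cache_decision` and its return labels are hypothetical.

```python
def cache_decision(cache_mtime, now, ttl, cached_etag, if_none_match):
    """Decide how to answer a request for a page in a full-page cache.

    cache_mtime:   modification time of the cache file in seconds, or None if absent
    now:           current time in seconds
    ttl:           cache lifetime in seconds (900 in the example above)
    cached_etag:   Etag stored for the cached page
    if_none_match: Etag sent by the browser, or None
    """
    # Mtime check comes first: a missing or expired file ignores any Etag
    if cache_mtime is None or now - cache_mtime > ttl:
        return "regenerate"        # request sent to CMS, new cache file written
    if if_none_match is not None and if_none_match == cached_etag:
        return "304"               # browser copy is current, no body sent
    return "200-from-cache"        # existing file from cache is sent
```

With the 0.5.9.9 behavior reported above, the second case would instead return "304" or "200-from-cache" even for a file older than the TTL.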

Re. Last-Modified: Textpattern sends LM header and I was wondering.

PS: I will not change the configuration or clean the cache until Monday, to test cache behavior.

Original comment by merzmar...@gmail.com on 26 Sep 2009 at 4:38

GoogleCodeExporter commented 9 years ago
OK, but it's a different issue, docs related. The initial one was about transferring headers
from the CMS :)

Original comment by sunny.dr...@gmail.com on 26 Sep 2009 at 5:01

GoogleCodeExporter commented 9 years ago
Yes, you are right. I mixed up browser communication & CMS/wbo/server responses.

CMS responds with Last-Modified header > wbo logic has to decide if the 200 process starts
or if a simple/fast 304 response is sent.

The (extended) question still stands for this decision:

Can you explain in detail (wiki entry?) in which situations, how, and when cache files
are updated? And sent with a 200 response? Or sent with a 304 response? The
documentation should allow better testing & understanding of the logic
and wbo process dependencies.

A flowchart would help :)

Original comment by merzmar...@gmail.com on 27 Sep 2009 at 11:24

GoogleCodeExporter commented 9 years ago
I think the server-side caching logic must be significantly improved to fit a lot of
systems. But it's a huge chunk of work. And it also needs a lot of research.

Issue about Cache Wiki page is here
http://code.google.com/p/web-optimizator/issues/detail?id=92

Original comment by sunny.dr...@gmail.com on 27 Sep 2009 at 11:32

GoogleCodeExporter commented 9 years ago
Yes, right again. But the big advantage of wbo being an external package is that you
can check whether or not a CMS offers the standard communication interfaces. Motto:
'not getting in the way', to avoid double work.

Example yes/no decision: Mtime check results in 'new page' request > CMS responds
with Last-Modified header & 304 > Is it necessary for wbo to play 'man in the middle'
and add Etag logic to the communication?

For sure this is more complicated (and beyond my standard http performance knowledge)
than just adding the generic wbo wrapper/buffer around the index.php output.

Modular concept for an interface check?

Maybe a future version of wbo can offer a browser communication test module, like the
'server configuration' one? Wbo mimics typical browser profiles and, on manual request,
fetches >=3 pages (homepage, article list page, single article page, ...) and shows &
analyzes the response headers.
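
The analysis half of such a module could start from header dumps like the ones pasted earlier in this thread. The Python sketch below is only an illustration of the idea; `parse_header_dump`, `analyze_headers`, and the two checks they perform are assumptions, not a planned wbo feature.

```python
from email.utils import parsedate_to_datetime

def parse_header_dump(text):
    """Parse 'Name   value' lines (as pasted from a browser tool) into a dict with lowercase names."""
    headers = {}
    for line in text.strip().splitlines():
        parts = line.split(None, 1)  # header name, then the rest of the line
        if len(parts) == 2:
            headers[parts[0].lower()] = parts[1].strip()
    return headers

def analyze_headers(headers):
    """Return a list of plain-text notes about caching-related headers."""
    notes = []
    for validator in ("last-modified", "etag"):
        if validator not in headers:
            notes.append("missing " + validator)
    if "date" in headers and "expires" in headers:
        date = parsedate_to_datetime(headers["date"])
        expires = parsedate_to_datetime(headers["expires"])
        if expires <= date:
            notes.append("expires is not later than date")
    return notes
```

Fetching the pages is left out on purpose; the functions only need the response headers as text, so they can be fed from any HTTP client.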

Does this rough module draft make enough sense to become a new enhancement issue?

Original comment by merzmar...@gmail.com on 27 Sep 2009 at 12:16

GoogleCodeExporter commented 9 years ago
Markus, some functionality from the webo.name analyzer (HTTP headers, load waterfall,
etc.) will be added to Web Optimizer in the future (to make it more comfortable to
profile performance). But right now Web Optimizer is positioned as a 'performance
optimizer', not a 'performance analyzer'. After we close most of the issues about
performance improvement we will start to integrate some performance analysis
features.

But I don't want to force Web Optimizer to become a complicated tool that is able to
do 'all in the world'. Maybe we will add a 'super premium' edition with analytic
features for web professionals and developers.

Original comment by sunny.dr...@gmail.com on 27 Sep 2009 at 12:29

GoogleCodeExporter commented 9 years ago
Reviewed this issue:
1) A number of docs-related issues have already been created (and some were closed),
also a few blog posts about this.
2) Performance Analyzer will be in Pro Edition (Spring 2010).
3) I don't see any need to transfer headers from the CMS through Web Optimizer:
 a) If the CMS gzipped the content, it's not parsed by Web Optimizer
 b) The CMS can't send a 304 response to Web Optimizer because Web Optimizer doesn't request
pages with conditional headers
 c) All server-side cache logic must be handled either by the CMS or by Web Optimizer. There
won't be any mixed-responsibility solution. So Web Optimizer can handle the server-side
cache (with 304 answers via ETag) or can skip server-side caching (the CMS handles it).
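
For reference, 304 answers via ETag come down to comparing the browser's `If-None-Match` value with the tag of the cached page. The Python sketch below assumes the tag is an MD5 of the body plus a `-gzip` suffix, which matches the shape of the Etag values quoted earlier in this thread but is not confirmed as what Web Optimizer hashes; `make_etag` and `etag_matches` are hypothetical names.

```python
import hashlib

def make_etag(body, suffix="-gzip"):
    """Build a quoted entity tag from the MD5 of the body (assumed scheme)."""
    return '"' + hashlib.md5(body).hexdigest() + suffix + '"'

def etag_matches(if_none_match, etag):
    """Return True if the If-None-Match header value covers the given tag."""
    if if_none_match is None:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    if "*" in candidates:
        return True  # wildcard matches any existing representation

    def strip_weak(tag):
        # If-None-Match uses weak comparison, so the W/ prefix is ignored
        return tag[2:] if tag.startswith("W/") else tag

    return strip_weak(etag) in {strip_weak(c) for c in candidates}
```

When `etag_matches` returns True the server can answer 304 with no body; otherwise it sends the cached page with a 200 and the current tag.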

So I don't see any actual issue here.

Original comment by sunny.dr...@gmail.com on 18 Nov 2009 at 4:43

GoogleCodeExporter commented 9 years ago
The only remaining issue is the server side cache performance.

Is it faster a) to request a page from the CMS, get back a 304 and deliver the
already cached page, or b) to always request a fresh page when the cache
time has been exceeded?

But yes, I agree with your "There won't be any mixed-responsibility solution" declaration.

Original comment by merzmar...@gmail.com on 18 Nov 2009 at 5:00

GoogleCodeExporter commented 9 years ago
Hmm, if it's about re-requesting a page from the CMS with a 304, maybe... But Web Optimizer
actually doesn't behave this way. It just runs the overall CMS logic to get the HTML.
Sometimes it's returned from cache (and Web Optimizer doesn't care whether it's from cache or
not). This is the standard content parsing way, e.g. in Joomla! (Web Optimizer receives
content from direct parsing or from any integrated cache engine; this doesn't matter).

'304-answers' are valid for external requests. I.e. if Web Optimizer were a real
PHP proxy (on a different port, like squid / nginx / lighttpd), it would request a
page and might handle 304-answers. But right now such answers are 'internal'. And
this has already been integrated in all known ways (maybe there will be a plugin for
Textpattern to provide deeper integration and some performance increase).

Original comment by sunny.dr...@gmail.com on 18 Nov 2009 at 5:07