apache / incubator-pagespeed-mod

Apache module for rewriting web pages to reduce latency and bandwidth.
http://modpagespeed.com
Apache License 2.0

Server-side includes are stripped by remove_comments and rewrite_css #182

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Based on a report from torsten@tributh.net, it seems that server-side includes 
(mod_include?) are running after mod_pagespeed.  To mod_pagespeed, server-side 
includes look like HTML comments, so mod_pagespeed removes them if the site 
owner enabled remove_comments.

This can be made to work if we can figure out how to get mod_pagespeed to run 
*after* mod_include.

Original issue reported on code.google.com by jmara...@google.com on 10 Jan 2011 at 1:57
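
To make the ordering problem concrete, here is a toy JavaScript model of two output filters applied in either order. It is only a sketch: `stripComments`, `expandIncludes` and `runChain` are hypothetical stand-ins, not mod_pagespeed or mod_include code, and the include handler only understands `#include virtual`.

```javascript
// Toy stand-in for remove_comments: strips every HTML comment it sees.
const stripComments = (html) => html.replace(/<!--[\s\S]*?-->/g, '');

// Toy stand-in for mod_include: expands only #include virtual="..." from a lookup table.
const expandIncludes = (files) => (html) =>
  html.replace(/<!--#include virtual="([^"]+)" -->/g,
    (directive, path) => (path in files ? files[path] : directive));

// Apply filters left to right, like an output-filter chain.
const runChain = (html, filters) => filters.reduce((out, f) => f(out), html);

const page = '<body><!-- todo --><!--#include virtual="/footer.html" --></body>';
const ssi = expandIncludes({ '/footer.html': '<footer>bye</footer>' });

// SSI first: the footer is included, then the ordinary comment is stripped.
const ssiFirst = runChain(page, [ssi, stripComments]);
// Comment stripping first: the directive is gone before SSI ever sees it.
const commentsFirst = runChain(page, [stripComments, ssi]);

console.log(ssiFirst);
console.log(commentsFirst);
```

In this model the include survives only when the SSI stand-in runs before the comment stripper, which is the behaviour the issue asks for.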

GoogleCodeExporter commented 9 years ago

Original comment by jmara...@google.com on 10 Jan 2011 at 2:01

GoogleCodeExporter commented 9 years ago
Hey,

Actually SSIs are processed by Varnish - so it will always execute after 
mod_pagespeed.

The simple solution is to disable mod_deflate when a page containing 
esi:include tags is found. This is not specific to the remove_comments filter 
but a general mod_ps/varnishd esi incompatibility.

CFR http://cd34.com/blog/infrastructure/no-esi-processing-first-char-not/

Original comment by robbie.g...@gmail.com on 10 Jan 2011 at 2:10

GoogleCodeExporter commented 9 years ago
OK; makes sense.  This is the first I've heard about ESI, but we should take a 
look at those too.  To be clear, ESI is entirely distinct from SSI, the latter 
being processed by mod_include in Apache, and the former being processed in 
Varnish.  Correct?

And they use different syntax as well: 

SSI:  <!--#include virtual="/footer.html" -->
ESI:  <esi:include>, <esi:remove> and <!--esi ... -->

Ideally mod_pagespeed would not see the SSI because they would be processed 
upstream.  However, if that proves impossible we could also teach mod_pagespeed 
about that special syntax (like it knows about IE directives and avoids 
removing those).

Original comment by jmara...@google.com on 10 Jan 2011 at 2:25
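
As a rough illustration of the "teach mod_pagespeed about that special syntax" option, the sketch below removes ordinary HTML comments but keeps anything that looks like an SSI directive, an ESI comment, or an IE conditional comment. The function name and the prefix checks are assumptions made for this example; this is not how mod_pagespeed's remove_comments filter is actually implemented, and it ignores edge cases such as downlevel-revealed IE comments.

```javascript
// Hypothetical sketch of a directive-aware comment remover.
function removeCommentsKeepingDirectives(html) {
  return html.replace(/<!--([\s\S]*?)-->/g, (comment, body) => {
    const isSsi = body.startsWith('#');            // <!--#include ... -->, <!--#if ... -->
    const isEsi = body.startsWith('esi');          // <!--esi ... -->
    const isIeConditional = body.startsWith('[if'); // <!--[if IE]> ... <![endif]-->
    return isSsi || isEsi || isIeConditional ? comment : '';
  });
}

const input =
  '<p>a</p><!-- note -->' +
  '<!--#include virtual="/footer.html" -->' +
  '<!--[if IE]><b>x</b><![endif]-->';

console.log(removeCommentsKeepingDirectives(input));
```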

GoogleCodeExporter commented 9 years ago
yeah, this is very important for me.  i just tried mod_pagespeed and it broke 
all my pages for the reason described here. all my webpages make heavy use of 
SSI, e.g.

<!--#include virtual="/footer.html" -->

if mod_pagespeed always runs upstream from server includes, then the 
remove_comments filter should leave all SSI comments untouched.

this is not limited to comments like 
<!--#include virtual="/footer.html" -->

it should also recognize the other comments like
<!--#exec ... -->
<!--#if ... -->
<!--#endif -->
and all the other special comments handled by SSI.

i'm so frustrated that the mod_pagespeed remove_comments filter is not 
compatible with using SSI :(

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:39

GoogleCodeExporter commented 9 years ago
also, to work around this problem (until it's fixed), i want to still enable 
the "remove_comments" filter on css and javascript files, but disable it on 
html files.

but there seems to be no way to do that. enabling / disabling filters appears 
to be global (i.e. for all types of files). and  ModPagespeedDisallow will 
globally disable all filters on some pages, which is not good either (i tried 
ModPagespeedDisallow on html files, and it prevents the re-writing of the css 
and js includes to use the cached versions, thus completely defeating the 
entire pagespeed module).

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:05

GoogleCodeExporter commented 9 years ago
another point that is important (at least to me): i use SSI also in javascript 
files and in css files.

so i can't even use the rewrite_javascript filter, because when mod_pagespeed 
processes and minifies my javascript files, the SSI includes are expanded and 
the cached version of the javascript includes the expanded SSI includes.  
that's not at all suitable.

for example if the javascript file includes:
referrer = '<!--#echo var="HTTP_REFERER" -->';

i want this SSI include to remain as-is in the minified file served from the 
cache, because <!--#echo var="HTTP_REFERER" --> will be expanded by the server 
to something different each time the script is loaded.

that's just an example, and i've got other similar cases with other env 
variables like REQUEST_URI, that change value at each request.

so mod_pagespeed is completely incompatible with any website that uses SSI, and 
i'm so sad :( 

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:26

GoogleCodeExporter commented 9 years ago
the idea mentioned in the title of this thread is BAD: if server side includes 
(SSI) were processed before mod_pagespeed, it would completely break because 
SSI can not only include other files, but also env variables, e.g. <!--#echo 
var="HTTP_REFERER" -->.

the pages with those includes expanded should NOT be cached! if SSI was 
processed before mod_pagespeed, mod_pagespeed would cache pages with env 
variables expanded, and that would completely break SSI.

the correct way is that mod_pagespeed should leave all the SSI "special 
comments" untouched in all the processed files (including js, css etc), and 
also, mod_pagespeed should make sure that any pages fetched from the 
mod_pagespeed  cache should be processed downstream by SSI.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 8:44

GoogleCodeExporter commented 9 years ago
Summary was: Make server-side includes work with remove_comments by tweaking 
order

Just to be clear, remove_comments only removes comments from HTML. rewrite_css 
may also remove comments from CSS. I don't believe anything removes comments 
from JavaScript.

You seem to be contradicting yourself above, in #6 you say that you still want 
to remove comments from CSS and JavaScript, but in #7 you say that you use SSI 
in both and can't have them stripped, which is it?

Right now if you don't want SSI stripped from HTML, you need to turn off 
remove_comments.

Original comment by sligocki@google.com on 9 Mar 2011 at 1:59

GoogleCodeExporter commented 9 years ago
Note that mod_pagespeed generally assumes that html content is not cacheable, 
so if you're only using server-side includes in html then you should have no 
issues with caching.  If you're using server-side includes in css, you might 
find it easier to simply include multiple css files in your html, and then use 
mod_pagespeed's combine_css filter to combine them.  I'd urge you strongly not 
to do user-agent- or referrer-based conditional inclusion anywhere except in 
html.

Original comment by jmaes...@google.com on 9 Mar 2011 at 3:28

GoogleCodeExporter commented 9 years ago
> You seem to be contradicting yourself above, in #6 you say that you still want 
to remove comments from CSS and JavaScript, but in #7 you say that you use SSI 
in both and can't have them stripped, which is it?

i want to remove the CSS comments from the CSS (e.g. /* comments */)
and i want to remove javascript comments from the javascript
e.g.

// comment
and also /* comments */

but i do NOT want any of my SSI include html-style comments to be touched in 
any way, whether they appear in html files, in css files, in javascript files, 
or in any other type of file that is subject to SSI processing.

SSI-type comments look like:

<!--# ... --> and they can appear in any type of file that is processed by the 
SSI module, and that include css, javascript and other files, not just html.

> Right now if you don't want SSI stripped from HTML, you need to turn off 
remove_comments.

i did that, but clearly it's not enough: the caching issue still breaks 
everything, because the pages that are cached do have my SSI includes expanded, 
and that includes javascript files.

for example, try a javascript file (.js) with:

alert('this is my user agent: <!--#echo var="HTTP_USER_AGENT" -->');

you will see that the same user-agent will be displayed regardless of the 
actual browser you use to load the page.  that's because mod_pagespeed will 
cause the .js file to be cached AFTER the SSI has been expanded, so when 
another person accesses the file with another user-agent, the page that will be 
served will contain something like:

alert('this is my user agent: Mozilla 5.0 (compatible [...]');

and the SSI processing will NOT happen because the SSI comment is not in the 
mod_pagespeed cached page anymore.

the problem here is that mod_pagespeed caches the pages AFTER SSI has been 
processed.  it should cache pages BEFORE SSI is processed, because SSI should 
happen on all mod_pagespeed cached pages.

also, i noticed another problem: when remove_comments is disabled, it appears 
to not remove the /* comments */ from css files, even if rewrite_css is used. i 
am not sure if this is by design or not.

in any case, because of the cache issue, even if i disable remove_comments, 
it completely breaks my site, as SSI are not processed dynamically on each file 
that is accessed.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 5:46
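
The stale-value effect described in this comment can be reproduced with a toy cache that stores a script after its SSI `echo` has been expanded. This is only a model under stated assumptions: `expandEcho` and `serveThroughCache` are hypothetical helpers, the cache is keyed by URL alone, and the `(none)` fallback is just a placeholder for unset variables.

```javascript
// Toy SSI expander: handles only <!--#echo var="NAME" -->.
function expandEcho(source, env) {
  return source.replace(/<!--#echo var="([A-Z_]+)" -->/g,
    (directive, name) => (name in env ? env[name] : '(none)'));
}

// Toy cache keyed by URL that stores the script AFTER expansion.
const cache = new Map();
function serveThroughCache(url, source, env) {
  if (!cache.has(url)) {
    cache.set(url, expandEcho(source, env));
  }
  return cache.get(url);
}

const script = "alert('this is my user agent: <!--#echo var=\"HTTP_USER_AGENT\" -->');";

// The first visitor's user agent is baked into the cached copy...
const first = serveThroughCache('/js/ua.js', script, { HTTP_USER_AGENT: 'Firefox' });
// ...so the second visitor receives the first visitor's value.
const second = serveThroughCache('/js/ua.js', script, { HTTP_USER_AGENT: 'Chrome' });

console.log(first);
console.log(second);
```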

GoogleCodeExporter commented 9 years ago
> I'd urge you strongly not to do user-agent- or referrer-based conditional 
inclusion anywhere except in html.

i use it in javascript for very good reasons: there are implementations of 
javascript that do not give access to the env variables that the server has 
access to, like user-agent, referrer, etc, and in some cases it is extremely 
useful to use SSI in javascript.

but in any case, the problem would be the same for html: the caching of the 
pages should always happen BEFORE any SSI include is processed - and SSI 
comments should remain completely untouched in any type of file filtered or 
processed by mod_pagespeed.

i think if those two conditions were true, mod_pagespeed would work well in 
combination with SSI.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 5:51

GoogleCodeExporter commented 9 years ago
so, if i understand well, if i put all my javascript snippets that use SSI in 
in-line script blocks in html files, since html won't be cached, then it 
should allow me to have the minified javascript (cached) working.

but still, my HTML files have very significant amount of comments, and i really 
want them to be stripped (but NOT the SSI comments!), and i never want any page 
with SSI to be cached, unless the caching occurs before the SSI comments are 
processed.

it might be worth a try, but using mod_pagespeed only for javascript 
minification might not be worth the effort in the case of my site.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:05

GoogleCodeExporter commented 9 years ago
Hi all, I intend to pick up this issue again soon.  I will review all the 
information in this thread.  But for now:

remove_comments applies only to HTML comments.  We could potentially teach 
remove_comments to leave in the SSI syntax, but I'd prefer making SSI run 
upstream of mod_pagespeed.

All HTML coming out of mod_pagespeed should be marked un-cacheable.  If it is 
coming out of your server marked as cacheable then either (a) you have a very 
old version of mod_pagespeed, (b) you are doing something very unnatural in 
your apache configuration to defeat mod_pagespeed here, or (c) we have a bug and 
I need to know about it in more detail.  If (c) please open a new issue with a 
detailed description of your apache configuration.  [note: at some point in the 
future we may allow mod_pagespeed's HTML output to be cacheable but today we 
always mark it uncacheable]

rewrite_css should remove CSS comments, but will only touch files that are 
marked as cacheable.

Similarly, Javascript files should only be touched if they are marked as 
cacheable.  If they are using SSI then they should probably be marked 
non-cacheable.

rewrite_javascript indeed removes JS comments.  It is not sensitive to SSI.  
This will not matter if SSI runs before mod_pagespeed.

Original comment by jmara...@google.com on 9 Mar 2011 at 6:38

GoogleCodeExporter commented 9 years ago
> remove_comments applies only to HTML comments.  We could potentially teach  
remove_comments to leave in the SSI syntax, but I'd prefer making SSI run  
upstream of mod_pagespeed.

that is really the WORST solution in my opinion, especially if caching is 
involved.

if SSI was done downstream, and the SSI comments were always left untouched, 
and cache pages were cached before any SSI processing, then everything would 
work just fine.

you should really think of all the consequences.  many people use SSI in 
various ways and for various purposes, and they all rely on it to work exactly 
as advertised. whatever mod_pagespeed does, it should not break sites using SSI 
(whether in html, in scripts, or in any other pages).

another reason we use SSI in javascript is to make conditional code based on 
the SERVER_HOST env variable, for example. so even if minified and cached, the 
selection of the code in the javascript file is based on a server-side SSI 
environment variable test, and this test should always work. we rely on that.

i'm sure many other people use SSI in various other ways and for also very good 
implementation reasons. you cannot say: don't use SSI in javascript files, for 
example.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:47

GoogleCodeExporter commented 9 years ago
oops, i meant HTTP_HOST, not SERVER_HOST.

e.g. many of our .js files contain things like:

<!--#if expr="\"${HTTP_HOST}\" = /one_of_my_host_name/" -->

some javascript code that should run only when the request is on that server

<!--#endif -->

(note: our scripts are run on different domains, that are all served by the 
same server)

Original comment by loupi...@gmail.com on 9 Mar 2011 at 6:52

GoogleCodeExporter commented 9 years ago
Thanks for all the commentary about SSI.  One thing that would really help is 
if you could provide kind of a minimal web-site-in-a-tarball containing 
examples of how you use SSI in JS and HTML.  We'd then try to build a testcase 
out of those and define this bug as fixed when that testcase worked.

I still don't understand why you want mod_pagespeed to run upstream of SSI.  
Either way, we need to prevent caching of mod_pagespeed-generated HTML.

I think that we will not be able to rewrite any JS that has server-side 
includes because the result would presumably vary.  In the absence of 
mod_pagespeed, are you serving cacheable JS that is generated differently 
depending on user-agent?  If so you would probably want to mark that with 
Cache-Control:private, which allows browser-caching but prevents proxy-caching.  
It will also prevent mod_pagespeed from optimizing the resource, which would be 
the right thing from a functional perspective.  If you are using SSI to 
generate cacheable Javascript that varies based on something other than 
user-agent then I don't see how that can work -- the user's browser would hold 
Javascript cached from server A even when served...well I guess it wouldn't be 
re-served if it was cached.  But you get the idea.

I guess I need to understand the scenario and what you are trying to achieve at 
a higher level.

You can work around these thorny issues by evaluating the SSI in HTML (which in 
general we would not allow to be cached), and passing that to the cacheable JS:

HTML:   <script>window.server_host = '<!--#echo var="HTTP_HOST" -->';</script>
JS:     if (window.server_host == "one_of_my_host_name") {
           ....

You can then allow open caching of the JS file, thus enabling mod_pagespeed to 
rewrite it.

Original comment by jmara...@google.com on 9 Mar 2011 at 7:13
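
A slightly fuller sketch of the workaround above, for the cacheable side only. It assumes the uncached HTML page has already set `window.server_host` through an inline script; `codeForHost` and the branch labels are hypothetical names made up for this illustration.

```javascript
// Cacheable script: it contains no SSI, so the same bytes can be served to
// every visitor. The host value is passed in instead of being expanded in place.
function codeForHost(host) {
  const perHost = {
    'one_of_my_host_name': () => 'host-specific branch',
  };
  const hasBranch = Object.prototype.hasOwnProperty.call(perHost, host);
  const branch = hasBranch ? perHost[host] : () => 'default branch';
  return branch();
}

// In a browser this would be called as: codeForHost(window.server_host)
console.log(codeForHost('one_of_my_host_name'));
console.log(codeForHost('some_other_host'));
```

The design point is that everything request-specific stays in the HTML, which is not cached, while the script itself stays identical for all requests.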

GoogleCodeExporter commented 9 years ago
> I still don't understand why you want mod_pagespeed to run upstream of  
SSI.  Either way, we need to prevent caching of mod_pagespeed-generated  
HTML.

that's because it would allow caching of minified javascript that are using SSI.

e.g. in the cache you could have .js files that look like:

<!--#if expr="\"${HTTP_HOST}\" = /one_of_my_host_name/" -->
some minified javascript code
<!--#else -->
some other minified javascript code
<!--#endif -->

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:24

GoogleCodeExporter commented 9 years ago
> You can work around these thorny issues by evaluating the SSI in HTML (which 
in general we would not allow to be cached), and passing that to the cacheable 
JS:

yes, of course, but this would involve modifying dozens or hundreds of scripts, 
and this is an ad-hoc thing and you cannot expect all the people using SSI to go 
through that.  not to mention the chance of introducing bugs.

the solution to the problem should involve a minimum modification to the 
existing files (html, js etc) and yet allow mod_pagespeed to work.

large websites cannot just rewrite hundreds of pages of html and scripts just 
to work around a bug or shortcoming of some apache module. this is not 
reasonable to assume.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:27

GoogleCodeExporter commented 9 years ago
When you say this:
   in the cache you could have .js files that look like:

  <!--#if expr="\"${HTTP_HOST}\" = /one_of_my_host_name/" -->
  some minified javascript code
  <!--#else -->
  some other minified javascript code
  <!--#endif -->

Which cache are you talking about, in the absence of mod_pagespeed?  What about 
in the presence of mod_pagespeed?  I'm not understanding your caching strategy, 
and I'm still not understanding how the ordering of mod_include, which runs as 
an Apache output filter, and mod_pagespeed's output filter affect this.

FYI mod_pagespeed's resource-serving path is completely unrelated to the 
output-filter ordering. It fetches resources using an HTTP fetch and stores 
them in a server-side cache.  It serves those resources via an output-generator 
and puts a long cache lifetime on them so it's inappropriate to do SSI on them 
at that stage.

I really would like to understand your caching strategy in the absence of 
mod_pagespeed, so I can determine whether we can do something that's consistent 
with that.

Original comment by jmara...@google.com on 9 Mar 2011 at 7:41

GoogleCodeExporter commented 9 years ago
> Which cache are you talking about

the mod_pagespeed cache! i'm not talking "in the absence of mod_pagespeed".

the only other caching strategy i use is private cache (i.e. on the browser).

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:45

GoogleCodeExporter commented 9 years ago
> FYI mod_pagespeed's resource-serving path is completely unrelated to the  
output-filter ordering. It fetches resources using an HTTP fetch and stores  
them in a server-side cache.  It serves those resources via an  
output-generator and puts a long cache lifetime on them so it's  
inappropriate to do SSI on them at that stage.

this would work fine if, when it fetches the pages, the SSI was disabled, and 
if, when it serves the pages, SSI could be run on the pages from its cache.

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:48
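
The ordering proposed in this comment (cache the source with its SSI directives intact, then expand on every request) can be modelled as follows. This is a toy sketch of the proposal, not something mod_pagespeed does; `expandEcho`, `serveExpandingOnRead` and `fetchRaw` are hypothetical names.

```javascript
// Toy SSI expander: handles only <!--#echo var="NAME" -->.
function expandEcho(source, env) {
  return source.replace(/<!--#echo var="([A-Z_]+)" -->/g,
    (directive, name) => (name in env ? env[name] : '(none)'));
}

// Cache the un-expanded source once, then expand it for each request.
const preSsiCache = new Map();
function serveExpandingOnRead(url, fetchRawSource, env) {
  if (!preSsiCache.has(url)) {
    preSsiCache.set(url, fetchRawSource(url)); // stored with directives intact
  }
  return expandEcho(preSsiCache.get(url), env);
}

let fetchCount = 0;
const fetchRaw = () => {
  fetchCount += 1;
  return "ua = '<!--#echo var=\"HTTP_USER_AGENT\" -->';";
};

const forFirefox = serveExpandingOnRead('/js/ua.js', fetchRaw, { HTTP_USER_AGENT: 'Firefox' });
const forChrome = serveExpandingOnRead('/js/ua.js', fetchRaw, { HTTP_USER_AGENT: 'Chrome' });

console.log(forFirefox, forChrome, fetchCount);
```

The trade-off is that the response now differs per request, so it can no longer carry the long public cache lifetime that the resource-serving path relies on.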

GoogleCodeExporter commented 9 years ago
by the way, is there a forum moderated by google employees to discuss other 
issues related to mod_pagespeed?

for example, many websites use google adsense advertising. the adsense policy 
states that modifying the adsense code provided by google (generally that's 
javascript) is not allowed.

would adsense allow minifying javascript files or filtering web pages that 
include some adsense code? technically this involves modifying the adsense 
code, but of course functionally it should not change anything. i'd like to 
have some official statement from google adsense about that. i would hate to 
see my adsense account be cancelled just because i'm trying to optimize serving 
speed using mod_pagespeed which is a project supported by google. 

Original comment by loupi...@gmail.com on 9 Mar 2011 at 7:56

GoogleCodeExporter commented 9 years ago
The discussion group is http://groups.google.com/group/mod-pagespeed-discuss

mod_pagespeed currently does not support privately cached resources.  
Specifically, if your JS files are served with HTTP header "Cache-Control: 
private" then mod_pagespeed will leave them alone.  So SSI should continue to 
work and continue to be cached privately.

More specifically:  mod_pagespeed today should fully support you if you (a) 
marked your privately cacheable resources as such and (b) do not enable 
remove_comments, which is off by default.  You can, if you like, enable 
rewrite_css and rewrite_javascript which will minify any css and js files that 
are publicly cacheable.

Can you provide a URL to your site?

Original comment by jmara...@google.com on 9 Mar 2011 at 8:30
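
The rule described here (leave a resource alone unless it is publicly cacheable) can be sketched as a small header check. This is a simplified guess at the decision, not mod_pagespeed's real logic: `isPubliclyCacheable` is a hypothetical helper, it looks only at the `Cache-Control` value, and it assumes a positive `max-age` is required.

```javascript
// Simplified check: true only when Cache-Control allows shared caching.
function isPubliclyCacheable(cacheControl) {
  const directives = (cacheControl || '')
    .split(',')
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);

  // Any of these directives rules out storing the response in a shared cache.
  const blocked = ['private', 'no-cache', 'no-store'];
  if (directives.some((d) => blocked.includes(d))) return false;

  // Assumption for this sketch: an explicit positive max-age is required.
  const maxAge = directives.find((d) => d.startsWith('max-age='));
  return maxAge !== undefined && Number(maxAge.slice('max-age='.length)) > 0;
}

console.log(isPubliclyCacheable('max-age=3600'));
console.log(isPubliclyCacheable('private, max-age=3600'));
```

Under this model, prepending `private` to the header of an SSI-generated script is enough to take it out of the rewriting path while still allowing browser caching.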

GoogleCodeExporter commented 9 years ago
loupiote.com

not sure that mod_pagespeed would ever work for me. my html files have lots of 
comments AND they use SSI.

my js files currently also use SSI to select code based on domain. i'll check 
if this can be changed but it would involve significant work. if SSI could be 
performed after the mod_pagespeed cache is retrieved, then everything should 
work just fine, i think...

Original comment by loupi...@gmail.com on 9 Mar 2011 at 8:35

GoogleCodeExporter commented 9 years ago
We can easily make some changes to resolve the comment-removal issue, either:

1. Change our remove_comments filter to leave SSI directives alone
2. Change mod_pagespeed to run after mod_include

I still favor #2.  And I took a quick look; it shouldn't be too hard.  I was 
just trying to figure out why you didn't like #2.

But if you could disable the "remove_comments" filter for now, and verify that 
your site is working with mod_pagespeed, then we'd know we've wrapped our arms 
around this problem.

Granted this is still not the optimal solution which I think for you involves 
mod_pagespeed supporting non-publicly cacheable files.  That seems like a bad 
idea for two reasons:

1. If the data is really private we'd need to use a cache-key that made it 
unique to the user and this is itself complex.
2. If it's private because it's user-agent specific or domain-based or 
something else we'd have to put that in the cache-key, which seems do-able but 
I'd be concerned that we'd blow out cache capacity with nearly identical copies 
of the same resource.

The reason to factor out the user-specific stuff from your shared resources is 
so that they can be accelerated by CDNs, proxy caches, etc.  mod_pagespeed 
would benefit for the same reasons.

I think all of that is rather independent of the decision about how to order 
SSI and mod_pagespeed.

I took a look at http://www.loupiote.com/js/all.js and see that it's cached 
with Cache-Control:max-age=3600, so it's publicly cacheable.  But it's got 
this text in it:

try {
var loupiote_debug = 0;

try {
if ('74.125.60.1' == '66.127.52.190') {
    loupiote_debug = 1;
}
} catch (err) {}

I think this is not what you want for a publicly cacheable resource.  My ISP 
can cache this resource in exactly this state and apply it to every browser: 
mobile, IE6, Chrome, Safari, all the different cases.  If you really want to 
generate this conditionalization with server-side includes then you need to 
either prepend "private, " to your Cache-Control header, or add header 
Vary:User-Agent.   In fact I don't totally understand how that's going to 
really do what you want either, because even on one browser you may access the 
same resource from multiple distinct domains.  I suppose something like 
Vary:Referer,User-Agent might help.  I'm not sure what else you'd need.

The point is, mod_pagespeed behaves like a proxy cache that's very close to 
your server.  For mod_pagespeed to safely rewrite a resource it must be safe to 
serve as is to all your users.   One of the main benefits of mod_pagespeed is 
long cache lifetimes for resources, but if we let SSI in our resource stream 
then we'd have to disable the public cachability of our output.

Original comment by jmara...@google.com on 9 Mar 2011 at 9:07
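
The cache-capacity concern above can be illustrated by counting how many cache entries one URL produces as more request headers are folded into the cache key. The key format and the helper names `cacheKey` and `countEntries` are assumptions for this sketch; a real cache would build its keys differently.

```javascript
// Build a cache key from the URL plus the request headers named in varyOn.
function cacheKey(url, requestHeaders, varyOn) {
  const parts = varyOn.map((name) => {
    const lower = name.toLowerCase();
    return `${lower}=${requestHeaders[lower] || ''}`;
  });
  return [url, ...parts].join('|');
}

// Count the distinct cache entries a set of requests would create.
function countEntries(requests, varyOn) {
  return new Set(requests.map((r) => cacheKey(r.url, r.headers, varyOn))).size;
}

const requests = [
  { url: '/js/all.js', headers: { 'user-agent': 'Firefox', referer: 'http://a.example/' } },
  { url: '/js/all.js', headers: { 'user-agent': 'Chrome', referer: 'http://a.example/' } },
  { url: '/js/all.js', headers: { 'user-agent': 'Chrome', referer: 'http://b.example/' } },
];

console.log(countEntries(requests, []));
console.log(countEntries(requests, ['User-Agent']));
console.log(countEntries(requests, ['User-Agent', 'Referer']));
```

Each extra varying header multiplies the number of near-identical copies that must be stored, which is the cost being weighed against supporting non-publicly-cacheable resources.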

GoogleCodeExporter commented 9 years ago
yes, you caught a piece of code that i use to debug (based on my IP address), 
and when i do use that, i disable caching in the server. not a very good 
technique, but it should be mostly harmless because my IP address is unlikely 
to be used by someone else, since it usually stays the same for days.

if i use "private" in the Cache-Control, this has no effect on Google search 
engine caching, correct? this is only affecting proxy / ISP caching, right?

i'll do that.

another question: i have a lot of in-line javascript snippets in my html files. 
is mod_pagespeed (with rewrite_javascript) able to minify all the in-lined 
javascript in html files?

Original comment by loupi...@gmail.com on 9 Mar 2011 at 9:44

GoogleCodeExporter commented 9 years ago
Yes, mod_pagespeed will minify inline javascript and also inline css, if you've 
enabled rewrite_css and rewrite_javascript.

I'm not sure what you mean by "search engine caching".  Do you mean the 
"cached" link that shows up on the Google search results page?  I'm not sure 
about that; FWIW nytimes.com marks most of its css privately cacheable.  See:

  wget --save-headers -O - http://css.nyt.com/css/0.1/screen/build/homepage/styles.css?v=011411 | head 

BTW you can leave your images (which I think are on flickr anyway) publicly 
cacheable.  I'd just be concerned about your SSI-affected js/css.

Original comment by jmara...@google.com on 9 Mar 2011 at 9:54

GoogleCodeExporter commented 9 years ago
ok.

yes, i meant caching by search engines, like the google "cached" version of the 
page.  i'm pretty sure "private" would have no effect on that, since it is not 
like proxy caching i.e. the cached page is not fetched with its normal URL.

i serve copies of my flickr images that are hosted on my server, but i agree 
with your remark.

i'll see if i can remove all the SSI code from inside .js script (and just 
leave it in the html files). then at least i should be able to minify my 
javascript and things should work.

the next step would be to have a solution to remove the comments in the html 
files without breaking the SSI, and tell mod_pagespeed that it should never 
cache html files (but that it should remove comments and spaces).

maybe we'll be able to make that work...

Original comment by loupi...@gmail.com on 9 Mar 2011 at 10:23

GoogleCodeExporter commented 9 years ago
Yes -- I think that's the open bug here -- and it shouldn't be hard to fix.  
There are a couple of Apache-trivia details that need to be sorted out.

Original comment by jmara...@google.com on 9 Mar 2011 at 10:53

GoogleCodeExporter commented 9 years ago
in trying to make it work with just script minification, i found this problem:

mod_pagespeed strips the SSI comments in my inline scripts (in the HTML files).

e.g.

<script>
<!--#include virtual="some file with javascript that should be in-lined here" 
-->
my-javascript;
</script>

becomes:

<script>
my-javascript;
</script>

so of course everything breaks.

i want to be able to use SSI in my HTML files, even inside in-line scripts. why 
is that not possible?

Original comment by loupi...@gmail.com on 9 Mar 2011 at 11:34

GoogleCodeExporter commented 9 years ago
Just to be clear, do you have remove_comments enabled still?  If so, can you 
disable it and try again?

In any case this will work, even with remove_comments, once we manage to get 
mod_pagespeed to run *after* mod_include.  Actually I wonder if you can do 
that in your .conf files somehow.  If you want to hack on that, go for it.  
Currently the pagespeed.conf we auto-install has this:

    AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

But mod_include.c has this code in it:

  ap_hook_fixups(include_fixup, NULL, NULL, APR_HOOK_LAST);
...
include_fixup(...) {...
  ap_add_output_filter("INCLUDES", NULL, r, r->connection);
...}

Evidence suggests that this combination results in mod_include being run 
*after* mod_pagespeed.  But please try adding, as an experiment, 

   AddOutputFilterByType INCLUDES text/html

right *before* the line 

   AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

in pagespeed.conf.  Maybe this will simply solve the problem (at least for 
you).  We could then look at how we can change our apache init code to move the 
mod_pagespeed filter after the includes filter.

Original comment by jmara...@google.com on 9 Mar 2011 at 11:45

GoogleCodeExporter commented 9 years ago
my config is:

        <IfModule pagespeed_module> 
                  ModPagespeed off
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"
                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/" 
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/" 

                  ModPagespeedEnableFilters rewrite_javascript
                  ModPagespeedDomain loupiote.com 
        </IfModule> 

i don't have remove_comments enabled: the other comments in the HTML page are 
still there, and the SSI includes are performed in the HTML code (but not in 
the inline <script> blocks; there they just seem to disappear!).

i will try with adding 

AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

Original comment by loupi...@gmail.com on 9 Mar 2011 at 11:51

GoogleCodeExporter commented 9 years ago
another thing that i noticed is that unless i put explicitly

ModPagespeed off

something happens that breaks my pages.  therefore, unlike other modules, the 
default is not "off" when this one is installed. i.e. even without 
"ModPagespeed on", mod_pagespeed has an effect and that can cause pages to 
break...

Original comment by loupi...@gmail.com on 9 Mar 2011 at 11:54

GoogleCodeExporter commented 9 years ago
ooops, well, i ALREADY had MOD_PAGESPEED_OUTPUT_FILTER , so should i try 
removing it?

Original comment by loupi...@gmail.com on 9 Mar 2011 at 11:55

GoogleCodeExporter commented 9 years ago
i tried with "AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html" and 
without, it makes no difference :(

note: the SSI include that i do is of a file with .js extension. and i think that 
SSI include will do a server fetch of the included file, since other recursive 
SSI includes are processed correctly in the included file. so i don't know what 
mod_pagespeed does when the apache server does an internal fetch of a .js 
script (triggered by an SSI), in case the mod_pagespeed rewrite_javascript 
filter is enabled. 

Original comment by loupi...@gmail.com on 10 Mar 2011 at 12:05

GoogleCodeExporter commented 9 years ago
sorry, i meant:

       <IfModule pagespeed_module> 
                  ModPagespeed on
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"
                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/" 
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/" 

                  ModPagespeedEnableFilters rewrite_javascript
                  ModPagespeedDomain loupiote.com 
        </IfModule> 

(of course when i test, i set "ModPagespeed on" - i just copied after restoring 
to something that works, i.e. OFF!!!)

Original comment by loupi...@gmail.com on 10 Mar 2011 at 12:07

GoogleCodeExporter commented 9 years ago
and my html file contains:

<script>
<!--#include virtual="/include/my_inline_include.js" -->
my-javascript;
</script>

the SSI include works when mod_pagespeed is OFF.

when it's ON with the config i posted, the SSI include is not done in the 
inline script.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 12:10

GoogleCodeExporter commented 9 years ago
[this topic about "something happens that breaks my pages" is totally unrelated 
to server-side includes.  is that right?  can you open a new issue for that and 
describe what you are seeing?]

You are right: ModPagespeed defaults to 'on'.  We thought that if you did 
'LoadModule' or (on Ubuntu) added the sym-link from 
../mods-available/pagespeed.load to  ../mods-enabled/pagespeed.load that you'd 
want to enable it.

What I was suggesting above is that you have this in your .conf:

   AddOutputFilterByType INCLUDES text/html
   AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

I'm hoping (but have not confirmed) that this will cause mod_include to run 
upstream of mod_pagespeed.

It looks to me like we are removing <!-- ... --> comments in rewrite_javascript 
as well, so hopefully that will be fixed as well once we get the filter order 
correct.

Original comment by jmara...@google.com on 10 Mar 2011 at 12:15

GoogleCodeExporter commented 9 years ago
still not working.

my config now is:

        <IfModule pagespeed_module>
                  ModPagespeed on
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/"
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/"

                  ModPagespeedEnableFilters rewrite_javascript

                  ModPagespeedDomain loupiote.com

                  AddOutputFilterByType INCLUDES text/html
                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html
        </IfModule>

> You are right: ModPagespeed defaults to 'on'.

i don't think it's a good idea. most other apache modules default to off, so 
that's not consistent.

> It looks to me like we are removing <!-- ... --> comments in 
rewrite_javascript as well,

yes, well, at least in the inline scripts.  but on the other hand, my "all.js" 
script looks like:

try {
<!--#include virtual="/js/debug.js"-->
} catch (err) {debug_alert('x=catch-all-11-' + err);}
try {
<!--#include virtual="/js/ajax.js"-->
        } catch (err) {debug_alert('x=catch-all-1-' + err); loupiote_ie6 = 1;}
try {
<!--#include virtual="/js/addthis-init.js"-->
} catch (err) {loupiote_ping_async('x=catch-all-10-' + err);}
try {
<!--#include virtual="/js/set-defaults.js"-->
        } catch (err) {loupiote_ping_async('x=catch-all-1');}
try {
<!--#include virtual="/js/get-uri-query-string.js"-->
} catch (err) {loupiote_ping_async('x=catch-all-14-' + err);}
etc...

and it appears that the SSI includes are done correctly BUT the resulting all.js 
script is NOT minified (whereas the inline scripts in my html files are 
minified).

any idea why my all.js is not minified?

Original comment by loupi...@gmail.com on 10 Mar 2011 at 12:34

GoogleCodeExporter commented 9 years ago
You probably want
   ModPagespeedDomain *loupiote.com
rather than
   ModPagespeedDomain loupiote.com
as the latter form does not authorize http://www.loupiote.com/js/all.js.  I'm 
not entirely satisfied with that answer, however, because your home page is 
www.loupiote.com which is implicitly authorized.
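
To illustrate the difference between the two patterns, here is a minimal Python sketch. It assumes plain glob semantics applied to the host name only; mod_pagespeed's real domain matcher may behave differently, and `is_authorized` is a hypothetical helper, not part of any mod_pagespeed API.

```python
from fnmatch import fnmatchcase


def is_authorized(host, patterns):
    """Return True if the host matches any glob pattern (toy model only)."""
    return any(fnmatchcase(host, pattern) for pattern in patterns)


# Without a wildcard, the pattern only matches the bare domain.
print(is_authorized("www.loupiote.com", ["loupiote.com"]))   # False
# A leading '*' also covers subdomains such as www.
print(is_authorized("www.loupiote.com", ["*loupiote.com"]))  # True
```

Under that assumption, only the wildcard form covers the `www.` host.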

So there's something else going on that isn't obvious to me right now.  If you 
turn on 'loglevel info', mod_pagespeed will be very verbose about what it's 
trying to do, and may print something about 'all.js'.  You'll want to leave 
'loglevel info' on only while you investigate this because we'll happily fill 
your disk for you if you leave it on indefinitely :)

RE "ModPagespeed defaulting to off" -- you can report that as a separate issue 
but I think we're not likely to change it as there are many thousands of sites 
with it installed as is and an incompatible change like that doesn't seem like 
a good plan :)  We try very hard to avoid breaking existing config files with 
our code updates.

By the way, I am not seeing X-Mod-Pagespeed headers on www.loupiote.com so I'm 
wondering if you are testing this on a different home page.  In that case it 
could be a domain-authorization issue, depending on the origin of your 
alternate home page.

Original comment by jmara...@google.com on 10 Mar 2011 at 12:45

GoogleCodeExporter commented 9 years ago
is there a way to completely turn OFF all caching done by mod_pagespeed, so 
that i am sure that what i see is not in fact an old cached version that was 
generated by a different configuration?

yeah, the defaulting to on is unfortunate but probably too late to change.

regarding the issue with SSI, i think this should be documented early on in the 
mod_pagespeed web documentation, and it should be made clear so that people don't 
spend time pulling their hair out over what's happening in case they use SSI.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 1:43

GoogleCodeExporter commented 9 years ago
> By the way, I am not seeing X-Mod-Pagespeed headers on www.loupiote.com so 
I'm wondering if you are testing this on a different home page.  In that case 
it could be a domain-authorization issue, depending on the origin of your 
alternate home page.

that's because i turned mod_pagespeed off (i have to turn it off until i can 
make it work).  i just turn it on briefly for testing. i guess i should make a 
test page that is authorized, and unauthorize mod_pagespeed on all the other 
pages, using the filters. i'll try that.

Original comment by loupi...@gmail.com on 10 Mar 2011 at 1:46

GoogleCodeExporter commented 9 years ago
making some progress (i think), but i need help understanding why some files 
are not seen at all by mod_pagespeed.

I have set LogLevel info, so i see the mod_pagespeed messages in my error_log 
when a file is processed.

here is my config, and i'll leave it "on" (active), so you can try too.

        <IfModule pagespeed_module>
                  ModPagespeed on
                  ModPagespeedUrlPrefix                "http://www.loupiote.com/mod_pagespeed/"

                  ModPagespeedFileCachePath            "/var/mod_pagespeed/cache/"
                  ModPagespeedGeneratedFilePrefix      "/var/mod_pagespeed/files/"

                  # disable CoreFilters:                                                            
                  ModPagespeedRewriteLevel PassThrough                                              

                  ModPagespeedEnableFilters rewrite_javascript

                  ModPagespeedDomain *loupiote.com

                  AddOutputFilterByType MOD_PAGESPEED_OUTPUT_FILTER text/html

                  ModPagespeedDisallow *
                  ModPagespeedAllow *.js
                  ModPagespeedAllow *833171367.shtml
        </IfModule>

the goal is to process only the html file 833171367.shtml (for testing), and 
all the .js files.

the html file is processed, i can see some info messages in the error_log, and 
i can see the
X-Mod-Pagespeed: 0.9.15.3-404
in the header when i run:

$ curl -D - -o /dev/null http://www.loupiote.com/photos/833171367.shtml

but the .js files are not processed: no message in the error_log, and no 
X-Mod-Pagespeed header.

e.g.

$ curl -D - -o /dev/null http://www.loupiote.com/photos/833171367.shtml

====> so my first question here is: why does this config not appear to process 
the .js files at all?
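
The header check above could be scripted. This is a minimal Python sketch that scans a raw header dump (for example, what `curl -D -` prints) for the X-Mod-Pagespeed header. The function name `pagespeed_version` and the sample response are illustrative assumptions; only the header value is taken from the output quoted above.

```python
def pagespeed_version(raw_headers):
    """Return the X-Mod-Pagespeed header value, or None if it is absent."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Header names are case-insensitive; the status line has no colon.
        if sep and name.strip().lower() == "x-mod-pagespeed":
            return value.strip()
    return None


# Hypothetical header dump for a page that mod_pagespeed processed.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "X-Mod-Pagespeed: 0.9.15.3-404\r\n"
    "\r\n"
)
print(pagespeed_version(sample))
```

A result of `None` for a given URL means the response carried no X-Mod-Pagespeed header.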

regarding the HTML, this config appears to not break the in-line scripts (the 
SSI is correctly included in the in-line scripts), but it does somehow modify 
the html causing a visual difference in the page layout (or some inline 
javascript has been tampered with). not sure why, since the core filters are 
disabled.

the effect of mod_pagespeed on this page (with Chrome) is one "blank" line 
added just above the photo, under the "Change image size" line, and within the 
orange frame that appears when hovering. if you go to any other photo page on 
my site (click "random photo" in the menu), you will see that this "blank" line 
is not there.

i'm not sure exactly what causes it (yet), but this is apparently a bug i.e. 
mod_pagespeed modifies something that it should not modify...  

Original comment by loupi...@gmail.com on 10 Mar 2011 at 4:20

GoogleCodeExporter commented 9 years ago
the effect on the page layout is, once again, related to the use of SSI in my 
html.

the page served by mod_pagespeed contains:

<div id="photo-page-image-container">title="Click to download image"
 >

the source page (before SSI) contains:

<div id="photo-page-image-container" <!--#include 
virtual="/include/photo/photo-page-image-container-attributes.shtml"--> >

yes, i know, it looks bad, but the SSI part is normally replaced by :

title="Click to download image"

so the page served by apache is syntactically correct:

<div id="photo-page-image-container" title="Click to download image" >

but for some reason, mod_pagespeed modifies the page before processing the SSI. 
 i think it closes that div tag because it thinks it was not closed when it 
sees the SSI comment.

i.e. it changes

<div id="photo-page-image-container" <!--#include 
virtual="/include/photo/photo-page-image-container-attributes.shtml"--> >

into:

<div id="photo-page-image-container"> <!--#include 
virtual="/include/photo/photo-page-image-container-attributes.shtml"--> >

then the SSI is processed, causing:

<div id="photo-page-image-container">title="Click to download image"
 >

instead of:

<div id="photo-page-image-container" title="Click to download image" >

if SSI is processed before mod_pagespeed, it should be processed upfront from 
any modification by mod_pagespeed.

that's because the html may be syntactically incorrect before SSI, but 
perfectly correct after SSI.

so any parsing and filtering by mod_pagespeed should be done after SSI, in 
order to prevent this sort of problem.

do you agree with my analysis there?
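
to make the ordering argument concrete, here is a toy Python sketch. `naive_close_tags` is a hypothetical stand-in for an HTML rewriter that closes an open tag when it meets a comment; it is NOT mod_pagespeed's actual parser. the path `/attrs.shtml` is a made-up placeholder for the include.

```python
import re

SSI_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_ssi(html, includes):
    """Replace each SSI include directive with its mapped content."""
    return SSI_RE.sub(lambda m: includes[m.group(1)], html)


def naive_close_tags(html):
    """Toy rewriter: close an unclosed start tag before any comment."""
    out = []
    in_tag = False
    i = 0
    while i < len(html):
        if html.startswith("<!--", i):
            if in_tag:
                # Drop trailing spaces, then close the tag early.
                while out and out[-1] == " ":
                    out.pop()
                out.append("> ")
                in_tag = False
            end = html.find("-->", i)
            end = len(html) if end == -1 else end + 3
            out.append(html[i:end])  # copy the comment through untouched
            i = end
            continue
        ch = html[i]
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
        out.append(ch)
        i += 1
    return "".join(out)


source = '<div id="c" <!--#include virtual="/attrs.shtml"--> >'
includes = {"/attrs.shtml": 'title="Click to download image"'}

# SSI first: the rewriter sees well-formed HTML and leaves it alone.
ssi_first = naive_close_tags(expand_ssi(source, includes))
# Rewriter first: the tag is closed early and the attribute becomes text.
rewriter_first = expand_ssi(naive_close_tags(source), includes)
print(ssi_first)
print(rewriter_first)
```

with SSI first the attribute stays inside the div tag; with the rewriter first it ends up as text after the tag, which matches what i see in the served page.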

Original comment by loupi...@gmail.com on 10 Mar 2011 at 4:39

GoogleCodeExporter commented 9 years ago
i discovered another problem: mod_pagespeed overwrites my Cache-Control 
headers, so i had to disable it until i can find a solution to that 
Cache-Control issue, too.

see:
http://code.google.com/p/modpagespeed/issues/detail?id=232&can=4&colspec=ID%20Type%20Status%20Priority%20Milestone%20Modified%20Owner%20Summary

Original comment by loupi...@gmail.com on 10 Mar 2011 at 7:49