Closed fanglingsu closed 10 years ago
@semarie If no content-security-policy is set you remove the Content-Security-Policy header (8f3379a). In my opinion this should be kept, if there is a webpage that provides this nice header, than we should use it. I see the fake of the header mainly to forcee a security policy if this is not provided by the server, or to extend a provided policy (by adding additional headers).
No it isn't exactly that. These headers are removed before the server has added headers. So any header that come from server are keeped.
If I remember well, I do that because SoupSession seems reuse SoupMessage in some cases. So whereas I set :set content-security-policy=
the header remains. It was more simple to remove or replace in all cases.
No it isn't exactly that. These headers are removed before the server has added headers. So any header that come from server are keeped.
OK, I didn't knwo that.
If I remember well, I do that because SoupSession seems reuse SoupMessage in some cases.
This sounds strange for me, but maybe this is a w3 specified behaviour for some kind of responses like Range-Responses (you can try it with downloads on page http://radiopanel.narod.ru/collection/vef214.htm) where the response is split into multiple pieces. I can image that it would be useful or required to reuse parts of the first responses headers for the following responses to the same request.
This sounds strange for me
I redo some tests, but I could get back a problem... so currently I test with an abort() if any header are setted.
Maybe it was a problem in a configuration file when I test it before... So for the moment, I assume soup_message_headers_append
is enought (and if not, the abort() will do his job :) ).
Just some thoughts...
One possibility is to try to optimize autocmd by using compiled pattern for example (but full performance is more complex than just call g_pattern_spec_new and g_pattern_match_string, we should call g_pattern_match and stock uri, reversed-uri and uri-length to not recalculate it at each autocmd place).
That was my first thought to use glib's pattern matching, but I found this to be too much regex and decided to implement a more shell like matching by myself. I'm not sure if the glib matching would be provider better performance in case of the autocmd.
Oh, I haven't realized that the matching isn't done with glib.
I have done some timing to see the difference between util_wildmatch, g_pattern_match_string and g_pattern_match. For the moment, I haven't see any difference between them (diff is in range of 500ns).
I think that the headers may not disturb anything. But I thought that we need an autocmd event for request-queued to get rid of redirects. In this case the autocmd would be fired for each resource on the page and not only for the page itself. [...] But this does not work for redirect, or do I misunderstand this? And this will also not work for frames in the page. For example if trusted.org contains a frame to something.org and I've rules defined for something.org, this should also be applied for the content shown in the frame. The LoadProvisional is only called for the main page, but not for the frames
You are right: LoadProvisional
have the drawback to mask frames or redirect. So having autocmd event for request-queued
is what to do.
There is some code in my branch, if you want take a look.
I'm not luck with the automcd for request-queued. This autocmd are called multiple times for each resource on a page. That means that we check if there is a matching automcd for the request. If there are one or more autocmd for a domain will this lead to multiple calls of :set ...
with always the same value. I think this is a show stopper for vimb.
On the other side may users define a strict content-security-policy global without using autocmd to make exceptions.
I see three ways to go further.
:content-security-policy http{s,}://*.github.com/* 'self' 'unsafe-inline' 'unsafe-eval' data: https://*.github.com/ https://*.githubusercontent.com/
. This avoid to call :set ...
multiple times for a setting that is used only in place where the autocmd is called.:auto-response-header http{s,}://*.github.com/*,https://*.google.com/* Content-Security-Policy=...
I think we need to quantified the overhead to add content-security-policy, in order to choose the best path.
Currently, the patch do:
For the signal: as request-queued
is used only for autocmd
, it should be possible to connect it only if RequestQueued
autocmd is used. So if someone don't use it, the signal is fired: no overhead. Else, using the autocmd event have a cost...
For autocmd: there is an early cut-off with usedbits
. So call autocmd_run
with an unused event should have a very small overhead. Else, yes we run ex commands each time.
For the setting: it is just the cost of g_free/g_strdup. (don't known if better is possible)
Now, does rewrite content-security-policy feature without autocmd will permit a gain ? We still have need of request-queued
signal, and still need some pattern matching for choose which policy need to be applied. The gain could be at applying the policy: it should be possible to just change a pointer, and avoid free/dup, as we control the list of policies.
After, I don't known enought some internals of vimb (ex-command, setting for example) for see the gain of not using :set
but another special command.
For the plugin API, I am not sure about possible gains: it will be just some kind of hooks, like autocmd does.
I just pushed a patch that use the idea of conditionnally connect/disconnect request-queued
signal.
Now, if the user don't use neither "content-security-policy" nor RequestQueued autocmd event, the signal is not connected: it should result none overhead in vimb in this case.
For that, I add a new flag in setting.c: FLAG_SIGNAL. "content-security-policy" use it. When the value is changed, the function vb_update_signals()
is called. This function will look at autocmd and "content-security-policy" setting to decide if the signal should be connected, disconnected or keep as it.
In same manner, autocmd_add call now vb_update_signals()
on adding or removing.
vb_update_signals()
is called only when modification occurs (autocmd_add or setting_set_value). So the impact of checking signal should keep low.
Hm, setting is a hash table lookup, some data preparation and g_strdup/g_free. It's not too heavy, but it's called often in case there are autocmds. But the main point is that we change a setting multiple times per request that could be done more direct with an dedicated ex-cmd.
I don't think that we need c441ec7aaf4fabc1a32a618bbab3b688d7804628 to get rid of the potential performance issues. I don't think that the callback for the 'request-queued' is the problem itself. Intentional I'd not add too much additional code to avoid performance issues or better to not add code for issue that are not worth to be avoided.
If we keep the callback active, independent if it's used or not, makes the code easier to maintain (especially if there shoudl be added some new functionality to the callback). And the call to autocmd_run
with no autocmd to the event AU_REQUEST_QUEUED
has only the overhead of a function call and a bit comparison. Note that the vb_uppdate_signals
is also called multiple times per request because of the change settings in the autocmd.
In my opinion we should revert c441ec7aaf4fabc1a32a618bbab3b688d7804628 this seems to have no benefit to replace a function call by another function call.
It would be ideal if we could easily identify if a queued request is for a web page (the html, the requested page and all the frames and iframes). If this would be possible we could reduce the calls for autocmd_run
to mostly one (in case that there are no redirects and frames involved).
ok, I think I better saw what is the drawback you want to avoid: re-set multiple times a setting for one page, whereas it is always the same setting (one page = n request, and there is one setting_set_value for each request done). Please correct me if I am wrong.
in this case, a dedicated ex-command could do the trick, if in one call we set the content-security-policy for all pages. Else, we always need one call for each request. A possibility would be to have a dedicated file ? it would permit a better readability of the setting, instead of a one-line parameter...
I am not sure to understand your last comment. You want to differenciate html page and image/css page ? Could you explain to me the purpose of it ?
ok, I think I better saw what is the drawback you want to avoid: re-set multiple times a setting for one page, whereas it is always the same setting (one page = n request, and there is one setting_set_value for each request done). Please correct me if I am wrong.
That is one dimension of what I meant. My concerns went more to the nature of settings. Normally settings are read from config and applied to vimb and the user browses the whole night and day with them, with some exception done during runtime via inputbox :set ...
.
The autocmd causes that we understand settings more and more as temporary changes. Autocmd change settings on a per request base (page request).
The 'RequestQueued' event changes the way settings are understood more in this direction, because of settings are changed multiple times per user initiated request. This is a little bit strange from point of the user.
For me the concern popped up by the way how we understand settings, what they mean and how they are treated if we are running autocmd for RequestQueued. That's relativly independent from if the settings where set always to the same value or of the value is continously changed during the request flow.
I am not sure to understand your last comment. You want to differenciate html page and image/css page ? Could you explain to me the purpose of it?
From point of a user and also with focus on the Content-Security-Policy it's enough to change settings or the 'content-security-policy' setting only for the requested page and pages that are included as frames or that are opened by a redirect. The user normally don't want to change the browser behaviour if there is requested some special image (and in facto for the most settings this would be to late).
My concerns went more to the nature of settings.
The problem with the current meaning of "content-security-policy" is it designate the policy to be included for the next response-headers, independantly of the domain. So in order to adapte the policy per domain, it needs change the setting per request (one page in one domain could include frame in other domain).
Maybe we should change "content-security-policy" to be: the policies that would be applied for the designated requests. So having a list of pair like pat1 policy1,pat2 policy2,[...]
That would result config file like:
# default policy
set content-security-policy=* default-src 'none'
# policy for twitter
set content-security-policy+=https://*.twitter.com/* default-src 'self' 'unsafe-inline' https://*.twimg.com/
# etc...
What do you think about that ? It make the policy more immuable (but possibly changed), and we could remove the RequestQueued
autocmd event that have been introduced for pallied the "per request" need of change. For customize the policies, LoadProvisionnal should be suffisent.
We go back to some propositions you do before :). But I think we could keep setting for the policies definitions.
I also take a look about the more general auto-reponse-header
you proposed early.
That seems to be that what I thought about as extra ex-cmd. But to reuse the setting mechanism, seems to be simplier than a new command. And in fact it's more a setting than a command. The only drawbacks I see are:
:csp-add pat1 policy1
and to remove :csp-remove pat1
). I think the is the right way to go at the current state of vimb. I'm not sure if this is really useful, but I plan to implement the :source
command to load settings (ex-cmd) from a file given by path. So we don't need to add a new file for the content-security-policy. It's a settings, so this should be set in config file.
I've only one question. How is set content-security-policy=* default-src 'none'
handled if there are policies for other domains/patterns. Normally vimb would walk through the pattern-policy pairs and apply the first matching. The '*
' would always be a match an therefore applied always independent if there are special patterns that would also match.
Or do you think we should apply all policies for matching patterns as there is the possibility to add multiple Content-Security-Policy headers? Or should the '*
' pattern applied if no other match was found?
How is set content-security-policy=* default-src 'none' handled if there are policies for other domains/patterns.
In fact, it depends the rule we implement for pattern-matching. There is two possiblities:
If we choose "first match win", the default rule should be setted last, and if we choose "last match win", the default rule should be setted first. In my previous example, I was thinking to "last match win".
A possible inspiration is to look how firewall rules are applied: under OpenBSD, pf
(the firewall) use "last match win" and add a quick
flag for say "don't look rules after, and use this one".
I think "first match win" is more efficient when the user have a lot of rules. But "last match win" is more readable: the default rules are first, and particularities after. But it is just a point of vue... It is trivial to switch the method.
The "last match win" seems to be better in case the user adds new policies during runtime because we don't have to consider the default rule hopefully set as first rule, if a new rule is appended. And the latest added rules are the rules with the highest precedence. From point of the configuration I'd use the "last match win", to look for matches we should walk through the list from end to combine readability and efiiciency.
@semarie I'll be without computer the next 3 days, so please be not concerned if I don not answer to your interesting comits and thoughts.
Hi Daniel,
I did a rewrite of the code, using your proposition of auto-response-header
, used as setting. The main motivation for auto-reponse-header, instead of just content-security-policy is to set more headers: Content-security-Policy
(as discuted), Content-security-Policy-Report-Only
(maybe useful for adjusting the policy ?), Strict-Transport-Security
(for sites proposing https but without enforcing it).
So I added two files arh.{c,h}
(arh for "Auto Reponse Header") to do the job. I have reused some code from autocmd for parsing the setting (get_next_word) which have been adapted for this context. The pattern-matching is done with util_wildmatch
.
auto-response-header
is a list of pattern name=value
. The last pattern is keeped (exactly, the list is proceded in order, and replace
is used). You could match multiple differents headers (set CSP and HSTS for same pattern).
For example:
# default values first
set auto-response-header=* X-Test=yes
set auto-response-header+=* Content-Security-Policy=default-src 'self'
# for twitter
set auto-response-header+=https://twitter.com/* Content-Security-Policy=default-src 'self' https://*.twimg.com/
# etc ...
Nice work!
I'm not sure, but maybe we can use soup_header_parse_param_list or soup_header_parse_list
instead of read_name
and read_value
.
It would be nice to add procompiler feature flags, to allow users to compile without this feature. I think there are some users who don't want to manipulate response headers and save some bytes of the binary instead.
Wen should add the arh_free(vb.config.autoresponseheader);
to the vb_quit
to easier memory leak checking.
Then the feature seems to be ready for use and can be mergerd into master branch. Or do you see open issues?
soup_header_parse_*
functions, I am not sure it is possible: the function expects a list of parameters, whereas the auto-response-header compoment isn't a list but just one header with (optional) value. I will look it more later. I am not sure what it is better: one pattern with multiple headers (but we should be able to copte with auto-response-header which is a list also...) ; or as now: one pattern with one header.First issue: a header value couldn't contains , (comma) char. So definitly need to rewrite the parser.
Ideas are welcome for a new syntax that permit using soup_header_parse_*
functions...
some features have been implemented but finally not used. Should we keep them?
- autocmd RequestQueued event can be removed, this seems to be not useful for the autocmd
- autocmd_in_use function (should be deleted I think) - yes this can be removed
- making LoadProvisional uri dependant (the uri is the initial request) - he uri for the autocmd event can be useful in some cases - we should keep this
- in session_request_queued_cb: remove the "unexpected header abort()" - yes this can be removed
- in arh_parse: remove the "append" DEBUG_PRINTF - ok can be removed
It would be nice to allow to define multiple headers per pettern. But at the moment I've no good idea how this could be set without to use a dedicated ex-cmd (like :shortcut-add
) instead of the setting. One possibility could be to use soup_header_parse_semi_param_list
and to use ;
as header separator. But this isn't analog to :set header
where the headers are separated by ,
.
Daniel,
I rewrite arh_parse. It now support multiple headers per pattern. The setting's format is near the same as before.
The setting is a list.
Each element is pattern header-list
. If header-list is one element, the format is same; else QUOTE must enclose the element "pattern header-list"
.
The header-list format is same as header
setting.
So these are valids:
:set auto-response-header=* X-A=a
:set auto-response-header=* X-A=a,http://* X-B=b
:set auto-response-header="* X-A=a"
:set auto-response-header="* X-A=a,X-B=b","http://* X-A=a,X-B","* X-C=c"
or more lisibly:
:set auto-response-header=
:set auto-response-header+="* X-A=a"
:set auto-response-header+="* X-A,X-B=b"
@fanglingsu any problem with merging the auto-response-header
feature ?
The branch is merged into master now. Thanks for your work!
Add support to force vimb to apply the Content-Security-Policy http://www.w3.org/TR/CSP/. Allow to fake the server response header to apply the security policy also for those pages that do not provide the header at th time.
This issue continues a discussion done on the pull request #133.