fanglingsu / vimb

Vimb - the vim like browser is a webkit based web browser that behaves like the vimperator plugin for the firefox and usage paradigms from the great editor vim. The goal of vimb is to build a completely keyboard-driven, efficient and pleasurable browsing-experience.
https://fanglingsu.github.io/vimb/
GNU General Public License v3.0
1.35k stars 100 forks source link

Content-Security-Policy #135

Closed fanglingsu closed 10 years ago

fanglingsu commented 10 years ago

Add support to force vimb to apply the Content-Security-Policy http://www.w3.org/TR/CSP/. Allow to fake the server response header to apply the security policy also for those pages that do not provide the header at th time.

This issue continues a discussion done on the pull request #133.

fanglingsu commented 10 years ago

@semarie If no content-security-policy is set you remove the Content-Security-Policy header (8f3379a). In my opinion this should be kept, if there is a webpage that provides this nice header, than we should use it. I see the fake of the header mainly to forcee a security policy if this is not provided by the server, or to extend a provided policy (by adding additional headers).

semarie commented 10 years ago

No it isn't exactly that. These headers are removed before the server has added headers. So any header that come from server are keeped.

If I remember well, I do that because SoupSession seems reuse SoupMessage in some cases. So whereas I set :set content-security-policy= the header remains. It was more simple to remove or replace in all cases.

fanglingsu commented 10 years ago

No it isn't exactly that. These headers are removed before the server has added headers. So any header that come from server are keeped.

OK, I didn't knwo that.

If I remember well, I do that because SoupSession seems reuse SoupMessage in some cases.

This sounds strange for me, but maybe this is a w3 specified behaviour for some kind of responses like Range-Responses (you can try it with downloads on page http://radiopanel.narod.ru/collection/vef214.htm) where the response is split into multiple pieces. I can image that it would be useful or required to reuse parts of the first responses headers for the following responses to the same request.

semarie commented 10 years ago

This sounds strange for me

I redo some tests, but I could get back a problem... so currently I test with an abort() if any header are setted.

Maybe it was a problem in a configuration file when I test it before... So for the moment, I assume soup_message_headers_append is enought (and if not, the abort() will do his job :) ).

semarie commented 10 years ago

Just some thoughts...

One possibility is to try to optimize autocmd by using compiled pattern for example (but full performance is more complex than just call g_pattern_spec_new and g_pattern_match_string, we should call g_pattern_match and stock uri, reversed-uri and uri-length to not recalculate it at each autocmd place).

That was my first thought to use glib's pattern matching, but I found this to be too much regex and decided to implement a more shell like matching by myself. I'm not sure if the glib matching would be provider better performance in case of the autocmd.

Oh, I haven't realized that the matching isn't done with glib.

I have done some timing to see the difference between util_wildmatch, g_pattern_match_string and g_pattern_match. For the moment, I haven't see any difference between them (diff is in range of 500ns).

I think that the headers may not disturb anything. But I thought that we need an autocmd event for request-queued to get rid of redirects. In this case the autocmd would be fired for each resource on the page and not only for the page itself. [...] But this does not work for redirect, or do I misunderstand this? And this will also not work for frames in the page. For example if trusted.org contains a frame to something.org and I've rules defined for something.org, this should also be applied for the content shown in the frame. The LoadProvisional is only called for the main page, but not for the frames

You are right: LoadProvisional have the drawback to mask frames or redirect. So having autocmd event for request-queued is what to do.

There is some code in my branch, if you want take a look.

fanglingsu commented 10 years ago

I'm not luck with the automcd for request-queued. This autocmd are called multiple times for each resource on a page. That means that we check if there is a matching automcd for the request. If there are one or more autocmd for a domain will this lead to multiple calls of :set ... with always the same value. I think this is a show stopper for vimb. On the other side may users define a strict content-security-policy global without using autocmd to make exceptions.

I see three ways to go further.

  1. Keep it like it is. This allows to define CSP global and/or with fine grained exceptions done via autocmd. the later with the drawback of less performant processing.
  2. Remove the autocmd event for request-queued. This makes only sense for the CSP at the time. This has the already mentioned drawbacks, but should be fast enough to not disturb the user.
  3. Don't use autocmd nor setting.
    • Use a dedicated ex-commad for that. Something like that :content-security-policy http{s,}://*.github.com/* 'self' 'unsafe-inline' 'unsafe-eval' data: https://*.github.com/ https://*.githubusercontent.com/. This avoid to call :set ... multiple times for a setting that is used only in place where the autocmd is called.
    • It's also thinkable to use a ex-commad to apply auto headers like :auto-response-header http{s,}://*.github.com/*,https://*.google.com/* Content-Security-Policy=...
    • Put in somethings like a plugin API, a call to a user defined script, LUA, or something similar. This would allow to keep the complexity or the performance issues out of vimb. But maybe this is a little bit to special to built hook for a script. And I'm not a fan of plugins, because they tend to make the whole application slow and cause some overhead.
semarie commented 10 years ago

I think we need to quantified the overhead to add content-security-policy, in order to choose the best path.

Currently, the patch do:

For the signal: as request-queued is used only for autocmd, it should be possible to connect it only if RequestQueued autocmd is used. So if someone don't use it, the signal is fired: no overhead. Else, using the autocmd event have a cost...

For autocmd: there is an early cut-off with usedbits. So call autocmd_run with an unused event should have a very small overhead. Else, yes we run ex commands each time.

For the setting: it is just the cost of g_free/g_strdup. (don't known if better is possible)


Now, does rewrite content-security-policy feature without autocmd will permit a gain ? We still have need of request-queued signal, and still need some pattern matching for choose which policy need to be applied. The gain could be at applying the policy: it should be possible to just change a pointer, and avoid free/dup, as we control the list of policies.

After, I don't known enought some internals of vimb (ex-command, setting for example) for see the gain of not using :set but another special command.

For the plugin API, I am not sure about possible gains: it will be just some kind of hooks, like autocmd does.

semarie commented 10 years ago

I just pushed a patch that use the idea of conditionnally connect/disconnect request-queued signal.

Now, if the user don't use neither "content-security-policy" nor RequestQueued autocmd event, the signal is not connected: it should result none overhead in vimb in this case.

For that, I add a new flag in setting.c: FLAG_SIGNAL. "content-security-policy" use it. When the value is changed, the function vb_update_signals() is called. This function will look at autocmd and "content-security-policy" setting to decide if the signal should be connected, disconnected or keep as it.

In same manner, autocmd_add call now vb_update_signals() on adding or removing.

vb_update_signals() is called only when modification occurs (autocmd_add or setting_set_value). So the impact of checking signal should keep low.

fanglingsu commented 10 years ago

Hm, setting is a hash table lookup, some data preparation and g_strdup/g_free. It's not too heavy, but it's called often in case there are autocmds. But the main point is that we change a setting multiple times per request that could be done more direct with an dedicated ex-cmd.

I don't think that we need c441ec7aaf4fabc1a32a618bbab3b688d7804628 to get rid of the potential performance issues. I don't think that the callback for the 'request-queued' is the problem itself. Intentional I'd not add too much additional code to avoid performance issues or better to not add code for issue that are not worth to be avoided. If we keep the callback active, independent if it's used or not, makes the code easier to maintain (especially if there shoudl be added some new functionality to the callback). And the call to autocmd_run with no autocmd to the event AU_REQUEST_QUEUED has only the overhead of a function call and a bit comparison. Note that the vb_uppdate_signals is also called multiple times per request because of the change settings in the autocmd. In my opinion we should revert c441ec7aaf4fabc1a32a618bbab3b688d7804628 this seems to have no benefit to replace a function call by another function call.

fanglingsu commented 10 years ago

It would be ideal if we could easily identify if a queued request is for a web page (the html, the requested page and all the frames and iframes). If this would be possible we could reduce the calls for autocmd_run to mostly one (in case that there are no redirects and frames involved).

semarie commented 10 years ago

ok, I think I better saw what is the drawback you want to avoid: re-set multiple times a setting for one page, whereas it is always the same setting (one page = n request, and there is one setting_set_value for each request done). Please correct me if I am wrong.

in this case, a dedicated ex-command could do the trick, if in one call we set the content-security-policy for all pages. Else, we always need one call for each request. A possibility would be to have a dedicated file ? it would permit a better readability of the setting, instead of a one-line parameter...

I am not sure to understand your last comment. You want to differenciate html page and image/css page ? Could you explain to me the purpose of it ?

fanglingsu commented 10 years ago

ok, I think I better saw what is the drawback you want to avoid: re-set multiple times a setting for one page, whereas it is always the same setting (one page = n request, and there is one setting_set_value for each request done). Please correct me if I am wrong.

That is one dimension of what I meant. My concerns went more to the nature of settings. Normally settings are read from config and applied to vimb and the user browses the whole night and day with them, with some exception done during runtime via inputbox :set .... The autocmd causes that we understand settings more and more as temporary changes. Autocmd change settings on a per request base (page request). The 'RequestQueued' event changes the way settings are understood more in this direction, because of settings are changed multiple times per user initiated request. This is a little bit strange from point of the user. For me the concern popped up by the way how we understand settings, what they mean and how they are treated if we are running autocmd for RequestQueued. That's relativly independent from if the settings where set always to the same value or of the value is continously changed during the request flow.

I am not sure to understand your last comment. You want to differenciate html page and image/css page ? Could you explain to me the purpose of it?

From point of a user and also with focus on the Content-Security-Policy it's enough to change settings or the 'content-security-policy' setting only for the requested page and pages that are included as frames or that are opened by a redirect. The user normally don't want to change the browser behaviour if there is requested some special image (and in facto for the most settings this would be to late).

semarie commented 10 years ago

My concerns went more to the nature of settings.

The problem with the current meaning of "content-security-policy" is it designate the policy to be included for the next response-headers, independantly of the domain. So in order to adapte the policy per domain, it needs change the setting per request (one page in one domain could include frame in other domain).

Maybe we should change "content-security-policy" to be: the policies that would be applied for the designated requests. So having a list of pair like pat1 policy1,pat2 policy2,[...]

That would result config file like:

# default policy
set content-security-policy=* default-src 'none'

# policy for twitter
set content-security-policy+=https://*.twitter.com/* default-src 'self' 'unsafe-inline' https://*.twimg.com/

# etc...

What do you think about that ? It make the policy more immuable (but possibly changed), and we could remove the RequestQueued autocmd event that have been introduced for pallied the "per request" need of change. For customize the policies, LoadProvisionnal should be suffisent.

We go back to some propositions you do before :). But I think we could keep setting for the policies definitions.

semarie commented 10 years ago

I also take a look about the more general auto-reponse-header you proposed early.

fanglingsu commented 10 years ago

That seems to be that what I thought about as extra ex-cmd. But to reuse the setting mechanism, seems to be simplier than a new command. And in fact it's more a setting than a command. The only drawbacks I see are:

I think the is the right way to go at the current state of vimb. I'm not sure if this is really useful, but I plan to implement the :source command to load settings (ex-cmd) from a file given by path. So we don't need to add a new file for the content-security-policy. It's a settings, so this should be set in config file.

I've only one question. How is set content-security-policy=* default-src 'none' handled if there are policies for other domains/patterns. Normally vimb would walk through the pattern-policy pairs and apply the first matching. The '*' would always be a match an therefore applied always independent if there are special patterns that would also match. Or do you think we should apply all policies for matching patterns as there is the possibility to add multiple Content-Security-Policy headers? Or should the '*' pattern applied if no other match was found?

semarie commented 10 years ago

How is set content-security-policy=* default-src 'none' handled if there are policies for other domains/patterns.

In fact, it depends the rule we implement for pattern-matching. There is two possiblities:

If we choose "first match win", the default rule should be setted last, and if we choose "last match win", the default rule should be setted first. In my previous example, I was thinking to "last match win".

A possible inspiration is to look how firewall rules are applied: under OpenBSD, pf (the firewall) use "last match win" and add a quick flag for say "don't look rules after, and use this one".

semarie commented 10 years ago

I think "first match win" is more efficient when the user have a lot of rules. But "last match win" is more readable: the default rules are first, and particularities after. But it is just a point of vue... It is trivial to switch the method.

fanglingsu commented 10 years ago

The "last match win" seems to be better in case the user adds new policies during runtime because we don't have to consider the default rule hopefully set as first rule, if a new rule is appended. And the latest added rules are the rules with the highest precedence. From point of the configuration I'd use the "last match win", to look for matches we should walk through the list from end to combine readability and efiiciency.

fanglingsu commented 10 years ago

@semarie I'll be without computer the next 3 days, so please be not concerned if I don not answer to your interesting comits and thoughts.

semarie commented 10 years ago

Hi Daniel,

I did a rewrite of the code, using your proposition of auto-response-header, used as setting. The main motivation for auto-reponse-header, instead of just content-security-policy is to set more headers: Content-security-Policy (as discuted), Content-security-Policy-Report-Only (maybe useful for adjusting the policy ?), Strict-Transport-Security (for sites proposing https but without enforcing it).

So I added two files arh.{c,h} (arh for "Auto Reponse Header") to do the job. I have reused some code from autocmd for parsing the setting (get_next_word) which have been adapted for this context. The pattern-matching is done with util_wildmatch.

auto-response-header is a list of pattern name=value. The last pattern is keeped (exactly, the list is proceded in order, and replace is used). You could match multiple differents headers (set CSP and HSTS for same pattern).

For example:

# default values first
set auto-response-header=* X-Test=yes
set auto-response-header+=* Content-Security-Policy=default-src 'self'

# for twitter
set auto-response-header+=https://twitter.com/* Content-Security-Policy=default-src 'self' https://*.twimg.com/

# etc ...
fanglingsu commented 10 years ago

Nice work!

fanglingsu commented 10 years ago

I'm not sure, but maybe we can use soup_header_parse_param_list or soup_header_parse_list instead of read_name and read_value. It would be nice to add procompiler feature flags, to allow users to compile without this feature. I think there are some users who don't want to manipulate response headers and save some bytes of the binary instead. Wen should add the arh_free(vb.config.autoresponseheader); to the vb_quit to easier memory leak checking. Then the feature seems to be ready for use and can be mergerd into master branch. Or do you see open issues?

semarie commented 10 years ago
semarie commented 10 years ago

First issue: a header value couldn't contains , (comma) char. So definitly need to rewrite the parser.

Ideas are welcome for a new syntax that permit using soup_header_parse_* functions...

fanglingsu commented 10 years ago

some features have been implemented but finally not used. Should we keep them?

  • autocmd RequestQueued event can be removed, this seems to be not useful for the autocmd
  • autocmd_in_use function (should be deleted I think) - yes this can be removed
  • making LoadProvisional uri dependant (the uri is the initial request) - he uri for the autocmd event can be useful in some cases - we should keep this
  • in session_request_queued_cb: remove the "unexpected header abort()" - yes this can be removed
  • in arh_parse: remove the "append" DEBUG_PRINTF - ok can be removed

It would be nice to allow to define multiple headers per pettern. But at the moment I've no good idea how this could be set without to use a dedicated ex-cmd (like :shortcut-add) instead of the setting. One possibility could be to use soup_header_parse_semi_param_list and to use ; as header separator. But this isn't analog to :set header where the headers are separated by ,.

semarie commented 10 years ago

Daniel,

I rewrite arh_parse. It now support multiple headers per pattern. The setting's format is near the same as before.

The setting is a list. Each element is pattern header-list. If header-list is one element, the format is same; else QUOTE must enclose the element "pattern header-list". The header-list format is same as header setting.

So these are valids:

:set auto-response-header=* X-A=a
:set auto-response-header=* X-A=a,http://* X-B=b
:set auto-response-header="* X-A=a"
:set auto-response-header="* X-A=a,X-B=b","http://* X-A=a,X-B","* X-C=c"

or more lisibly:

:set auto-response-header=
:set auto-response-header+="* X-A=a"
:set auto-response-header+="* X-A,X-B=b"
semarie commented 10 years ago

@fanglingsu any problem with merging the auto-response-header feature ?

fanglingsu commented 10 years ago

The branch is merged into master now. Thanks for your work!