coleifer / micawber

a small library for extracting rich content from urls
http://micawber.readthedocs.org/
MIT License

CSP headers #39

Closed: Garito closed this issue 9 years ago

Garito commented 9 years ago

Hi! I'm using Flask, but this would be useful for Django and others too. It would be super nice to have a feature that records, in a per-request cache or something similar, which services have been used, and then adjusts the Content-Security-Policy header to include those services as allowed origins.

Otherwise the browser blocks the embedded object and it doesn't load, and allowing any origin is not acceptable; only the origins actually needed should be allowed.

Thanks a lot!

coleifer commented 9 years ago

I'm not sure I understand, can you explain a bit more?

Garito commented 9 years ago

Sure. The point is that browsers are implementing a header in which you can configure the sources the browser is allowed to load content from. Here is some background: http://www.html5rocks.com/en/tutorials/security/content-security-policy/

Once this header is set, content from any source not listed in it will not be loaded.

It is good security practice to allow content only from sources you know are safe, so it would be nice to allow only 'self' plus the oEmbed providers actually in use.
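To make the allowlist behaviour concrete, here is a deliberately simplified model of how a browser checks a URL against CSP host-source expressions like the ones used in this thread. It ignores ports, paths, keywords such as `'self'` and the specification's scheme-upgrade rules, and treats a scheme-less source as matching both http and https, so it is an illustration only, not an implementation of the CSP specification. `source_matches` and `is_allowed` are hypothetical names.

```python
from urllib.parse import urlsplit


def source_matches(url: str, source: str) -> bool:
    """Simplified CSP host-source check (ignores ports, paths and keywords)."""
    target = urlsplit(url)
    if "://" in source:
        scheme, host = source.split("://", 1)
        if scheme != target.scheme:
            return False
    else:
        # No scheme in the source: accept http and https (simplification)
        host = source
        if target.scheme not in ("http", "https"):
            return False
    # Drop any path or port from the source expression
    host = host.split("/", 1)[0].split(":", 1)[0].lower()
    target_host = (target.hostname or "").lower()
    if host.startswith("*."):
        # "*.example.com" matches subdomains only, not example.com itself
        return target_host.endswith(host[1:])
    return target_host == host


def is_allowed(url: str, sources: list[str]) -> bool:
    """True if any source expression in the allowlist matches the URL."""
    return any(source_matches(url, s) for s in sources)
```

For example, `is_allowed("https://farm1.staticflickr.com/1/2.jpg", ["https://*.staticflickr.com"])` is `True`, while `https://staticflickr.com/` itself would not match the wildcard, and any host missing from the list is rejected.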

It would be nice to have the list of domains used. For instance, if micawber is parsing URLs from YouTube, Vimeo and Flickr, there could be a function to ask for the domains that returns a list with https://www.youtube.com, player.vimeo.com and https://*.staticflickr.com. So in Flask I'm currently doing:

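@app.after_request
def after_request(response):
    # Allow our own origin plus the embed providers used on the site
    response.headers.add('Content-Security-Policy', "default-src 'self' https://www.youtube.com player.vimeo.com https://*.staticflickr.com")
    # Flask requires after_request handlers to return the response
    return response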

And I would like to do:

@app.after_request
def after_request(response):
    # micawber.domains() is the proposed API; it does not exist yet
    origins = micawber.domains() + ["'self'"]
    response.headers.add('Content-Security-Policy', "default-src {}".format(' '.join(origins)))
    return response

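The proposed hook puts every embed origin under `default-src`. A tighter variant keeps `default-src 'self'` and lists embed origins only under the standard CSP directives `frame-src` (iframes) and `img-src` (images). This sketch assumes video embeds arrive as iframes and Flickr photos as `<img>` tags; `build_csp` and its arguments are hypothetical, and the origin lists would have to be collected elsewhere.

```python
def build_csp(frame_origins, img_origins):
    """Build a CSP value that keeps default-src strict and only opens
    frame-src / img-src to the embed origins actually used (hypothetical helper)."""

    def directive(name, origins):
        # dict.fromkeys removes duplicates while keeping first-seen order
        values = list(dict.fromkeys(["'self'", *origins]))
        return f"{name} {' '.join(values)}"

    return "; ".join([
        "default-src 'self'",
        directive("frame-src", frame_origins),
        directive("img-src", img_origins),
    ])
```

In Flask the result could be assigned with `response.headers['Content-Security-Policy'] = build_csp(...)` inside the `after_request` handler. Scripts, styles and other resource types would still be limited to `'self'`.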
coleifer commented 9 years ago

Oh, I see, thanks for clarifying. That certainly is an interesting request, but I'm afraid it would add an extra layer of complexity to registering providers (the need to register a domain from which the media would be loaded). Do you have any thoughts on the implementation?
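The extra registration step coleifer describes might look roughly like this: every URL pattern is registered together with the origins its embeds load from, and the needed origins are computed from the URLs in a given call. This is a hypothetical sketch, separate from micawber's own provider registry; `OriginRegistry`, `register` and `origins_for` are illustrative names, not micawber's API.

```python
import re


class OriginRegistry:
    """Hypothetical registry pairing each URL pattern with the origins
    its embedded media loads from."""

    def __init__(self):
        self._entries = []  # list of (compiled pattern, tuple of origins)

    def register(self, pattern, origins):
        self._entries.append((re.compile(pattern), tuple(origins)))

    def origins_for(self, urls):
        """Return the de-duplicated origins needed for the given URLs."""
        found = {}  # dict keys act as an insertion-ordered set
        for url in urls:
            for regex, origins in self._entries:
                if regex.match(url):
                    found.update(dict.fromkeys(origins))
                    break  # first matching pattern wins
        return list(found)
```

Because `origins_for` builds its result from the URLs passed in and never writes to the registry, a single shared instance would not accumulate per-request state.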

Garito commented 9 years ago

Sure. I was thinking that at the parse level you could append each parsed domain to a list (without duplicates) and, if the cache is used, store that list there. Then add a domains method that returns the list if there is one, or an empty list if there is no cache. Or, if you prefer, add an extra parameter to the parsers: when it is True, return a tuple with the regular response plus the list of domains.

The point is that you are already parsing this information with regular expressions. The discussion would be more about what the interface to expose it should look like.

Garito commented 9 years ago

Something that would make this issue easier: right now I'm using https://*.staticflickr.com for Flickr in the header because I can't read the actual domain from the response, but if I could, I would prefer to tighten it to the specific one, https://.staticflickr.com
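The "extra parameter returning a tuple" idea and the wish for exact hosts instead of a wildcard can be approximated without touching micawber internals, by post-processing the rendered embed HTML (for example, output from micawber's `parse_html`) and reading the origin of each `src` URL. A minimal sketch, assuming embeds appear as `iframe`, `img` or `script` tags; `SrcOriginCollector` and `html_with_origins` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class SrcOriginCollector(HTMLParser):
    """Collect the origin of every src URL on iframe, img and script tags."""

    TAGS = {"iframe", "img", "script"}

    def __init__(self):
        super().__init__()
        self.origins = {}  # dict keys act as an insertion-ordered set

    def handle_starttag(self, tag, attrs):
        if tag not in self.TAGS:
            return
        for name, value in attrs:
            if name != "src" or not value:
                continue
            parts = urlsplit(value)
            if not parts.netloc:
                continue  # relative URL: same origin, covered by 'self'
            if parts.scheme:
                # Exact origin, e.g. https://farm1.staticflickr.com
                origin = f"{parts.scheme}://{parts.netloc}"
            else:
                # Protocol-relative URL (//host/...): scheme-less source
                origin = parts.netloc
            self.origins[origin] = None


def html_with_origins(html):
    """Hypothetical 'extra parameter' variant: return (html, origins)."""
    collector = SrcOriginCollector()
    collector.feed(html)
    collector.close()
    return html, list(collector.origins)
```

Since each call creates its own collector, the origin list is local to that call, and the exact host (such as a specific `staticflickr.com` subdomain) can go into the header instead of `https://*.staticflickr.com`.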

coleifer commented 9 years ago

Adding this type of state to the cache or provider registry could introduce problems in multi-threaded environments. On a web-app, you would typically have a single ProviderRegistry handling multiple requests. If a single attribute is being modified by multiple threads this could cause strange behavior.

Garito commented 9 years ago

So no plain Python version; I'd be OK with that. But Flask has everything I need to do the job in a thread-safe way, doesn't it? (I imagine it's the same for Django.) Thinking about it more, even if there's no good case for it in plain Python, it makes a lot of sense for the Flask/Django integrations, don't you think?
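The thread-safety concern is about one shared attribute being mutated by many requests. Request-scoped storage avoids that: in Flask the usual place is `flask.g`. This stdlib-only sketch uses `threading.local` to show the same isolation, assuming a thread-per-request server (async or greenlet servers would need `contextvars` or the framework's own context instead). `start_request`, `record_origins` and `request_origins` are hypothetical hooks.

```python
import threading

# Each thread sees its own attributes on this object
_request_state = threading.local()


def start_request():
    # Call at the start of each request (e.g. from a before_request hook)
    _request_state.origins = {}


def record_origins(origins):
    # Call after each parse; only touches the current thread's dict
    _request_state.origins.update(dict.fromkeys(origins))


def request_origins():
    # Origins recorded so far in this thread, without duplicates
    return list(getattr(_request_state, "origins", {}))
```

In a Flask app, `request_origins()` would be read in the `after_request` handler to build the header, so two concurrent requests never see each other's providers.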

coleifer commented 9 years ago

I think I am going to pass on implementing this, but would be interested in reviewing a pull-request if you have time.

Garito commented 9 years ago

Sorry, but this feature isn't optional. Without it, your library is useless in CSP-enabled projects, because CSP will block any content not authorized in its header.

So time to search for another solution

Thanks for your time