biati-digital / glightbox

Pure Javascript lightbox with mobile support. It can handle images, videos with autoplay, inline content and iframes
MIT License
2.01k stars 226 forks source link

Potential XSS via title and description #455

Closed LukeLambert closed 9 months ago

LukeLambert commented 9 months ago

Describe the bug First, thanks for this library! I just found it, and it's fantastic. However, I think the behavior of inserting the title and description into the DOM unescaped should be behind an option, or it should come with a big warning. It's certainly useful in some cases to add markup to those attributes, but it's also a major footgun for user generated content, as developers must know to double-escape any UGC. In Django, for instance, that looks like {{ photo.description|force_escape|force_escape }}. Not pretty.

Potential solutions:

Post the code you are using

<a href="full.jpg" class="glightbox"
   data-description="&lt;a href=&quot;javascript:alert(&#x27;Hello, world!&#x27;);&quot;&gt;Click me&lt;/a&gt;">
    <img src="thumb.jpg">
</a>
biati-digital commented 9 months ago

Hi @LukeLambert I'm glad the library it's useful.

In your specific example, I think the developer must know that if the content it's generated by a user, the dev needs to make sure to sanitize every value even before saving it to a database.

I would argue that if by this point a code like that it's passed to GLightbox there's something wrong, I've used the library in different projects where the user generates the content and depending on the project we sanitize user data in multiple ways, it does not matter if the user adds a code like your example, it will never be printed in the code.

Having said that, I think that future versions will not unescape attributes at all, you can use the advanced description if you want to use HTML so I think escaped code it's no longer necessary

LukeLambert commented 9 months ago

Sanitization is only necessary if you're allowing your users to author HTML, which I'm not. Since UGC may be displayed across different media with different attack vectors, trying to reliably sanitize it for every current and future attack is hopeless. Instead, UGC should be properly escaped. Most frameworks take care of this for you.

With the current Glightbox implementation, even a description containing the < character breaks the HTML. Since I don't want to strip that character from all UGC, double encoding is the only option.

<a href="full.jpg" class="glightbox" data-description="a&lt;b">
    <img src="thumb.jpg">
</a>

I wasn't aware of the advanced description until you mentioned it just now. It definitely seems like that would be a better (perhaps only) option for allowing developers to use HTML. In that case, a simple switch from assigning innerHTML to assigning textContent for the data- attributes would solve the issue.

biati-digital commented 9 months ago

The post you linked mentions input validation is a good thing. yes, this is very important and that's what the dev should be doing, verify that the href value is actually a URL, verify that the descriptions does not contains HTML, etc. This is exactly my point, if you know what kind of output should be expected (in your case you do not allow HTML) validate it and remove the HTML if any.

The library is really simple, will not handle all cases and must of the time it's the job of the developer to find a solution and know what they're doing, the library simply offers multiple ways to accomplish something. Probably the API could also be used to avoid the problem you mention.

But like I mentioned, the library will not unescape data attributes at all in a future version, the advanced description will be available for more specific scenarios.

Sorry if I'm not clear, English is not my first language.

I'll close this now as I'm checking the new version code and unescape it's already removed.

LukeLambert commented 9 months ago

The less than symbol < by itself is not HTML, but it should definitely be escaped before being inserted into the DOM. If I'm understanding you correctly, and you plan to escape titles and descriptions by default in the future, I think that will be better. I think that's what developers would expect and will cover 99% of use cases. They don't expect that a user entering a<b as a photo description will break their slide.

The change is as simple as: slideTitle.textContent = slideConfig.title; instead of: slideTitle.innerHTML = slideConfig.title;

I also just noticed that if a user starts their photo description with a period, the library will look for an element on the page with that class. e.g. data-description=".Hello world". Should developers know to remove leading periods from user-supplied descriptions?

I apologize if I come across as confrontational, but I think these are security concerns that should be taken seriously. It's a very useful library, but in the current implementation, it's too easy for developers to make mistakes with very real security implications.

biati-digital commented 9 months ago

I apologize if I come across as confrontational, but I think these are security concerns that should be taken seriously. It's a very useful library, but in the current implementation, it's too easy for developers to make mistakes with very real security implications.

That's ok, i take security seriously, but i've mentioned this many times, this library is for devs that know what they're doing, in my opinion the dev is responsible for what they provide to the lightbox.

About data-description=".Hello world" i totally forgot about this, it was probably implemented before using the advanced description, in 7 years i can't remember an issue about this so i think it's not mentioned in the docs. It would need to be a really really specific case where the user enters a description starting with a dot. This will probably also be removed in the future.

I also apologize if i come across as defensive but in 7 years i've heard many times things like I think that's what developers would expect or it should work this way, there's should be a new option to do this/that when most of the time it was something the dev should be capable of doing using the API. The upcoming version it's even more basic at it's core as i've removed many things that are no longer used. I have really limited time to work on this library so that's why my intention has always been to provide a simple base and devs must be responsible for everything else. Future version will be more modular and customizable so devs can have a lot more control about what's loaded, how is the data parsed, etc.