Support Markup instances as Safe ones

playpauseandstop commented 5 months ago

At a moment there is a way for ludic to avoid escaping any content in a string - by wrapping it into ludic.base:Safe as follows,

from ludic import html as lhtml
from ludic.base import Safe

class SomePage(Component[NoChildren, NoAttrs]):
    @override
    def render(self) -> lhtml.body:
        content = ...  # Generate page content 
        return lhtml.body(
           *content,
           lhtml.script(Safe(...), type="text/javascript"),
        )

and it works just fine.

But there is one very popular Python library, MarkupSafe, which provides Markup type and I wonder, whether it possible for ludic to treat those instances as Safe ones and do not attempt to escape them?

I'm asking, cause when I want to render Markdown content wrapped into Markup strings as,

        return PageLayout(
            markdown.markdown(extra["prologue"]),
            *links,
            markdown.markdown(extra["epilogue"]),
            extra_js=(
                lhtml.script(
                    Markup('new AnchorJS().add("h2");'), type="text/javascript"
                ),
            ),
        )

the output, for some reason, looks -->

Screenshot 2024-06-16 at 18 04 32

But when I wrap markdown.markdown(...) calls, which returns Markup instances, into Safe (and change Markup in lhtml.script to Safe),

        return PageLayout(
            Safe(markdown.markdown(extra["prologue"])),
            *links,
            Safe(markdown.markdown(extra["epilogue"])),
            extra_js=(
                lhtml.script(
                    Safe('new AnchorJS().add("h2");'), type="text/javascript"
                ),
            ),
        )

all looks good -->

Screenshot 2024-06-16 at 18 06 14

And I wonder, if it possible for ludic to treat Markup instances (automagically or via specific config) as Safe ones?

paveldedik commented 5 months ago

Hi Igor, thank you for the issue.

I am worried about security issues, it seems that python-markdown had the safe_mode attribute which has been deprecated and now users are advised to use an HTML purifier. See https://stackoverflow.com/a/10474537

I was thinking that maybe it could work like this:

creating a special Markdown component (could be part of ludic.catalog.markdown) which would basically be implemented with the Safe keyword.
it still is necessary to have a safe or trusted attribute which basically says "the markdown contains custom HTML which cannot be trusted"

Here is a possible implementation:

from typing import override

from markdown import markdown
from ludic.attrs import GlobalAttrs
from ludic.base import ComponentStrict, JavaScript, Safe
from ludic.html import div, srcipt

class MarkdownAttrs(GlobalAttrs, total=False):
    trusted: bool

class Markdown(ComponentStrict[str, GlobalAttrs]):
    """Renders given string using Python-Markdown library."""

    @override
    def render(self) -> div:
        content = self.children[0]
        if not self.attrs.get("trusted", False):
            content = # purify content using a library
        return div(Safe(markdown(content)), **self.attrs)

The usage:

# Not using HTML, so no need to explicitly pass `trusted=True`
Markdown("# Header")

# I trust the input
Markdown("# Header <script>...</script>", trusted=True)

# I don't trust the input
Markdown(user_input)

Would that make sense?

paveldedik commented 5 months ago

Sorry Igor, I just noticed the suggestion of the MarkupSafe library. Let me think, that probably allows a bit different approach.

paveldedik commented 5 months ago

I need to properly analyse this use case. The reason is that Ludic is already escaping user input, so for example, this is safe to do in Ludic:

from ludic.html import em

p(f"Hello {em(name)}")

It correctly renders while escaping name. The markupsafe library is basically allowing similar thing.

Now I think it should not be needed to use markupsefe at all and still be safe. But that requires making the following work:

from ludic.html import em
from markdown import markdown

markdown(f"Hello {em(name)}")

Which would not work, of course, because python-markdown doesn't know anything about ludic's elements.

I need to analyse what would be the bast way to allow this - whether it is possible at all. Maybe it will involve usage of markupsafe in the end (by creating a Ludic component, something like Markdown that would incorporate all these libraries)

playpauseandstop commented 5 months ago

Sorry for not providing full context initially, but my markdown.markdown(...) call from the description has nothing common with mentioned python-markdown library. So let me try once again.

My main request for ludic is: support extra Safe strings, which can have other sources and not only be wrapper as ludic.base:Safe.

Safe string (in my case) passed as children to ludic component as markupsafe:Markup instance, but, if I'll use ludic with Django, it is natural that Django's safe string will be passed as django.utils.safestring:SafeString instances.

And I've run into the issue, which mentioned in the description, when pass following Markup instances to ludic,

@attrs.frozen(kw_only=True)
class Markdown:
    extras: tuple[str, ...] = DEFAULT_MARKDOWN_EXTRAS

    def markdown(self, text: str, /) -> Markup:
        return Markup(markdown2.markdown(text, extras=self.extras).strip())

markdown = Markdown()

class SomePage(Component[NoChildren, SomePageAttrs]):
    @override
    def render(self) -> lhtml.div:
        return lhtml.div(markdown.markdown(...), ...)

So, once again, the focus, from my POV, shouldn't be on python-markdown / markdown2 / any other library, which generates HTML from the text, but rather on:

And by ensuring that ludic somehow can be configured to understand that if such instance passed as children to component it shouldn't be rendered anyhow, by treating it as safe string.

Hope, that is more clear this time 🙏

ps. And yes, at a moment, I've fixed the original issue and my Markdown class knows which wrapper to use after the markdown2.markdown(...) call, so all good at my side and the issue can be moved to discussion TBH

paveldedik commented 4 months ago

Thank you for the clarification. I guess my main problem is I don't know how it should be implemented and I have to think about that, I see the following options:

a global configuration variable of "safe" classes - I don't like that because users need to explicitly register the class there. But perhaps it is still a better option than the next two options.
a contrib module with inherited e.g. "SafeString" class for Django which is recognized by ludic. We'd probably end up with a lot of contrib packages which would only contain one class.
hardcoded list of "safe" classes from various libraries - even though it is possible with try: ... except ImportError it doesn't seem especially nice and creates a bit of coupling, we could end up with a huge list of "safe" classes in Ludic. For the users of Ludic, this is probably the best option as they don't really need to do anything and it will work for them.

Do you see any other options? Which one sound the best for you? I guess I'm considering the first and the third option. Do you know about any libraries which use this approach?

getludic / ludic

Support Markup instances as Safe ones #74