dmptrluke / django-markdownfield

A simple custom field for Django that can safely render Markdown and store it in the database.
MIT License

Integrating internal links with relative file path in .md files #35

Open Spiewart opened 9 months ago

Spiewart commented 9 months ago

I picked up this package for my demo site and struggled to include internal links in my Markdown files, at least in my development environment. I want to use relative URL paths, e.g. /djangoapp/view/, in my .md files rather than absolute URLs, e.g. http://localhost:3000/djangoapp/view/.

My settings are split into local.py and production.py. In local.py I have SITE_URL = "http://localhost:3000", but when I put a link such as [link](/djangoapp/view/) in a .md file, the HTML was saved as an external link: <a href="/djangoapp/view/" target="_blank" class=" external" rel="nofollow noopener noreferrer">djangomodel</a>. If I put the absolute URL in the .md file, [link](http://localhost:3000/djangoapp/view/), the link was rendered as an internal link. However, I want to use the same .md files in development and production without rewriting every absolute URL.
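The behavior described above follows from the netloc comparison in the package's format_link function (quoted later in this post): a relative path has an empty netloc, so it can never equal the netloc of SITE_URL. A minimal standard-library sketch of that comparison, where `is_external` is a hypothetical helper name used only for illustration and not part of django-markdownfield:

```python
from urllib.parse import urlparse

SITE_URL = "http://localhost:3000"  # same value as in local.py above


def is_external(href: str, site_url: str = SITE_URL) -> bool:
    """Hypothetical helper mirroring the netloc comparison in format_link."""
    return urlparse(href).netloc != urlparse(site_url).netloc


# A relative path has no netloc, so it is compared as "" against "localhost:3000".
print(urlparse("/djangoapp/view/").netloc == "")             # True
print(is_external("/djangoapp/view/"))                        # True
print(is_external("http://localhost:3000/djangoapp/view/"))   # False
```

This is why only the absolute form of the link was treated as internal.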

I first tried including Django url tags ({% url "djangoapp:view" %}) in my .md files and rendering the resulting HTML as Template objects in my view, which I then passed to the context. This seemed messy. I then worked out the following approach, which seems pretty hacky and not very performant, but works the way I want.

  1. I copied a Stack Overflow answer to list all of my app's URLs.

    from django.urls import URLPattern, URLResolver

    def list_urls(lis, acc=None):
        # https://stackoverflow.com/questions/1275486/how-can-i-list-urlpatterns-endpoints-on-django#answer-54531546
        if acc is None:
            acc = []
        if not lis:
            return
        l = lis[0]
        if isinstance(l, URLPattern):
            yield acc + [str(l.pattern)]
        elif isinstance(l, URLResolver):
            yield from list_urls(l.url_patterns, acc + [str(l.pattern)])
        yield from list_urls(lis[1:], acc)
  2. I overrode the django-markdownfield format_link function so that every link flagged as external is checked against the list of internal URLs.

    from importlib import import_module
    from typing import Dict
    from urllib.parse import urlparse

    from django.conf import settings

    def format_link(attrs: Dict[tuple, str], new: bool = False):
        """
        This is really weird and ugly, but that's how bleach linkify filters work.
        """
        try:
            p = urlparse(attrs[(None, 'href')])
        except KeyError:
            # no href, probably an anchor
            return attrs

        if not any([p.scheme, p.netloc, p.path]) and p.fragment:
            # the link isn't going anywhere, probably a fragment link
            return attrs

        if hasattr(settings, 'SITE_URL'):
            c = urlparse(settings.SITE_URL)
            link_is_external = p.netloc != c.netloc
        else:
            # Assume true for safety
            link_is_external = True

        if link_is_external:
            # create a list of all the app's URLs and check if the hyperlink path is in that list
            urlconf = import_module(settings.ROOT_URLCONF)
            app_urls = ["/" + ''.join(url_part_list) for url_part_list in list_urls(urlconf.urlpatterns)]
            if p.path not in app_urls:
                # link is external - secure and mark
                attrs[(None, 'target')] = '_blank'
                attrs[(None, 'class')] = attrs.get((None, 'class'), '') + ' external'
                attrs[(None, 'rel')] = 'nofollow noopener noreferrer'

        return attrs
  3. I subclassed the django-markdownfield MarkdownField to substitute the new format_link function.

    from functools import partial

    import bleach
    from bleach.linkifier import LinkifyFilter
    from markdown import markdown

    # MarkdownField, EXTENSIONS, EXTENSION_CONFIGS and blacklist_link come from
    # django-markdownfield; format_link is the overridden function from step 2.

    class OverwrittenMarkdownField(MarkdownField):
        def pre_save(self, model_instance, add):
            value = super().pre_save(model_instance, add)

            if not self.rendered_field:
                return value

            dirty = markdown(
                text=value,
                extensions=EXTENSIONS,
                extension_configs=EXTENSION_CONFIGS
            )

            if self.validator.sanitize:
                if self.validator.linkify:
                    cleaner = bleach.Cleaner(tags=self.validator.allowed_tags,
                                             attributes=self.validator.allowed_attrs,
                                             css_sanitizer=self.validator.css_sanitizer,
                                             filters=[partial(LinkifyFilter,
                                                              callbacks=[format_link, blacklist_link])])
                else:
                    cleaner = bleach.Cleaner(tags=self.validator.allowed_tags,
                                             attributes=self.validator.allowed_attrs,
                                             css_sanitizer=self.validator.css_sanitizer)

                clean = cleaner.clean(dirty)
                setattr(model_instance, self.rendered_field, clean)
            else:
                # danger!
                setattr(model_instance, self.rendered_field, dirty)

            return value
  4. I used OverwrittenMarkdownField in my model field definitions instead of the MarkdownField provided by the package.

This produces the desired behavior: relative internal links are saved and rendered as internal links (no target="_blank").
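One property of the membership check in step 2 is that it compares the link path against route strings literally. The sketch below illustrates this with plain strings. The `is_known_path` helper and the sample routes are hypothetical, and it assumes parameterized routes appear in the list in their pattern form (for example `<int:pk>`):

```python
def is_known_path(path: str, app_urls: list[str]) -> bool:
    """Exact string membership, as in the `p.path not in app_urls` check."""
    return path in app_urls


# Hypothetical route strings in the form "/" + joined pattern parts.
app_urls = ["/djangoapp/view/", "/djangoapp/item/<int:pk>/"]

print(is_known_path("/djangoapp/view/", app_urls))     # True
print(is_known_path("/djangoapp/item/42/", app_urls))  # False: "42" is not literally "<int:pk>"
print(is_known_path("/djangoapp/view", app_urls))      # False: no trailing slash
```

Under that assumption, links to views that take URL parameters would still be marked as external, and the list of routes is rebuilt for every link that is checked.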

Appreciate any feedback.

Spiewart commented 6 months ago

Brief update for anyone who finds their way here. I rewrote the format_link function to use Django's URL resolver. It now checks whether a link resolves against the internal URL structure. If a Resolver404 is raised, it retries with a trailing slash appended; if that resolves, it raises a ValueError because the link is missing its trailing slash. Otherwise the link is treated as external. This lets URLs that take parameters be used in the .md files and still be marked as internal, as long as they resolve.

from urllib.parse import urlparse

from django.conf import settings
from django.urls import Resolver404, resolve  # type: ignore

def format_link(attrs: dict[tuple, str], new: bool = False):
    """
    This is really weird and ugly, but that's how bleach linkify filters work.
    """
    try:
        p = urlparse(attrs[(None, "href")])
    except KeyError:
        # no href, probably an anchor
        return attrs

    if not any([p.scheme, p.netloc, p.path]) and p.fragment:
        # the link isn't going anywhere, probably a fragment link
        return attrs

    if hasattr(settings, "SITE_URL"):
        c = urlparse(settings.SITE_URL)
        link_is_external = p.netloc != c.netloc
    else:
        # Assume true for safety
        link_is_external = True

    if link_is_external:
        # I have overwritten this to allow for internal links to be written into markdown
        # agnostic to my development and production environment. This is a hacky solution
        # but it works for now. Internal urls must follow the pattern app_name/url_stuff/
        # Try to resolve the link
        try:
            resolve(p.path)
        # If it fails, try adding a trailing slash
        except Resolver404:
            slash_path = p.path + "/"
            # If adding a slash resolves it as an internal link, raise a ValueError
            # to alert the user that they need to add a trailing slash to their link
            try:
                resolve(slash_path)
                raise ValueError(f"Link {p.path} is missing a trailing slash")
            # If adding a slash doesn't resolve it, it's an external link
            except Resolver404:
                pass
            # link is external - secure and mark
            attrs[(None, "target")] = "_blank"
            attrs[(None, "class")] = attrs.get((None, "class"), "") + " external"
            attrs[(None, "rel")] = "nofollow noopener noreferrer"

    return attrs
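
The decision logic in the function above can be sketched without Django to make the three outcomes easier to see. Here `classify_path` and `can_resolve` are hypothetical names: `can_resolve` stands in for a wrapper that calls `django.urls.resolve` and returns False on Resolver404, and the toy resolver is just a fixed set of paths.

```python
from typing import Callable


def classify_path(path: str, can_resolve: Callable[[str], bool]) -> str:
    """Return "internal" or "external"; raise ValueError on a missing trailing slash.

    `can_resolve` is a hypothetical stand-in for wrapping django.urls.resolve
    in a try/except Resolver404.
    """
    if can_resolve(path):
        return "internal"
    if can_resolve(path + "/"):
        raise ValueError(f"Link {path} is missing a trailing slash")
    return "external"


# Toy resolver for demonstration only: a fixed set of known paths.
known = {"/djangoapp/view/"}
can_resolve = known.__contains__

print(classify_path("/djangoapp/view/", can_resolve))  # internal
print(classify_path("/elsewhere/", can_resolve))       # external
try:
    classify_path("/djangoapp/view", can_resolve)
except ValueError as exc:
    print(exc)  # Link /djangoapp/view is missing a trailing slash
```

Note that the check in format_link looks only at `p.path`, so an absolute URL on another host whose path happens to match an internal route would also be left unmarked. Requiring `p.netloc` to be empty before calling `resolve` would be one way to avoid that.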