improve md link conversion for a better html navigation experience

joaoandreporto commented 1 year ago

This is a PR with the changes discussed here and with a suggestion on a README update.

joaoandreporto commented 1 year ago

Made some changes to automatically remove .md extensions from hrefs. I also precompiled the regexes and made them class attributes.

joaoandreporto commented 1 year ago

The changes in this PR (which builds on #23 and effectively supersedes #24) are pretty much done and just waiting for your review now.

jreichert commented 1 year ago

Thanks for this! Coincidentally I ran into the same issue this week and was writing my own patch to achieve the same outcome. I like your solution but you could make it a bit more DRY as follows:

class LinkInlineProcessor(markdown.inlinepatterns.LinkInlineProcessor):

    # regex patterns for .md extensions and anchors in href
    md_ext_pattern = compile_(r'(.+)(\.md)($|#.+)')

    # this encapsulates both cases; it matches both "foo#bar" and "#bar"
    anchor_pattern = compile_(r'(.*)#(.+)')

    # ...

    def getLink(self, *args, **kwargs):
        href, title, index, handled = super().getLink(*args, **kwargs)

        # check for and remove .md extension
        href = LinkInlineProcessor.remove_md_ext(href)

        # regex match for anchor hrefs
        anchor_match = LinkInlineProcessor.anchor_pattern.search(href)

        if not href.startswith("http") and not href.endswith(".html"):
            # index md to html link
            if auto_index and href.endswith("/"):
                href += "index.html"
            # anchor md to html link (on both current and different pages)
            elif anchor_match:
                base_page = ""
                hlnk, anchor_wiki_text = anchor_match.group(1, 2)

                if hlnk:
                    base_page = hlnk + ".html"
                anchor = markdown.extensions.toc.slugify(anchor_wiki_text, "-")
                href = base_page + "#" + anchor
            # no anchor md to html link
            elif not href.endswith("/"):
                href += ".html"
        return href, title, index, handled

That way you don't have multiple places in the code where you call slugify and construct the href, and it's a smidge more efficient because it only needs to run the anchor regex once. In any case, really hoping this gets merged soon (and please push the updated version to PyPI as well; the one there still has a broken TOC).
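To illustrate the single-regex point, here is a minimal standalone sketch using only the standard re module. The helper name split_anchor is made up for this example and is not part of vimwiki_markdown; the pattern itself is the one from the snippet above.

```python
import re

# Same pattern as in the snippet above: an optional page part, "#", then the anchor text.
anchor_pattern = re.compile(r'(.*)#(.+)')


def split_anchor(href):
    """Hypothetical helper: return (page, anchor_text), or None if href has no anchor."""
    match = anchor_pattern.search(href)
    if match is None:
        return None
    return match.group(1, 2)


print(split_anchor("foo#bar"))  # ('foo', 'bar')
print(split_anchor("#bar"))     # ('', 'bar')
print(split_anchor("foo"))      # None
```

An empty page part means the anchor points at the current page, which is why the snippet only appends ".html" when the first group is non-empty.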

joaoandreporto commented 1 year ago

Alright, I’ve made some small corrections to the snippet you provided, @jreichert, then tested and implemented your changes, but I haven’t pushed them yet.

Since I think you should obviously be listed as a contributor, if @WnP’s decision on this PR ends up being favourable, please open a PR with your changes against this anch_link_dev branch so that I can merge them.

Also, for anyone who wants or needs this before a permanent fix is pushed: I’m assuming there’s no problem with copying the vimwiki_markdown.py file from this PR (or your own solution) and pointing the custom_wiki2html setting in g:vimwiki_list at it, e.g.:

'custom_wiki2html': '/path/to/vimwiki_markdown.py'

Thank you!

jreichert commented 1 year ago

I know your temp solution works because that's exactly what I'm doing :) But I actually did that because just putting 'vimwiki_markdown' in my .vimrc didn't work - it couldn't find the vimwiki script. Do I need to do something like update my $PYTHONPATH to make sure it is included? Something else?

I'll open a PR, and if the approach is accepted then it will be teed up and ready to go.

joaoandreporto commented 1 year ago

Well, maybe try checking which Python version and paths are set inside Vim:

:pyx import sys; print(sys.version)
:pyx import sys; print(sys.path)

I’d usually look for vimwiki_markdown.py somewhere like site-packages, e.g.:

$ ls /usr/local/lib/python3.11/site-packages/ | grep vimwiki

site-packages should be one of the items in the sys.path list.

If it’s not showing up, you may need to export a different path in your .bashrc, one for a Python version that has vimwiki_markdown installed. You may also have changed your .bashrc’s Python path recently without reinstalling your pip packages afterwards, so maybe check that too.

Furthermore, you may be working inside some Python environment manager where vimwiki_markdown is not installed.
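As a quick check along the same lines, this small sketch asks a given interpreter where it would import a module from. It assumes the package installs a top-level module named vimwiki_markdown, and locate_module is just a name made up for this example. Run it with the same Python that Vim reports.

```python
import importlib.util
import sys


def locate_module(name):
    """Hypothetical helper: path the module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Which interpreter is running, and where (if anywhere) it finds the module.
print(sys.executable)
print(locate_module("vimwiki_markdown"))
```

If the second line prints None, that interpreter cannot see the package, which would point to a version or environment mismatch rather than a Vim setting.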

Ultimately, I don’t know, search everywhere with find, fzf, what have you!

I hope this helps. If not, maybe @WnP can help you troubleshoot this more efficiently.

When you’re ready, please open a PR against my fork so that I can merge it into this one. Below is how I’ve set it up, but you may prefer something else; it’s up to you.

class LinkInlineProcessor(markdown.inlinepatterns.LinkInlineProcessor):
    """Fix wiki links"""

    # regex patterns for .md extensions and anchors in href
    md_ext_pattern = compile_(r'(.+)(\.md)($|#.+)')
    anchor_pattern = compile_(r'(.*)#(.+)')

    @staticmethod
    def remove_md_ext(href):
        # regex match for .md extension
        md_ext_match = LinkInlineProcessor.md_ext_pattern.search(href)
        if md_ext_match:
            # remove .md from href
            return LinkInlineProcessor.md_ext_pattern.sub(r"\1\3", href)
        else:
            return href

    def getLink(self, *args, **kwargs):
        href, title, index, handled = super().getLink(*args, **kwargs)

        # check for and remove .md extension
        href = LinkInlineProcessor.remove_md_ext(href)

        # regex match for anchor hrefs
        anchor_match = LinkInlineProcessor.anchor_pattern.search(href)

        if not href.startswith("http") and not href.endswith(".html"):
            # index md to html link
            if auto_index and href.endswith("/"):
                href += "index.html"
            # anchor md to html link (on both current and different pages)
            elif anchor_match:
                base_page = ""
                hlnk, anchor_wiki_text = anchor_match.group(1, 2)
                if hlnk:
                    base_page = hlnk + ".html"
                anchor = markdown.extensions.toc.slugify(anchor_wiki_text, "-")
                href = base_page + "#" + anchor
            # no anchor md to html link
            elif not href.endswith("/"):
                href += ".html"

        return href, title, index, handled
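For anyone who wants to try the conversion rules without Vim or the markdown package, here is a rough standalone sketch of the same logic. The function convert_href, the auto_index parameter (a module-level setting in the class above) and simple_slug are all assumptions made for this example; simple_slug is only a simplified stand-in for markdown.extensions.toc.slugify and may differ from it on non-ASCII input.

```python
import re

# Same two patterns as the class attributes above.
md_ext_pattern = re.compile(r'(.+)(\.md)($|#.+)')
anchor_pattern = re.compile(r'(.*)#(.+)')


def simple_slug(text, separator="-"):
    # Simplified stand-in for markdown.extensions.toc.slugify (assumption):
    # drop punctuation, lowercase, collapse whitespace and hyphens.
    text = re.sub(r'[^\w\s-]', '', text).strip().lower()
    return re.sub(r'[-\s]+', separator, text)


def convert_href(href, auto_index=True):
    # Strip a ".md" extension, keeping any trailing "#anchor".
    href = md_ext_pattern.sub(r"\1\3", href)
    if href.startswith("http") or href.endswith(".html"):
        return href
    if auto_index and href.endswith("/"):
        return href + "index.html"
    match = anchor_pattern.search(href)
    if match:
        page, anchor_text = match.group(1, 2)
        base_page = page + ".html" if page else ""
        return base_page + "#" + simple_slug(anchor_text)
    if not href.endswith("/"):
        return href + ".html"
    return href


print(convert_href("page.md"))             # page.html
print(convert_href("page.md#My Section"))  # page.html#my-section
print(convert_href("#My Section"))         # #my-section
print(convert_href("docs/"))               # docs/index.html
```

One thing this makes visible: because the extension is removed before the "http" check, an external link ending in .md would lose its extension too, so it may be worth deciding whether that is intended.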

WnP commented 1 year ago

This MR has been closed in favor of #27

Feel free to comment there if it doesn't fit your needs.

WnP / vimwiki_markdown

improve md link conversion for a better html navigation experience #25