devidw / obsidian-to-hugo

Process Obsidian notes to publish them with Hugo. Supports transformation of Obsidian wiki links into Hugo shortcodes for internal linking.
https://obsidian-to-hugo.wolf.gdn
MIT License
330 stars 23 forks

Codeblocks should not be parsed #12

Open anakojm opened 1 year ago

anakojm commented 1 year ago

Obsidian-to-hugo wrongly converts the following:

```python
if foo==bar and foo==baz:
    L = [[12,42],[13,90]]
```

to

```python
if foo<mark>bar and foo</mark>baz:
    L = [12,42],[13,90]({{< ref "12,42],[13,90" >}})
```

Codeblocks should instead be skipped to prevent such false positives (I have no idea how to implement this).

devidw commented 1 year ago

Hey @anakojm

I guess regex lookarounds should work for this use case. Ideally, they would narrow the regex down to only those matches that are not written between triple backticks.

If you would like to give it a shot, feel free to add a test case for this in the md marks test suite.

anakojm commented 1 year ago

I am willing to try, but one problem I am facing is that I can't do something like r"(?<!^```.*?$).*?==([^=\n]+)==.*?(?!^```$)"gsm, because Python's re module does not support it: re.error: look-behind requires fixed-width pattern.
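
The limitation can be reproduced in isolation. The following is a minimal sketch, assuming Python's standard `re` module; the helper `compiles` and both simplified patterns are hypothetical and not taken from obsidian-to-hugo. The fence string is built at runtime only to keep literal fences out of the example.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime


def compiles(pattern: str) -> bool:
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern, re.DOTALL | re.MULTILINE)
        return True
    except re.error:
        return False


# Variable-width look-behind (it contains ".*?"): rejected by re.
variable_width = "(?<!^" + FENCE + ".*?$)" + r"==([^=\n]+)=="
# Fixed-width look-behind (exactly three characters): accepted by re.
fixed_width = "(?<!" + FENCE + ")" + r"==([^=\n]+)=="

print(compiles(variable_width))  # False
print(compiles(fixed_width))     # True
```

A fixed-width look-behind compiles, but it can only inspect a fixed number of characters immediately before the match, so it cannot tell whether an opening fence appeared several lines earlier.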

I think you would be better off dealing with this issue as I lack experience in the matter.

Also, why did you restrict the issue to the marks processor? The issue affects the wikilinks parser too, as shown by my example.

In the meantime, I have written test cases, should I PR them? Maybe in another branch?

devidw commented 1 year ago

Alright I see

> Also, why did you restrict the issue to the marks processor? The issue affects the wikilinks parser too, as shown by my example.

Good point, I overlooked the change in the second line of the example 🙈

If we want to point out the issue clearly and avoid misunderstandings, we can use the diff block on GH 😉

```diff
- if foo==bar and foo==baz:
+ if foo<mark>bar and foo</mark>baz:
-     L = [[12,42],[13,90]]
+     L = [12,42],[13,90]({{< ref "12,42],[13,90" >}})
```


> In the meantime, I have written test cases, should I PR them? Maybe in another branch?

Cool, yes, that would be awesome, maybe an extra branch like `bug-codeblocks`

vonloxley commented 9 months ago

This might do the trick (requires Python 3.6 or later, for the scoped inline flags):

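    import re

    # The first alternative consumes whole fenced code blocks, so group 1 is
    # only set for wiki links outside of them.
    wiki_link_regex = r"(?ms:```.*?```)|\[\[(.*?)\]\]"
    for match in re.finditer(wiki_link_regex, text):
        if not match.group(1):
            continue  # matched a code block, not a wiki link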
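
For illustration, the alternation idea can be turned into a complete substitution. This is only a sketch under assumptions: the function name `convert_wiki_links` is hypothetical, the output format is copied from the converted example at the top of this issue, and none of it is the project's actual implementation.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime

# Alternative 1 swallows a whole fenced block (no capture group).
# Alternative 2 captures the target of a wiki link in group 1.
WIKI_LINK_OR_FENCE = re.compile(
    "(?ms:" + FENCE + ".*?" + FENCE + ")" + r"|\[\[(.*?)\]\]"
)


def convert_wiki_links(text):
    """Rewrite [[target]] as a Hugo ref link, leaving fenced blocks untouched."""

    def replace(match):
        target = match.group(1)
        if not target:
            # Either a fenced block (group 1 is None) or an empty link:
            # emit the matched text unchanged.
            return match.group(0)
        return f'[{target}]({{{{< ref "{target}" >}}}})'

    return WIKI_LINK_OR_FENCE.sub(replace, text)


block = FENCE + "python\nL = [[12,42],[13,90]]\n" + FENCE
print(convert_wiki_links("See [[My Note]].\n" + block + "\n"))
```

Because the fenced block is consumed as a single match, the nested list inside it is never seen by the wiki link alternative. The sketch treats any run of three backticks as a fence, so tilde fences, indented code blocks, and inline code spans are not covered.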

anakojm commented 9 months ago

It might work, but I believe the problem is more fundamental. We can't parse Markdown properly with regex, since Markdown is not a regular language.
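
Short of a full Markdown parser, one alternative to lookarounds is a line-by-line pass that tracks whether the current line is inside a fence. The sketch below swaps in that technique for the marks case; it is not from the project. The name `convert_marks` is hypothetical, the `==...==` pattern is the inner part of the regex attempted earlier in the thread, and the `<mark>` output follows the converted example in the issue description.

```python
import re

FENCE = "`" * 3  # three backticks, built at runtime
MARK = re.compile(r"==([^=\n]+)==")


def convert_marks(text):
    """Wrap ==text== in <mark> tags, skipping lines inside backtick fences."""
    out = []
    in_fence = False
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith(FENCE):
            # An opening or closing fence line toggles the state.
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)  # code: copy verbatim
        else:
            out.append(MARK.sub(r"<mark>\1</mark>", line))
    return "".join(out)


code = FENCE + "python\nif foo==bar and foo==baz:\n" + FENCE + "\n"
print(convert_marks("This is ==important==.\n" + code))
```

This only recognizes backtick fences, so tilde fences, indented code blocks, and inline code spans would still be processed, which is the kind of gap the comment above points at.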