Witiko / markdown

A package for converting and rendering markdown documents in TeX
http://ctan.org/pkg/markdown
LaTeX Project Public License v1.3c

URL Parsing #475

Status: Closed (l0th3r closed 3 months ago)

l0th3r commented 3 months ago

Hi,

Thank you for your work on this package, and for the care you take with issues.

I ran into a problem while rendering a URL that contains special characters. On the French-language web, some article and page names contain characters such as é, è, and à. In URLs, these characters are percent-encoded from their UTF-8 bytes, for example é = %C3%A9.

How can I sanitize the content of the markdown files so that it is not interpreted as TeX?

Content of the markdown file:

## Organisation
L'organisation doit suivre une logique [récursive](https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9).

I used \markdownInput{file.md}

Rendered link: [screenshot]

I already tried this before markdown input:

\catcode`\$=12
\catcode`\&=12
\catcode`\#=12
\catcode`\^=12
\catcode`\_=12
\catcode`\~=12
\catcode`\%=12

Thank you in advance for your answer!

Witiko commented 3 months ago

Hi, thanks for using the Markdown package!

> How can I sanitize the content of the markdown files for them not to be interpreted as TeX?

They are not interpreted as TeX. Specifically, the percent signs are deactivated when you use the \markdownInput command, similarly to how you did it in your example above. The issue would be with the default renderers and how they process the URL.

Here is the TeX output of your above example document:

\markdownRendererDocumentBegin
\markdownRendererSectionBegin
\markdownRendererSectionBegin
\markdownRendererHeadingTwo{Organisation}\markdownRendererInterblockSeparator
{}L'organisation doit suivre une logique \markdownRendererLink{récursive}{https://fr.wikipedia.org/wiki/R\markdownRendererPercentSign{}C3\markdownRendererPercentSign{}A9cursivit\markdownRendererPercentSign{}C3\markdownRendererPercentSign{}A9}{https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9}{}.
\markdownRendererSectionEnd 
\markdownRendererSectionEnd \markdownRendererDocumentEnd

You should try and redefine the command \markdownRendererLink to produce a functional hyperlink using the arguments #1 (the text récursive) and #3 (the URL https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9). For example:

\documentclass{article}
\usepackage{markdown}
\markdownSetup {
  renderers = {
    link = {%
      \href{#3}{#1}%
    },
  }
}
\usepackage{hyperref}
\begin{document}
\begin{markdown}
## Organisation
L'organisation doit suivre une logique [récursive](https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9).
\end{markdown}
\end{document}

However, we also seem to produce functional hyperlinks with the default renderers:

\documentclass{article}
\usepackage{markdown}
\usepackage{hyperref}
\begin{document}
\begin{markdown}
## Organisation
L'organisation doit suivre une logique [récursive](https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9).
\end{markdown}
\end{document}

Therefore, it's unclear to me how you arrived at your result. Can you share an example TeX document that I can compile?

Furthermore, what version of the Markdown package do you use? You should see a line such as Package: markdown 2024-07-01 v3.6.1-17-gf57d31e7 markdown renderer in the .log file that is produced when you compile your example document.
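
To see how the four arguments of \markdownRendererLink in the TeX output above map onto a link, here is a minimal sketch of a debugging renderer. It is an illustration rather than anything the package ships: it prints the label (#1), the URL prepared for typesetting (#2), and the title (#4) instead of producing a hyperlink. The title "Wikipedia" is an assumption added for the example, and #3 is left out of the printout because it is the raw URL meant to be passed to commands such as \href rather than typeset directly.

```latex
\documentclass{article}
\usepackage{markdown}
\markdownSetup{
  renderers = {
    % Hypothetical debugging renderer: print the arguments that are
    % safe to typeset instead of producing a hyperlink.
    link = {%
      [label: #1] [URL for typesetting: #2] [title: #4]%
    },
  },
}
\begin{document}
\begin{markdown}
[récursive](https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9 "Wikipedia")
\end{markdown}
\end{document}
```

In #2, every percent sign arrives as \markdownRendererPercentSign{}, which is why it is suitable for typesetting but not as a link target.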

l0th3r commented 3 months ago

Thank you for your reply.

So yes, I was already overriding the rendering of hyperlinks with this, though:

\global\def\markdownRendererLink#1#2#3#4{
  \href{#2}{#1}
}

Version log: Package: markdown 2024-07-14 v3.6.2-0-g6c30af7e markdown renderer

For you to compile locally:

The file contains a lot of unrelated setup because I isolated the problem from my working file, so that you have the whole working context without the actual content.

test.tex:

\documentclass{article}

\usepackage{framed} % package to frame content in boxes
\usepackage{geometry} % package for custom page layout
\usepackage{setspace} % package for different spaces
\usepackage{fancyhdr} % package for custom header and footer
\usepackage{lastpage} % package for getting last page
\usepackage{graphicx} % package to import graphics
\usepackage{hyperref} % package to use hypertext references
\usepackage{fontspec} % package to use custom fonts
\usepackage{tabularx} % package to use custom tables
\usepackage{multirow} % package to combine rows in table
\usepackage{markdown} % package to parse markdown
\usepackage{verbatim} % package to sanitize text
\usepackage{titlesec} % package to change section titles
\usepackage[table]{xcolor} % package to use colors w/table

% uncomment to debug document geometry
%\geometry{showframe}

% define page size, orientation, include header and footer in computation
\geometry{a4paper, portrait, includehead, includefoot}
\geometry{headheight=97pt, headsep=10mm}
% define page margins
\geometry{tmargin=5mm, lmargin=20mm, rmargin=20mm, bmargin=10mm, nomarginpar}

% define graphics path
\graphicspath{{./}{content/}}

% set main font
\setmainfont{OpenSans}

% table settings
\newcolumntype{Y}{>{\hsize=0.16\textwidth}X}
\newcolumntype{Z}{>{\centering\arraybackslash}X}
\renewcommand{\arraystretch}{1.4}
\renewcommand\tabularxcolumn[1]{m{#1}}
\arrayrulecolor{red}

% style for hypertext
\hypersetup{
    colorlinks=true,
    linkcolor=blue,
    filecolor=cyan,
    urlcolor=cyan
}
\urlstyle{same}

% Override hyperlinks
\global\def\markdownRendererLink#1#2#3#4{
  \href{#2}{#1}
}

% override section title style
\titleformat{\section}
{\filcenter\bfseries\Large}
{\thesection.}{0.5em}{}

% markdown settings
\markdownSetup{
    blankBeforeCodeFence=true,  % add blank space before code blocks
    fencedCode=true,            % allow code blocks
    fancyLists=true,            % allow more ways to create lists
    texMathDollars=true,        % allow TeX dollar sign mathematical expression
    inlineNotes=true,           % allow inline notes
    pipeTables=true,            % allow markdown tables
    taskLists=true,             % allow task lists
    underscores=false,          % remove usage of underscores for emphasis
}

\begin{document}
\markdownInput{test.md}
\end{document}

test.md:

# Organisation
L'organisation doit être [récurive](https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9)

Visual result: [two screenshots]

Witiko commented 3 months ago

> So yes I was already overriding the render of the hyperlinks using this tho:

\global\def\markdownRendererLink#1#2#3#4{
  \href{#2}{#1}
}

Use the argument #3 instead of #2. As you can see in https://github.com/Witiko/markdown/issues/475#issuecomment-2283914437, #2 contains the URL text for typesetting with special symbols replaced with renderer commands, whereas #3 contains the raw URL text, which is what you want here.

Furthermore, you should add % at the end of the first two lines to get rid of the extra spaces around the hyperlink. Like this:

\global\def\markdownRendererLink#1#2#3#4{%
  \href{#3}{#1}%
}
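
As a further sketch of the difference between #2 and #3, the following renderer uses #3 as the link target and #2 only for the visible URL text in a footnote. It is written with the \markdownSetup syntax from the earlier example. The footnote and the \texttt styling are assumptions for illustration, and a long URL in the footnote will not be broken across lines.

```latex
\documentclass{article}
\usepackage{markdown}
\usepackage{hyperref}
\markdownSetup{
  renderers = {
    % #1 = label, #2 = URL prepared for typesetting,
    % #3 = raw URL, #4 = title (unused in this sketch).
    link = {%
      \href{#3}{#1}% clickable label that points at the raw URL
      \footnote{\href{#3}{\texttt{#2}}}% visible URL typeset from #2
    },
  },
}
\begin{document}
\begin{markdown}
L'organisation doit suivre une logique
[récursive](https://fr.wikipedia.org/wiki/R%C3%A9cursivit%C3%A9).
\end{markdown}
\end{document}
```

Swapping the two arguments reproduces the original problem: the renderer commands in #2 end up inside the link target instead of literal percent signs.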

l0th3r commented 3 months ago

It works perfectly fine for me! Thank you so much for your time and attention, have a good one!

Witiko commented 3 months ago

Happy to help!