apluslms / a-plus

A+ frontend portal - A+ LMS documentation:
https://apluslms.github.io/

Set the programming language manually to code snippets to avoid bad automatic language detection in highlight.js #1221

Closed markkuriekkinen closed 1 year ago

markkuriekkinen commented 1 year ago

The library highlight.js is used to highlight the syntax of programming source code in submissions, model solutions and assignment template/skeleton code. Highlight.js was updated in #1108.

In particular, highlight.js is used in our custom jQuery plugin highlightCode: https://github.com/apluslms/a-plus/blob/28f2735979eda3229d10ab1ddc034d775ac84d25/assets/js/aplus.js#L312

Highlight.js may automatically detect the programming language whose syntax is highlighted, but the latest version of highlight.js fails to detect the language correctly more often than expected. For example, some Python files I tried were flagged as Kotlin code. The highlighting is incorrect when it uses the wrong language.

This can be circumvented by setting the programming language manually on the code element.

https://highlightjs.readthedocs.io/en/latest/readme.html#in-the-browser

If automatic detection doesn’t work for you, or you simply prefer to be explicit, you can specify the language manually using the class attribute:

<pre><code class="language-html">...</code></pre>

(We don't use the recommended tags <pre><code>.)

https://highlightjs.readthedocs.io/en/latest/api.html#highlightelement

The function uses language detection by default but you can specify the language in the class attribute of the DOM node. See the scopes reference for all available language names and scopes.

https://github.com/highlightjs/highlight.js/blob/main/SUPPORTED_LANGUAGES.md

So a language-<name> class can be added to codeBlock before highlightElement is called: https://github.com/apluslms/a-plus/blob/28f2735979eda3229d10ab1ddc034d775ac84d25/assets/js/aplus.js#L312
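A minimal sketch of that idea, assuming a plain DOM element rather than whatever object codeBlock is in aplus.js. The helper name highlightWithLanguage and its parameters are hypothetical; only classList.add and hljs.highlightElement are existing APIs.

```javascript
// Sketch only: `highlightWithLanguage` is a hypothetical helper, not the A+ code.
// `hljs` is passed in as a parameter so the function does not rely on a global.
function highlightWithLanguage(codeElement, language, hljs) {
  if (language) {
    // highlight.js reads the language from a `language-<name>` class.
    codeElement.classList.add('language-' + language);
  }
  // Without a language class, highlightElement falls back to auto-detection.
  hljs.highlightElement(codeElement);
}
```

In the browser this could be called as highlightWithLanguage(element, 'python', hljs), where element is the DOM node that holds the source code.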

Usually, highlightCode is called on an element that already has the data-url attribute. The source code is loaded via AJAX from that URL. The URL ends in the filename, so we can use the filename extension (.py, .scala, .c, etc.) to determine the programming language ourselves.

If the data-url attribute is missing, we let highlight.js detect the programming language automatically.
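The extension lookup with its fallback could look roughly like this. The function name extensionFromDataUrl is hypothetical, and the placeholder base URL is only there so that relative URLs can be parsed; a null result means "let highlight.js detect the language".

```javascript
// Sketch only: `extensionFromDataUrl` is a hypothetical helper.
// Returns the lowercase filename extension without the dot, or null when the
// URL is missing or its last path segment has no usable extension.
function extensionFromDataUrl(dataUrl) {
  if (!dataUrl) {
    return null; // no data-url attribute: fall back to auto-detection
  }
  let pathname;
  try {
    // The base is a placeholder that is only used for relative URLs.
    pathname = new URL(dataUrl, 'http://placeholder.invalid').pathname;
  } catch (error) {
    return null; // unparsable URL: fall back to auto-detection
  }
  const filename = pathname.split('/').pop();
  const dot = filename.lastIndexOf('.');
  // Reject names without a dot, dotfiles such as ".gitignore", and "name.".
  if (dot <= 0 || dot === filename.length - 1) {
    return null;
  }
  return filename.slice(dot + 1).toLowerCase();
}
```

Parsing with URL keeps a query string or fragment out of the extension.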

If highlight.js also recognizes the filename extension as an alias, we could use the extension directly in the language-<name> class instead of maintaining a mapping from filename extensions to full language names. For example, highlight.js should recognize both language-python and language-py in the class attribute.
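Whether a given extension is a known name or alias can be checked at runtime with hljs.getLanguage, which looks up a language by name or alias. A small sketch, where languageClassForExtension is a hypothetical helper and hljs is passed in as a parameter:

```javascript
// Sketch only: `languageClassForExtension` is a hypothetical helper.
// Returns a class such as "language-py" when highlight.js knows the extension
// as a language name or alias, otherwise null (use auto-detection instead).
function languageClassForExtension(extension, hljs) {
  return hljs.getLanguage(extension) ? 'language-' + extension : null;
}
```

Which extensions pass this check depends on the languages included in the highlight.js build that is loaded.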

https://github.com/highlightjs/highlight.js/blob/main/SUPPORTED_LANGUAGES.md

markkuriekkinen commented 1 year ago

After looking at https://github.com/highlightjs/highlight.js/blob/main/SUPPORTED_LANGUAGES.md and comparing it to the programming languages that our courses could typically use, only Matlab seems to be missing an alias for its common filename extension .m. Highlight.js only recognizes the name matlab in the HTML class attribute.

Languages that I think we could see in practice include at least: Python, Scala, JavaScript, C, C++, Matlab, SQL, Java, PHP, HTML, CSS, Sass, TypeScript, JSON, YAML, plain text, Dockerfile
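Given that, a one-entry override table might be enough to handle the Matlab exception while every other extension is passed through unchanged. This is only a sketch: EXTENSION_OVERRIDES and languageNameForExtension are hypothetical names, and the table would need more entries if other gaps turn up.

```javascript
// Sketch only: hypothetical names, not part of A+ or highlight.js.
// Extensions that highlight.js does not accept as an alias, mapped to the
// language name it does accept.
const EXTENSION_OVERRIDES = { m: 'matlab' };

// Returns the name to use in the `language-<name>` class for an extension.
function languageNameForExtension(extension) {
  const ext = extension.toLowerCase();
  return Object.prototype.hasOwnProperty.call(EXTENSION_OVERRIDES, ext)
    ? EXTENSION_OVERRIDES[ext]
    : ext; // assume the extension itself is a valid name or alias
}
```

Files without an extension, such as a file named Dockerfile, would not reach this function and would still rely on automatic detection.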

markkuriekkinen commented 1 year ago

About the data-url attribute I mentioned: in the A+ source code, the attribute is set before calling highlightCode at least here: