PrismJS / prism

Lightweight, robust, elegant syntax highlighting.
https://prismjs.com
MIT License

Improving the style of different nested languages #1218

Open wandras opened 6 years ago

wandras commented 6 years ago

The current release of Prism.js defines styles for all the possible tokens, regardless of the language being highlighted.

In my opinion this is a UX issue, as nested languages share token styles that are not always the best choice for readable code.

Example

Consider an HTML document with embedded JavaScript and CSS, together with some server-side language such as ASP.NET or PHP (there is an example here: http://jsfiddle.net/landzup/vJbkM/).

In this case, the JavaScript "token number" has the same style as the HTML "token tag", and the PHP "token attr-name" has the same style as the HTML "token attr-name".

In some cases we could work around it by defining language-specific rules:

.language-javascript .token.number { color: #f04; }

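but it is not always possible, as some languages are not wrapped into a language-specific class. In fact, in the multi-language example mentioned above we may notice that:

  1. all the languages get highlighted only if the outermost element has the "language-php" class;
  2. associating the "language-html" class with the outermost element will cause the PHP code not to be highlighted.

The first scenario is an incomplete solution because, even though highlighting works, there is no "language-html" class that Prism.js applies to the HTML code. This makes it impossible to define HTML-specific styles.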

Possible solutions

I suggest two possible solutions:

  1. upgrading the library so that it marks up HTML code when it is nested in other languages, allowing developers to write language-specific styles;
  2. publishing the list of all the tokens of each language supported by Prism.js, which would make it possible to refine the CSS and style unmarked languages with some CSS workaround.

The first does not exclude the second, which would also help to optimize custom CSS by deleting classes for tokens which do not exist in a given language.
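With such a token list in hand, the language-specific rules shown earlier could be generated and not written by hand. The following is only a sketch: `buildScopedRules` and the shape of its `theme` argument are hypothetical names for illustration, not part of Prism.js, and the colors are arbitrary.

```javascript
// Hypothetical helper (not a Prism.js API): build language-scoped CSS rules
// such as ".language-javascript .token.number { color: #f04; }".
// `theme` maps a language name to an object of { tokenName: color }.
function buildScopedRules(theme) {
    var rules = [];
    Object.keys(theme).forEach(function (lang) {
        Object.keys(theme[lang]).forEach(function (tokenName) {
            rules.push('.language-' + lang + ' .token.' + tokenName +
                ' { color: ' + theme[lang][tokenName] + '; }');
        });
    });
    return rules.join('\n');
}

// Illustrative input: only tokens that exist in a language get a rule.
var css = buildScopedRules({
    javascript: { number: '#f04' },
    php: { tag: '#905' }
});
console.log(css);
```

Only tokens listed for a language produce a rule, so the resulting stylesheet carries no selectors for tokens that a language never emits.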

wandras commented 6 years ago

I tried to read all the tokens available to the library with the following snippet:

for (var lang in Prism.languages) {
    console.log('-----' + lang + '-----');

    for (var tokenName in Prism.languages[lang]) {
        console.log(tokenName);
    }
}

Does it represent all the tokens used in the highlighting by Prism.js? If yes, it would help in hacking the CSS as well.
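One caveat with the loop above: it only visits top-level keys. Prism grammars can nest further tokens under `inside` and add extra class names through `alias`, and `Prism.languages` also holds helper functions such as `extend`, so a recursive walk is probably closer to what is needed. Below is a minimal sketch, assuming grammar values are a RegExp, an object with optional `inside` and `alias`, or an array of those. `collectTokenNames`, `listAllTokens` and `sampleLanguages` are hypothetical names, and the sample grammar is a stand-in so the snippet runs without Prism loaded.

```javascript
// Collect token names from a Prism-style grammar, including names nested
// under `inside` and extra class names given by `alias`.
function collectTokenNames(grammar, names, seen) {
    names = names || new Set();
    seen = seen || new Set();
    if (!grammar || typeof grammar !== 'object' || seen.has(grammar)) {
        return names;
    }
    seen.add(grammar); // guard against grammars that reference themselves

    Object.keys(grammar).forEach(function (tokenName) {
        var definitions = grammar[tokenName];
        if (typeof definitions === 'function') { return; } // helper, not a token
        if (tokenName === 'rest') { // `rest` holds more tokens, it is not a token itself
            collectTokenNames(definitions, names, seen);
            return;
        }
        names.add(tokenName);
        // A token may have one definition or an array of them.
        [].concat(definitions).forEach(function (def) {
            if (!def || def instanceof RegExp || typeof def !== 'object') { return; }
            [].concat(def.alias || []).forEach(function (a) { names.add(a); });
            collectTokenNames(def.inside, names, seen);
        });
    });
    return names;
}

// Map every language to a sorted list of its token names.
function listAllTokens(languages) {
    var result = {};
    Object.keys(languages).forEach(function (lang) {
        if (typeof languages[lang] === 'function') { return; } // e.g. extend()
        result[lang] = Array.from(collectTokenNames(languages[lang])).sort();
    });
    return result;
}

// Stand-in for Prism.languages (assumption: same general shape).
var sampleLanguages = {
    extend: function () {},
    markupLike: {
        comment: /<!--[\s\S]*?-->/,
        tag: {
            pattern: /<\/?[^\s>\/]+[^>]*>/,
            inside: { 'attr-name': /[^\s>\/=]+/, punctuation: /[<>\/=]/ }
        },
        entity: { pattern: /&[\da-z]{1,8};/i, alias: 'named-entity' }
    }
};

console.log(listAllTokens(sampleLanguages));
```

On a page where Prism is loaded, passing `Prism.languages` in place of `sampleLanguages` should give a fuller picture than the top-level loop, though whether it covers every class Prism emits is worth checking against the documentation.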

Golmote commented 6 years ago

> In some cases we could work around it by defining language-specific rules [...] but it is not always possible, as some languages are not wrapped into a language-specific class.

Is that true? Do we have cases where a language is not wrapped into a language-specific class?

> all the languages get highlighted only if the outermost element has the "language-php" class; associating the "language-html" class with the outermost element will cause the PHP code not to be highlighted.

Is that an issue? In Prism, we consider PHP the "parent" language here. PHP code may contain HTML code, while the opposite is not true. HTML code can appear with many server-side and templating languages.

> there is no "language-html" class that Prism.js applies to the HTML code. This makes it impossible to define HTML-specific styles.

The way the PHP component is defined, all classes used for HTML parts are unique to HTML, and all classes used for PHP parts are unique to PHP. Highlighting .language-php .token.tag will highlight only HTML tags in PHP.

I think I'm missing your point here...

Regarding your last comment, though, the answer is provided in the FAQ. This should give you everything you need to customize the CSS as much as you want.

wandras commented 6 years ago

> Is that true? Do we have cases where a language is not wrapped into a language-specific class?

Yes, it is. HTML tokens inside PHP code are not marked with the language-html class. For instance, an HTML tag is matched by .language-php .token.tag instead of .language-html .token.tag or, to give the CSS selector more specificity, .language-php .language-html .token.tag.

> Is that an issue? In Prism, we consider PHP the "parent" language here. PHP code may contain HTML code, while the opposite is not true. HTML code can appear with many server-side and templating languages.

Yes, it is an issue, because Prism.js does not associate a consistent style with the same token. It makes no difference to me whether an HTML tag sits in standalone HTML code or in HTML code embedded in PHP: it has the same logical value. Instead, you are subordinating its style to the process it goes through to become an HTML tag. Browsers do not care whether it was generated from PHP, Ruby, Pascal or Fortran. They always parse it the same way, as an HTML tag. Users, just like browsers, would like to see tokens styled consistently with their final logical role, regardless of the preprocessing.

In the end, considering PHP the parent language should not prevent the syntax highlighter from giving HTML tokens the same style, regardless of the preprocessor they are wrapped in.

benjaminBrownlee commented 6 years ago

I accidentally opened another issue on this very topic, but I wanted to say that I agree with @landzup. I find nested languages lacking proper syntax highlighting rather annoying. However, I am no authority on how to improve this. My only guiding question is: how does the way this library is structured differ from in-browser IDEs (such as my favorite, Cloud9), which have these code markup capabilities?