mmm-noweb: Quotes spoiling font-locking

AndreasMatthias commented 9 years ago

Please save the code below as test.el and run:

emacs -Q -l test.el

The quotes in the code chunk ruin the font-locking of the second documentation chunk, which is spuriously fontified with font-latex-verbatim-face. If I delete one quote, font-locking in the documentation chunk is correct again.

Andreas

(require 'package)
(package-initialize)

(require 'mmm-noweb)
(setq mmm-global-mode 'maybe)
(setq mmm-mode-ext-classes-alist '((nil "\\.nw$" noweb)))
(setq mmm-noweb-code-mode 'lua-mode)
(add-to-list 'auto-mode-alist '("\\.nw$" . latex-mode))

(find-file "test.nw")
(insert "
test
<<*>>=
a = 'a'
@
test
")

dgutov commented 9 years ago

You forgot to mention that one needs to install lua-mode.

What's "second documentation chunk"? Spuriously when?

Please post a screenshot. A screencast would be even better.

AndreasMatthias commented 9 years ago

This issue is not restricted to lua. It also happens with C, C++, Fortran, and Perl, but not with all modes; e.g., Python is not affected.

The following screenshot was made after running the code below, this time with c-mode. As you can see, the face of the "second documentation chunk" is wrong, and it seems to be the quote in the code chunk that causes this.

[Screenshot: mmm-01]

(require 'package)
(package-initialize)

(require 'mmm-noweb)
(setq mmm-global-mode 'maybe)
(setq mmm-mode-ext-classes-alist '((nil "\\.nw$" noweb)))
(setq mmm-noweb-code-mode 'c-mode)
(add-to-list 'auto-mode-alist '("\\.nw$" . latex-mode))

(find-file "test.nw")
(insert "
first documentation chunk
<<*>>=
a = 'a'
@
second documentation chunk
")

dgutov commented 9 years ago

Which Emacs version is this? Are you starting with emacs -Q?

I'm not seeing this. The text, starting with @, is rendered using the default face.

dgutov commented 9 years ago

Are you trying mmm-mode from master?

AndreasMatthias commented 9 years ago

emacs-24.5.1
mmm-mode-20150810.519
yes, I'm running emacs -Q -l test.el

dgutov commented 9 years ago

Sorry, I still can't reproduce it, even with 24.5.1.

Maybe try the scenario without this (because this part doesn't work for me, for technical reasons):

(require 'package)
(package-initialize)

Instead, pass the path to the mmm-mode directory after -L: emacs -Q -L ... -l test.el.

AndreasMatthias commented 9 years ago

Ok, this really fixes the issue. But why? Unfortunately, I do not understand what's going on. I tried to debug this further, but... well my emacs skills aren't great. But here is one interesting thing I discovered:

If I add (debug-on-entry 'find-file) to the original test.el (i.e. without removing (require 'package) (package-initialize)) and start emacs with emacs -Q -l test.el (i.e. without -L option) and hit c twice to continue inside the debugger, I end up with a correct font-locking. Does this make any sense?

dgutov commented 9 years ago

But why? Unfortunately, I do not understand what's going on.

I guess some other package is interfering somehow. To find out which one, try removing them one-by-one (simply deleting the directories will suffice) and repeat the experiment. Probably back up .emacs.d/elpa first.

Or, as the first step, remove them all except for mmm-mode, see if that helps, restore the backup, and then search for the offending package. If you tell me its name, that would help pinpoint the problem.
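A possibly less destructive way to do the same bisection, instead of deleting directories under elpa: the built-in `package-load-list` variable controls which installed packages `package-initialize` activates. A minimal sketch of a test.el prelude that activates only mmm-mode and lua-mode (the package list is an assumption; adjust it to what is installed):

```elisp
;; Activate only the listed packages; every other installed package
;; (auctex, for instance) is left inactive by `package-initialize'.
(require 'package)
(setq package-load-list '((mmm-mode t)    ; t = activate the newest installed version
                          (lua-mode t)))
(package-initialize)
```

To bisect, add entries back one at a time (or add the symbol `all`, which activates every package not otherwise listed) until the wrong face reappears.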

hit c twice to continue inside the debugger, I end up with a correct font-locking

This doesn't help much, unfortunately.

AndreasMatthias commented 9 years ago

It's auctex-11.88.7.

dgutov commented 9 years ago

Thank you. Installing auctex does it.

Unfortunately, I don't have anything better to suggest than adding this after package-initialize:

(require 'font-latex)
(defalias 'font-latex-setup #'ignore)

It will disable all extra highlighting added by auctex.

tsdh commented 9 years ago

@dgutov Better use

(setq TeX-install-font-lock #'tex-font-setup)

so that AUCTeX uses just the font-lock rules of the built-in tex-mode instead of font-latex.

dgutov commented 9 years ago

@tsdh Have you tried that solution with mmm-mode?

Auctex adds font-latex-setup unconditionally to latex-mode-hook, and even calls it during loading if the file's autoloaded from a buffer in latex-mode.

dgutov commented 9 years ago

And if that's not enough, it's also called from font-latex-add-to-syntax-alist, which has several callers.

tsdh commented 9 years ago

@tsdh Have you tried that solution with mmm-mode?

Nope, only with auctex-11.88.7.

Auctex adds font-latex-setup unconditionally to latex-mode-hook, and even calls it during loading if the file's autoloaded from a buffer in latex-mode.

Yes, but if you set TeX-install-font-lock to something other than font-latex-setup, font-latex.el won't even be loaded. I just verified that with a clean ~/.emacs containing only that single line. Of course, if you require font-latex yourself somewhere or use its autoloaded functions, then you are out of luck.

And if that's not enough, it's also called from font-latex-add-to-syntax-alist, which has several callers.

All calls to font-latex-add-* functions in all files in auctex/style/ are properly guarded using

(when (and (eq TeX-install-font-lock 'font-latex-setup)
           (featurep 'font-latex))
  ...)

Bye, Tassilo

dgutov commented 9 years ago

I see, thanks. Sounds like it should work, and it's a better solution.

tsdh commented 9 years ago

@dgutov But I think that you are still on the right track. It probably has something to do with syntactic font-lock and some user configuration by @AndreasMatthias. AUCTeX's font-latex.el uses syntactic font lock for the math construct $...$ and verbatim constructs \verb|foo bar| or |foo bar| in case the shortverb package is used and | has been defined as a shortverb char. Looking at the screenshot, it looks like the @ is defined as a shortverb char.

Well, that actually should only trigger in a document that also contains a corresponding \usepackage{shortverb} and \MakeShortVerb{@}, but of course users are free to configure defaults as they like, e.g., using hooks and styles in TeX-style-global, TeX-style-private, and TeX-style-local. Given that the problem also occurs with emacs -Q, that basically rules out a broken customization of hooks.

So @AndreasMatthias, you could check the value of TeX-active-styles in your test.nw and then see if there are style files with any of these names in TeX-style-global, TeX-style-private, and TeX-style-local.

AndreasMatthias commented 9 years ago

(setq TeX-install-font-lock #'tex-font-setup)

This seems to be working. I just skimmed through some of my files and it looks good.

(setq TeX-install-font-lock #'font-latex-setup)

I'm quite sure that I'm testing with a vanilla auctex and mmm-mode. I removed ~/.emacs.d/elpa and then installed auctex and mmm-mode (from melpa.org) again. Tests are run with emacs -Q:

TeX-active-styles is a variable defined in `tex.el'.
Its value is ("test.nw" "LATEX")
Local in buffer test.nw; global value is nil

TeX-style-global is a variable defined in `tex.el'.
Its value is "/home/andreas/.emacs.d/elpa/auctex-11.88.7/style"

TeX-style-local is a variable defined in `tex.el'.
Its value is "style"

TeX-style-private is a variable defined in `tex.el'.
Its value is ("/home/andreas/.emacs.d/auctex/style")

I removed ~/.emacs.d/auctex/style for testing. All in all, I think that I'm testing without any user configuration. But everything in the second doc chunk (starting with @) is fontified with `font-latex-verbatim-face'.

tsdh commented 9 years ago

What's the value of font-latex-syntax-alist in that buffer?

AndreasMatthias commented 9 years ago

font-latex-syntax-alist is a variable defined in `font-latex.el'.
Its value is ((40 . ".") (41 . ".") (36 . "\"") (64 . "w"))

tsdh commented 9 years ago

Is 36 the char code of the @ letter? I don't have a computer with emacs handy. Can you check what ?@ evaluates to?

tsdh commented 9 years ago

Ah, no. 36 is $, and that's a default entry. So I've run out of ideas about what causes @ and the remaining text to be fontified as verbatim.

There must be something else that gives @ string delimiter syntax during font-lock, which font-latex then recognizes as a math or verbatim construct.
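The character codes in the font-latex-syntax-alist value above can be decoded with M-: or in *scratch*. A small sketch; the helper name `my-decode-syntax-alist` is hypothetical, while `char-to-string` and the `?c` character-literal syntax are built in:

```elisp
;; ?@ and ?$ are the character literals for @ and $.
(list ?@ ?$)                            ; => (64 36)

(defun my-decode-syntax-alist (alist)
  "Return ALIST with each character code shown as a one-char string."
  (mapcar (lambda (entry)
            (cons (char-to-string (car entry)) (cdr entry)))
          alist))

(my-decode-syntax-alist '((40 . ".") (41 . ".") (36 . "\"") (64 . "w")))
;; => (("(" . ".") (")" . ".") ("$" . "\"") ("@" . "w"))
```

Decoded this way, $ is the only character given string-quote syntax, and @ gets word syntax, which fits the conclusion that this alist is not what turns @ into a delimiter.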

AndreasMatthias commented 9 years ago

Somehow it must be related to the quote character:

doc
<<a>>=
a = 3
@
doc
<<a>>=
a = 5
@
doc
<<a>>=
a = 's'
a = 1
@
doc

tsdh commented 9 years ago

Yes, probably. What's the face of the closing quote which is displayed in red?

AndreasMatthias commented 9 years ago

The face is error. But I could not get the name of the face with M-x customize-face which yielded mmm-default-submode-face. So I used M-x list-face-display and modified all red faces until I got the right one. Is there an easier way to retrieve the face?

AndreasMatthias commented 9 years ago

Sorry. The face in the code chunk depends on the mode for these code chunks.

(setq mmm-noweb-code-mode 'c-mode)

With c-mode the closing quote is in face error whereas with lua-mode it is not. But the fontification of the following doc chunk is wrong with lua-mode as well. So, maybe this issue is not related to the quote...?

tsdh commented 9 years ago

You can always just move point on the character and do M-x describe-char which also shows the face. Anyway, it seems not to have anything to do with the quote.
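Besides M-x describe-char, the face can also be read programmatically. Font-lock stores its faces as text properties, while `get-char-property` also consults overlays and lets an overlay's `face` win. That customize-face offered mmm-default-submode-face is presumably because mmm-mode puts that face on a submode overlay (an assumption, not checked against mmm-mode's source). The helper name is hypothetical:

```elisp
(defun my-faces-at-point ()
  "Return (TEXT-FACE . EFFECTIVE-FACE) for the character at point.
TEXT-FACE is the `face' text property, which is where font-lock puts
its faces.  EFFECTIVE-FACE also considers overlays, whose `face'
property takes precedence over text properties."
  (interactive)
  (let ((text-face (get-text-property (point) 'face))
        (effective-face (get-char-property (point) 'face)))
    (when (called-interactively-p 'interactive)
      (message "text property: %S; with overlays: %S"
               text-face effective-face))
    (cons text-face effective-face)))
```

Comparing the two values shows whether a face comes from font-lock or from an overlay on top of it.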

How does mmm-mode know where the c code is and where the latex text is? Are the <<a>>= and the @ some kind of delimiter?

AndreasMatthias commented 9 years ago

Yes, these are noweb's delimiters. Code chunks start with <<whatever>>= and documentation chunks with @.

tsdh commented 9 years ago

Ok, thanks. I'll try to debug the issue myself when I find some spare time. Don't hold your breath, though.

AndreasMatthias commented 9 years ago

Great! Thanks for your help!

tsdh commented 9 years ago

Ok, I did a bit of debugging. What I know so far is that this is indeed caused by font-latex's syntactic fontification. When I change font-latex so that it sets font-lock-keywords-only, the problem goes away.

However, I could not find any syntactic font-lock rules in font-latex which would be triggered by a single quote character. When I open your example file with just latex-mode, I don't get these problems, i.e., the C code between <<a>>= and @ is not highlighted at all. This basically means that font-latex has no syntactic or search-based font-lock rules which would match anything in there. The documentation text is also not highlighted (which is correct), but when I put, e.g., \textbf{foobar} in there, it is highlighted correctly.

So the problem seems to be caused by some wicked interplay between mmm-mode, c-mode, and font-latex. Since I have no clue how mmm-mode works, I can't debug any further. However, here are some observations which might give a clue to you mmm-mode guys. First a screenshot:

[Screenshot: mmm-font-latex-issue]

What you can see here is that the documentation latex chunks are correctly fontified except for the last one. And indeed, it seems to have something to do with the closing single quote character. We already guessed that.

Ok, some new observations:

The <<*>> are highlighted using font-latex-string-face. That's ok because those are French quotes in LaTeX, but are these separators really meant to be fontified by font-latex?
The red double and single quotes in the two C code chunks are fontified using font-lock-warning-face. font-latex has no single rule which would apply this face (it has its own font-latex-warning-face), so this fontification seems to be caused by C mode. But why? Those are completely valid C code snippets. c-mode's fontification rules are not too easy, but the only place where it would fontify a quote in font-lock-warning-face is in char a = 'a '; where it highlights the first quote (not the second) in font-lock-warning-face.
The parts "a and 'a of the string "a" and the char literal 'a' in the code chunks are fontified using font-latex-verbatim-face. Why are they fontified by font-latex? They should have been fontified by c-mode which would have given them font-lock-string-face. And why verbatim? "a" and 'a' are not strings in LaTeX, ``a'' is a valid string which would be fontified using font-latex-string-face (not verbatim).

So well, I actually don't know what's going on. But at least it seems wrong that font-latex also fontifies parts of the c-code chunks (in a way which it wouldn't do if the same contents appeared in a normal latex file without mmm-mode). Since I don't know how mmm-mode separates the fontification of the chunks of different languages, I have no idea where it goes wrong and why.

Both font-latex and c-mode have very complex font-lock rules, so things like font-lock-beginning-of-syntax-function, font-lock-extend-region-functions, and font-lock-extend-after-change-region-function all have to be considered...

dgutov commented 9 years ago

Since I have no clue how mmm-mode works, I can't debug any further.

You can get an overview of the fontification logic by reading mmm-fontify-region-list and mmm-syntax-propertize-function. The salient point is probably the fact that, when fontifying primary mode regions, we don't apply narrowing (we just pass the region bounds to one of the functions). IIRC, that's because doing otherwise breaks some primary modes. So the primary mode must be able to not freak out over seeing the submode hunks.

Further, the case of font-lock-syntactic-keywords being set is not particularly well-tested, since it's been obsolete for years, and doesn't fit the whole fontification concept well.

That's ok because those are french quotes in LaTeX, but are these separators really meant to be fontified by font-latex?

We can make them into "delimiter regions", which makes them skipped for fontification (but also triggers a weird font-lock bug here), but that doesn't help with the text after @: it's still fontified with font-latex-verbatim-face.

The red double and single quotes in the two C code chunks are fontified using font-lock-warning-face ... But why?

That's easy: because it's entirely impossible for me to constrain C mode to a region. It has caches, its own syntax parsing functions, etc, and it likes to break if we call its fontification function with narrowing applied. So we don't narrow, and C mode shows warnings because of the LaTeX code around.

With lua-mode, there's no warnings like that. So it's unrelated.

And why verbatim?

It's seemingly returned by font-latex-syntactic-face-function.

things like font-lock-beginning-of-syntax-function, font-lock-extend-region-functions, and font-lock-extend-after-change-region-function all have to be considered

The first seems to have the default value (which we bind to nil in opportune places). The rest seem to be nil already.

tsdh commented 9 years ago

Dmitry Gutov notifications@github.com writes:

Since I have no clue how mmm-mode works, I can't debug any further.

You can get an overview of the fontification logic by reading mmm-fontify-region-list and mmm-syntax-propertize-function. The salient point is probably the fact that, when fontifying primary mode regions, we don't apply narrowing (just pass the region bounds to one of the functions).

Ok, I see.

IIRC, because doing it otherwise breaks some primary modes. So the primary mode must be able to not freak out over seeing the submode hunks.

font-latex doesn't freak out on the C snippets. It doesn't fontify or apply syntax changes to them at all when I open the noweb file in AUCTeX's latex-mode only. (I assume there could be issues if the code snippets contain $.)

Further, the case of font-lock-syntactic-keywords being set is not particularly well-tested, since it's been obsolete for years, and doesn't fit the whole fontification concept well.

Well, it's been obsolete since 24.1. But I checked whether font-latex could use a syntax-propertize-function instead, and the answer seems to be "no." The font-lock-syntactic-keywords only apply during fontification, whereas the syntax-table properties applied by syntax-propertize-function persist and override the mode's normal syntax-table.

font-latex uses a trick for fontifying TeX inline math constructs (and verbatim macros). In the normal TeX syntax-table, ?$ has math syntax. But for the time of fontification, it gets string quote syntax so that each $ toggles between inside/outside string, and then font-latex-syntactic-face-function decides if that's really a string, inline math, or some verbatim thing.
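A stripped-down sketch of the trick described above, using only core Emacs primitives. The table and function names are hypothetical and this is not font-latex's actual code: $ gets string-quote syntax in a table used only for parsing, and a face function inspects the parse state. Element 3 of that state is the character that will close the current string, so a "string" opened by $ can be told apart from one opened by any other string-quote character.

```elisp
(defvar my-tex-fontify-syntax-table
  (let ((st (make-syntax-table)))       ; inherits the standard table, where " is a string quote
    (modify-syntax-entry ?$ "\"" st)    ; each $ toggles the "inside string" state
    (modify-syntax-entry ?% "<" st)     ; % starts a comment
    (modify-syntax-entry ?\n ">" st)    ; newline ends it
    st)
  "Hypothetical syntax table used only while computing faces.")

(defun my-tex-syntactic-face (state)
  "Pick a face from the `parse-partial-sexp' STATE."
  (cond ((nth 4 state) 'font-lock-comment-face)       ; inside a comment
        ((eq (nth 3 state) ?$) 'font-latex-math-face) ; string opened by $
        (t 'font-latex-verbatim-face)))               ; any other "string"

(defun my-tex-face-at (pos)
  "Face the sketch would pick for text at POS, or nil outside strings/comments."
  (with-syntax-table my-tex-fontify-syntax-table
    (let ((state (parse-partial-sexp (point-min) pos)))
      (when (or (nth 3 state) (nth 4 state))
        (my-tex-syntactic-face state)))))
```

In a buffer containing `a $x$ b "y"`, position 4 (the x) maps to font-latex-math-face, while position 10 (the y) maps to font-latex-verbatim-face: any string-quote character other than $, such as the quotes c-mode gives string syntax, falls through to the verbatim branch.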

I've tried using a s-p-f anyway just to see if that would make this issue go away, and it doesn't.

And why verbatim?

It's seemingly returned by font-latex-syntactic-face-function.

Yes, but it should not. It seems that the primary mode's font-lock-syntactic-face-function is always called, so with mmm-mode the primary mode basically handles the syntactic faces both for itself and for all "foreign language" chunks. That can't possibly work out very well with primary modes which have a non-standard f-l-s-f-f.

I still don't get the complete problem: ok, c-mode gives single and double quotes string syntax, and sadly font-latex-syntactic-face-function is called for these occurrences. Since the syntax is neither comment nor is the character a $, it decides to give it the verbatim face.

But I still cannot see why the following latex documentation chunks are fontified as verbatim. It has something to do with the quotes in the C chunks but I don't see the pattern.

--8<---------------cut here---------------start------------->8---
first documentation chunk \textbf{foo}
<<_>>=
char a = "foo bar" + ""; // no quote in f-l-warning-face
@
This is fontified properly.
<<>>=
char a = 'a' + ''; // no quote in f-l-warning-face
@
This is also fontified correctly.
<<_>>=
char* a = "foo bar"""; // trailing quote in f-l-warning-face
@
This is also fontified correctly.
<<_>>=
char a = 'a'''; // trailing quote in f-l-warning-face
@
This is also fontified correctly.
<<_>>=
char* a = "foo bar"; // trailing quote in f-l-warning-face
@
This is fontified in verbatim.
<<*>>=
char a = 'a'; // trailing quote in f-l-warning-face
@
This is fontified in verbatim.
--8<---------------cut here---------------end--------------->8---

Bye, Tassilo

dgutov commented 9 years ago

The font-lock-syntactic-keywords only apply during the time of fontification whereas the syntax-table properties applied by syntax-propertize-function persist and override the mode's normal syntax-table.

If the syntax-table properties applied inside font-lock-syntactic-keywords do not persist, that seems like an unintended effect of its implementation.

But for the time of fontification, it gets string quote syntax so that each $ toggles between inside/outside string, and then font-latex-syntactic-face-function decides if that's really a string, inline math, or some verbatim thing.

Cute trick, but if it's really not implementable in any other way, you should file an Emacs bug. Otherwise, font-lock-syntactic-keywords will go away some day (any decade now), and there will be no adequate replacement. Or maybe someone will suggest how to implement the same simply using font-lock-keywords.

ok, c-mode gives single and double quotes string syntax, and sadly font-latex-syntactic-face-function is called for these occurrences.

Woo, that seems to have been the problem: we didn't take care of font-latex-syntactic-face-function variable. Now it depends on the current submode, and that seems to have fixed it.
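The fix is only described in outline here. As a generic illustration of "the syntactic face function depends on the current submode" (not mmm-mode's actual code, which this thread does not show), a dispatcher can look up the region where the string or comment starts, which element 8 of the parse state records. The `my-submode-face-function` overlay property and every `my-` name are hypothetical:

```elisp
(defvar my-primary-syntactic-face-function
  (lambda (_state) 'font-latex-verbatim-face)
  "Hypothetical stand-in for the primary mode's function (e.g. font-latex's).")

(defun my-code-chunk-face (state)
  "Hypothetical c-mode-like choice for code chunks: strings are strings."
  (if (nth 3 state) 'font-lock-string-face 'font-lock-comment-face))

(defun my-dispatching-syntactic-face (state)
  "Use the face function of the region where STATE's string/comment starts.
Code-chunk overlays are assumed to carry a `my-submode-face-function'
property; elsewhere the primary mode's function is used."
  (let* ((start (nth 8 state))          ; start of the current string or comment
         (fn (or (and start (get-char-property start 'my-submode-face-function))
                 my-primary-syntactic-face-function)))
    (funcall fn state)))
```

A buffer would then use `(setq-local font-lock-syntactic-face-function #'my-dispatching-syntactic-face)`, so a quote inside a code chunk no longer reaches the LaTeX-specific branch.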

Thanks for your help, and please report any further oddities.

AndreasMatthias commented 9 years ago

This is great news. Thank you very much for your help.

Andreas

dgutov / mmm-mode

mmm-noweb: Quotes spoiling font-locking #57