elixir-lang / ex_doc

ExDoc produces HTML and EPUB documentation for Elixir projects
http://elixir-lang.org
Other
1.44k stars 325 forks source link

Don't allow selecting iex> in doctests #835

Closed michalmuskala closed 6 years ago

michalmuskala commented 6 years ago

Today copying from doctests is a bit annoying, especially if there are multiple lines involved, because you'll also select the iex> or ...> parts.

It would be great, if we could make it so that those parts are skipped over when you try to select and copy the code from examples. I did a little research and it seems it's possible to do this in css with user-select: none, but I'm not really sure how to integrate it into the ex_doc pipeline.

josevalim commented 6 years ago

Oh, I like this. I think a separate pass that finds code tags and fixes the iex> inside them would work?

eksperimental commented 6 years ago

I wanted to do this for the elixir website but i never found a way around. If you find it please lets port it there as well

nscyclone commented 6 years ago

I've proposed a PR to highlight.js that might help with this. But unfortunately the project is not very active lately (last change to master was made in Jun 2017 and there are a lot of PRs requiring attention). Maybe it makes sense to temporarily fork it into elixir-lang, then patch it on our own and update dependencies?

josevalim commented 6 years ago

@nscyclone yes, let's do that. I have forked it here and given you commit access: https://github.com/elixir-lang/highlight.js

Btw, if highlight.js is not being maintained, should we look for alternatives?

OvermindDL1 commented 6 years ago

Btw, if highlight.js is not being maintained, should we look for alternatives?

Elixir Library MarkUp? It's fantastic for Elixir and easy to add more languages.

josevalim commented 6 years ago

Oh right! @tmbb, how are things going on the markup front? We have talked about it in the past but I don't remember where we left things off. I believe we have added bindings to ExDoc to support it fully, right?

Also, should we add this feature to markup too?

tmbb commented 6 years ago

This is a long response, and its divided into 4 parts

Part 1: Answering @OvermindDL1 :

Elixir Library MarkUp?

Again OvermindDL1's misspellings attack again. There is no such thing as MarkUp, It's called Makeup for the obvious reasons (it makes your source code all pretty and colorful)

It's fantastic for Elixir

That remains true, it's still the best Elixir highlighter I've seen in the wild.

and easy to add more languages.

Easy if you're willing to write custom lexers for such languages. Which is actually pretty easy usually, but it's a labor-intensive process. Things like string interpolations are not trivial to get right, even though Makeup's parser combinators help a lot. Still beats writing Pygments lexers, though.

BTW, contributors are welcome... I still have to write some docs on how to contribute (and an ExSpirit tutorial, becaus efiguring it out through trial and error like I did is suboptimal xD)

Part 2: Answering @josevalim:

I believe we have added bindings to ExDoc to support it fully, right?

Yes indeed. I've written a "markdown implementation" (unimaginatively named ExDocMakeup) which is just earmark customized to use Makeup for syntax highlighting of the supported languages.

Here is a link to the elixirforum topic. There is a link to a much more detailed description of the package here. I have used ExDocMakeup to prepare the documentation for some larger Elixir packages, such as phoenix and ecto. In fact, I've submitted a PR for phoenix so that it uses ExDocMakeup, but it hasn't been accepted yet due to lack of test. Which brings us to the next point:

Oh right! @tmbb, how are things going on the markup front? We have talked about it in the past but I don't remember where we left things off

Makeup is very functional. It highlights Elixir better than anything else I've seen around, including VSCode, for example (and much better than highlight.js, of course). AFAIK, it's the only highlighter that correctly highlights:

def x + y, do: x - y

(note that the above highlights x as if it were a function name, which is incorrect)

Currently makeup only supports two languages: Elixir and HTML. Because Makeup uses a real parser and not weird regex stuff, lexers are trivial to compose and I can produce a decent .html.eex lexer quickly if I want. Making it perfect might take some work, and that's the only reason why I haven't published such lexer yet (It's kinda hard to explain what makes it so hard).

The big red flag here is that neither Makeup currently has almost no tests. This is mainly because writing unit tests for a syntax highlight is very, very boring, but I'm hoping the new assert_value package might help with that. The lack of tests is what's preventing it from being used to document Phoenix.

I'm currently writing a Python lexer (just for fun, it's pretty easy except for some string interpolation weirdness), but I can drop it to write some tests so that Makeup gets some decent coverage.

If Makeup is meant to be included as part of ExDoc, I propose the following plan:

  1. write tests for all makeup common combinators (those that are included in Makeup itself and are used by multiple lexers)

  2. write tests for all rules in both the Elixir lexer and the HTML lexer

  3. write an EEx lexer and include it my default with ExDoc. This makes sense because of Elixir's strong web presence. Documenting projects such as Phoenix or Drab benefits a lot from having a good EEx highlighter.

  4. define the scope of Makeup: should it include language detection (Pygments does this)? How should one pick a lexer from a string? Should this be handled by Makeup itself or should this be handled by the end-user? I'd love @josevalim's help on this one... I lean towards keeping a lean core that only lexes stuff and formats the results.

  5. Based on the above, stabilize the API and publish Makeup version 1.0.0 (which can be included in ExDoc version X.Y.Z)

  6. Currently compiling makeup produces some "scary" warnings. Those warnings come from compiling ExSpirit macros. Those macros do something that's tagged as unsafe by the compiler but are actually safe. The only way to avoid those warnings would be to dynamically generate atom names at compilation time, which @OvermindDL1 is reluctant to do. This should be discussed between @OvermindDL1 and @josevalim and addressed, as I don't think it's reasonable that ExDoc's compilation produces such weird warnings. In any case, ExSpirit is very small (although pretty gnarly for performance reasons) and maintaining a separate fork is not unreasonable.

  7. Based on the point above, stabilize ExSpirit's API (at least the ExSpirit.Parser module) and release version 1.0.0.

Part 3: handling IEx prompts

MakeupElixir already handles IEx prompts (iex>, iex(123)> and ...>). They are recognized by the normal Elixir lexer as a special token and highlighted in a specific way. With a trivial change in the lexer and a slightly less trivial change in the formatter it is possible to add the necessary CSS attributes.

A negative side effect of having the default Elixir lexer highlight these prompts is that they might cause us to wrongly highlight the string iex> ... as a pompt instead of as iex is greater than .... This case is very rare, and simplifies the use of the library with ExDoc. I can elaborate on this topic a little further if you want.

Part 4: Changes on ExDoc's side

There is no need to change anything on ExDoc's side except depending on ExDocMakeup (which will recursively depend on Makeup, MakeupElixir and MakeupHTML, and, of course, ExSpirit). This will increase ExDoc's compilation time, as ExSpirit parsers take a while to compile (it seems that Elixir is not very quick at expanding macros and compiling them into BEAM files). For the 3 parsers I want to include, this is doable, as the compilation time is a one-time cost.

ExDocMakeup has sensible defaults and requires no user configuration for default results, so it's possible that most users wouldn't even notice we were using it.

josevalim commented 6 years ago

The big red flag here is that neither Makeup currently has almost no tests.

To be honest, I don't think the markups themselves need to have tests. It is very likely that none of the options we would consider from JS would have them. However, it is important that Makeup itself has tests and I would start by focusing there.

For the each language implementation, I would recommend having a couple of tests as examples, so you can at least require future changes to the grammar to contain tests.

Those macros do something that's tagged as unsafe by the compiler but are actually safe

Those warnings should be fixed because they will emit compilation errors on v1.7. See explanation here.

Regarding the compilation speed, I can take a look at them and see if we could make anything faster.

define the scope of Makeup: should it include language detection (Pygments does this)?

IMO it is fine not supporting language detection. However, we need to make sure Earmark+Makeup supports fenced code blocks like GitHub (i.e. triple backticks followed by the language).

MakeupElixir already handles IEx prompts (iex>, iex(123)> and ...>).

Beautiful. I agree with your decision. The chance of false positives with iex> is not worth worrying about. And we have the same issue with highlight.js anyway. However, it would be great if we get the user-select: none clauses in those prompts. Could you please investigate it?

I think something else we need to do is to remove highlight.js from the default build and include it only only when using the earmark markdown engine. @nscyclone, could we use your help on this one? For now, compiling highlight.js to a separate javascript file and including it in the layouts shall suffice. We can move it to the markdown engines later.

If everybody is on board, I will open up a new issue with those TODOs so we can tackle them.

tmbb commented 6 years ago

However, it would be great if we get the user-select: none clauses in those prompts. Could you please investigate it?

This is easy. I can tag these tokens with a {:selectable, false} attribute and change the HTML formatter to emit the appropriate CSS. My attribute system is highly extensible to anecipate these needs.

tmbb commented 6 years ago

However, we need to make sure Earmark+Makeup supports fenced code blocks like GitHub (i.e. triple backticks followed by the language).

It does. IIRC there was a slight hack that I needed to make this work, I'll check when I get home.

tmbb commented 6 years ago

To be honest, I don't think the markups themselves need to have tests.

The most important test for a highlighter is to make sure that highlight(text) never chokes on any input (even if the highlighted code is illegal). This is easy to guarantee, but it's the obly general property I can inagine. The rest must be example-based. As I said above, the assert_value package might help a lot.

tmbb commented 6 years ago

One final thing: My lexers highlight matching parentheses, brackets, angle brackets, etc. on mouseover. The implementation details don't matter, but this feature required making the output nondeterministic...

I need html attribute vamues that are globally unique for the HTML document. The easiest way is to use random numbers, which I am doing now.

There are ways of making this deterministic by using the process dictionary If I assume that the entire HTML document will be enerated by the same process. Needless to say, this is not a safe assumption in general, so I'm stuck with using globally unique random values. I'm ok with this approach, but I felt he need to disclose this source of non-determinism.

tmbb commented 6 years ago

I think something else we need to do is to remove highlight.js from the default build and include it only only when using the earmark markdown engine.

I don't think this is a good idea. As ugly as highlight.js is, it does highlight pretty much everything under the sun and is a useful fallback for languages not supported by Makeup.

josevalim commented 6 years ago

I'm ok with this approach, but I felt he need to disclose this source pf non-determinism.

I don't see any problem with it so far.

I don't think this is a good idea. As ugly as highlight.js is, it does highlight pretty much everything under the sun and is a useful fallback for languages not supported by Makeup.

We already use a limited version of highlight.js. Furthermore, users who really need highlight.js can opt out of Makeup OR include it manually.

tmbb commented 6 years ago

Regarding the compilation speed, I can take a look at them and see if we could make anything faster.

It's probably very easy to make it compile faster at the cost of parsing speed. ExSpirit has been built for speed, which apparently requires minimizing function calls and dumping literal AST into function bodies. This supposedly makes it run faster (@OvermindDL1 says he has benchmarked it extensively) and allows some optimizations. If you replace everything by functions it starts compiling much faster. I think I prefer lower compilation speeds in exchange for faster runtime performance, especially if Makeup is ever used in production and not only in dev (suppose you're highlighting code on a server or something like that).

But in any case, I'm sure @OvermindDL1 will be very happy to discuss ExSpirit's performance issues.

OvermindDL1 commented 6 years ago

Again OvermindDL1's misspellings attack again. There is no such thing as MarkUp, It's called Makeup for the obvious reasons (it makes your source code all pretty and colorful)

Yeah I am horrible with names... ^.^;

Things like string interpolations are not trivial to get right, even though Makeup's parser combinators help a lot. Still beats writing Pygments lexers, though.

Really? It should be fairly easily parseable. What corner conditions is it failing at and where's the relevant rules?

AFAIK, it's the only highlighter that correctly highlights:

Github screwed up on it too pretty well... ^.^;

tests

At least it's easier to test based on each rule individually without worrying too much about how they interact, one of the benefits of this style of parsing unlike regex variants.

write an EEx lexer and include it my default with ExDoc. This makes sense because of Elixir's strong web presence. Documenting projects such as Phoenix or Drab benefits a lot from having a good EEx highlighter.

Especially if it can highlight both html.eex is a source type and ~e/~E string blocks as that is the common use of the e/E sigil? :-)

On an aside, this is a big reason why the ```\n...\n``` form of code fences are better in Elixir doc's unlike indentation, because you can set the syntax color. Having iex, elixir, html.eex, basic fallback eex that highlights eex but nothing inside, html, erlang, etc, would all be good defaults to support (maybe erlang later).

define the scope of Makeup: should it include language detection (Pygments does this)? How should one pick a lexer from a string? Should this be handled by Makeup itself or should this be handled by the end-user? I'd love @josevalim's help on this one... I lean towards keeping a lean core that only lexes stuff and formats the results.

I'd vote for code fences with explicit source names, falling back to just elixir as the source name.

The only way to avoid those warnings would be to dynamically generate atom names at compilation time, which @OvermindDL1 is reluctant to do.

Well I 'do' generate names but only per function type. Right now Elixir complains about variables leaking out of their scope (which I really really wish they would not do and I wish that I could disable that functionality). I probably should just always generate names. Right now I'm generating them in a few key recursive areas, but I should do more likely... It's easy to add them at least, just need to find the few right magical areas where they are actually useful to use. :-)

Based on the point above, stabilize ExSpirit's API (at least the ExSpirit.Parser module) and release version 1.0.0.

I shall focus on this soon, I want to finish my updating, compiling speed enhancements, etc...

Those warnings should be fixed because they will emit compilation errors on v1.7.

Whooooo! Oh thank @josevalim on high! ^.^

Regarding the compilation speed, I can take a look at them and see if we could make anything faster.

Eh, I know exactly what I need to do, and it's a bit of moving around of a lot of stuff, I just need to get it done... >.>

Even then it won't get purely 'fast' unless I make a full-on DSEL in elixir AST or something just because there are macro's upon macro's piled upon macro's... But it can still be helped quite a bit more than a bit from where it is now.

It's probably very easy to make it compile faster at the cost of parsing speed. ExSpirit has been built for speed, which apparently requires minimizing function calls and dumping literal AST into function bodies. This supposedly makes it run faster (@OvermindDL1 says he has benchmarked it extensively) and allows some optimizations.

Right now the macro's all exist within the module calling them. However I can lift them to another module and delegate to them instead. The cross-module call cost should be entirely saved by not having the interpreter run on the in-module macro's and I'm expecting a fairly decent speed boost from this refactoring. I could gain more speed if I were willing the break the API, but as it sits the API is quite generic-elixir-looking, which I like. Thus I'm planning for the API breaking and non-elixir looking parser version to be a separate module entirely so the current one will work as-is.

Even then, the 'runtime' speed I could make a little faster if I changed the map context to be a (multi-depth) tuple, but that makes the user expansion and use of the context significantly less pleasant for Elixir-users (though it would be traditionally more erlang'y).

There are lots of little micro-optimizations I could make too, but they tend to require making the code even more hairy, and I think it's fuzzy enough as it is.

If you replace everything by functions it starts compiling much faster. I think I prefer lower compilation speeds in exchange for faster runtime performance, especially if Makeup is ever used in production and not only in dev (suppose you're highlighting code on a server or something like that).

But in any case, I'm sure @OvermindDL1 will be very happy to discuss ExSpirit's performance issues.

But yep, let me do this refactor and see just how much it will help. :-)

OvermindDL1 commented 6 years ago

I pulled those out and yay the build time for makeup_elixir dropped from 28.7s to 25.5s, that was entirely reasonable... >.>

Oh if only there were a compile-time profiler like there is for C++ templates... ^.^;

(Does anyone know of a compile-time profiler for Elixir? >.>)

OvermindDL1 commented 6 years ago

/me is starting to lean toward breaking the API...

tmbb commented 6 years ago

Maybe quote the module and compile it at runtime.... Then you can profile it with Benchee or something.

OvermindDL1 commented 6 years ago

@tmbb Thought of that but I'd have to test each little thing individually, where a profiler would show me where the hotspots actually are like I can see of templates in C++.

michalmuskala commented 6 years ago

Yeah, 26s build time, I think, is pretty much a deal breaker. All of ecto (with all dependencies) compiles in 15s for me (using time mix compile). Recompiling libraries is more common then you might think in practice. I'd say I usually recompile everything in a project once or twice a day (and then there's CI).

OvermindDL1 commented 6 years ago

Breaking the parser rules up in to more modules would speed it up as well, possible quite dramatically. And putting the parse call in a different module from the rules would help a lot there as well. @tmbb any chance to test this?

OvermindDL1 commented 6 years ago

@tmbb I updated ex_spirit to 0.4.0 with those changes. I bumped the 'major' version to 4 because it's technically API changing (though not with common usage). The calls are no longer exported from the module, but that is only a single-line add to an existing project if that functionality is used (which should not be in almost all cases). :-)

This one should see better speed with breaking up the rules than the prior, potentially by quite a bit.

Though as always, breaking up the different rules (and the main parse call) into different modules will technically slow down runtime slightly, still not enough to worry about I think (should be fairly imperceptible in 99% of cases).

OvermindDL1 commented 6 years ago

@michalmuskala I time'd mix compile of a fresh ecto clone and dep grab and I got 19.271s, so my desktop here is definitely a bit slower than yours (I'm at work). ^.^;

And the timing of the makeup compile in my prior test is all files with no _build for note*

josevalim commented 6 years ago

@OvermindDL1 you should also consider compiling the code to different functions and use @compile {:inline, fun: arity} if you need the perf. It should be faster because the inline happens later on, which means less work for the compiler.

OvermindDL1 commented 6 years ago

I do use functions where possible (though not specifying them as inline), the macro's are used in places where code generation is expected (generally the main parsing functions, and things like the text module are entirely non-macro functions).

tmbb commented 6 years ago

Maybe the slow compile time are partly my fault :p I didn't make that much of an effort to make the lexers compile quickly. But I think they are as simple as possible. Syntax highlighting requires a lot of rules.

In the case of Elixir you probably need more rules than a proper parser, as much of what we highlight as "syntax" is actually "convention", e.g. def/2 is just macro, and for a real elixir parser it's indistinguishable from any other fun/2 (I've had this discussion jn the forum, and people like Robert Virding disagree with my definition of syntax, but hopefully I've gotten the point across)

tmbb commented 6 years ago

maybe erlang later

I can't read or write Erlang very well (I mean, Incan the simple stiff, and from the looks of it the whole language is pretty simple lol), but from looking at the corresponding Pygments lexer it looks veeeery easy to highlight.

OvermindDL1 commented 6 years ago

Yeah Erlang is a very simple language, dead-simple to parse. :-)

josevalim commented 6 years ago

I took a look at ex_spirit and combine and I have noticed they follow non-ideal approaches. ex_spirit does AST traversing, which makes it a bit harder to compose and document. combine is runtime based, which is slower. But more importantly, none of them compile to binary matches nor rely on the VM binary matching optimizations.

So over the last 48h I built a yet another parsec combinator library for Elixir called NimbleParsec. The combinator composition happens fully at runtime which is then compiled down to binary matching. It works similar to quoted expressions in Elixir. The combinators build an AST which is then compiled down to function clauses. See the README above for more information.

I have ran @OvermindDL1 benchmark scripts and I got this result:

$ mix bench
Erlang/OTP 20 [erts-9.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
Elixir 1.7.0-dev
Benchmark suite executing with the following configuration:
warmup: 2.0s
time: 3.0s
parallel: 1
inputs: parse_datetime, parse_int_10
Estimated total run time: 30.0s

Benchmarking with input parse_datetime:
Benchmarking combine...
Benchmarking ex_spirit...
Benchmarking nimble...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning

Benchmarking with input parse_int_10:
Benchmarking combine...
Benchmarking ex_spirit...
Benchmarking nimble...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning

##### With input parse_datetime #####
Name                ips        average  deviation         median
nimble        1425.75 K        0.70 μs   ±437.03%        0.70 μs
ex_spirit      177.70 K        5.63 μs   ±115.86%        5.00 μs
combine         95.83 K       10.44 μs    ±93.60%        9.00 μs

Comparison:
nimble        1425.75 K
ex_spirit      177.70 K - 8.02x slower
combine         95.83 K - 14.88x slower

##### With input parse_int_10 #####
Name                ips        average  deviation         median
nimble         949.95 K        1.05 μs   ±216.77%        1.00 μs
ex_spirit      463.71 K        2.16 μs   ±950.25%        2.00 μs
combine        338.62 K        2.95 μs   ±760.53%        2.00 μs

Comparison:
nimble         949.95 K
ex_spirit      463.71 K - 2.05x slower
combine        338.62 K - 2.81x slower

The above shows that for parsing datetimes, nimble is 8x faster than ex_spirit and 14x faster than combine. For the integer case, nimble is only twice faster, but it is worth noting Nimble's integer parser is written on top of existing combinators while the integer parser for both ex_spirit and combine are written by hand. So nimble is beating hand-written code here.

I did not measure memory usage as well but that should also decrease wth nimble thanks to binary matching.

I have also benchmarked compilation times by compiling the same datetime parser 30 times. combine takes 1s, which makes sense as it is runtime based. nimble takes 2s and ex_spirit takes 6s. So hopefully this library alleviates compilation times too.

While nimble is extremely new, I think most of the primitives are there, so you should be able to build almost anything. Improvements, suggestions and feedback are very welcome, thanks!

OvermindDL1 commented 6 years ago

I took a look at ex_spirit and combine and I have noticed they follow non-ideal approaches. ex_spirit does AST traversing, which makes it a bit harder to compose and document. combine is runtime based, which is slower. But more importantly, none of them compile to binary matches nor rely on the VM binary matching optimizations.

Correct, I was building an expression parser that would do that instead but I've gotten too busy at work to finish it... ^.^;

So over the last 48h I built a yet another parsec combinator library for Elixir called NimbleParsec. The combinator composition happens fully at runtime which is then compiled down to binary matching. It works similar to quoted expressions in Elixir. The combinators build an AST which is then compiled down to function clauses. See the README above for more information.

All the wootness!!

Just needs features fleshed out and I can toss out ex_spirit, whoo!