Enhance literate mode to recognise multi-modes.

jfdm commented 4 years ago

While Bird Style literate mode is useful, it does not lend itself well to more modern markdown-like notations such as Org-Mode and CommonMark.

This commit extends Idris2's existing literate mode to recognise both 'visible' and 'invisible' code blocks and lines in predefined markdown styles.

The styles and their elements are:

Bird Style :: > denotes a visible code line, < a hidden code line.
OrgMode :: Org Mode source blocks for idris are recognised as visible code blocks, and comment blocks are invisible code blocks. Invisible code lines are denoted with #+IDRIS:.
CommonMark :: Only code blocks denoted by standard code blocks labelled as idris are recognised.

For backwards compatibility, we recognise literate modes by file extension:

Bird Style :: .lidr
OrgMode :: .org
CommonMark :: .md

In future we should add support for literate LaTeX files, and more intelligent processing of literate documents using a pandoc like library in Idris such as: Edda.

jfdm commented 4 years ago

If this is an acceptable patch then I will add some documentation, and update the changelog, prior to its inclusion.

edwinb commented 4 years ago

Seems a good idea to me, especially for documentation purposes. Cheers! I'll leave it until you think it's ready to go though.

petithug commented 4 years ago

I have no opinion about OrgMode and CommonMark as I do not use these, but I am not 100% convinced about the '<' mark. I think that the choice of displaying or not the code should be left to the document processor, and not be a property of the Idris code. So I would just keep the '>' mark.

On the other hand I think that the constraint that a block of literate Idris starts with a blank line and ends with a blank line (except when at the top or bottom of the file) is too restrictive, this time because it prevents some document processors (in my case Asciidoctor) to decide what to display and what to ignore.

The original reason for that constraint is that it makes it easier to detect when code is accidentally not behind a bird mark. Maybe a less constraining rule could be that a code block cannot be immediately preceded and succeeded by text that the Idris parser would accept. That way we would keep the spirit of the constraint, without following it to the letter.

Now the question is how much that would slow down the processing of a large document.

petithug commented 4 years ago

One bug in the code I just found is that an empty literate line (i.e., just a '>' symbol) is rejected.

jfdm commented 4 years ago

Thanks for the comments.

I am not 100% convinced about the '<' mark.

TBH, when I wrote the original code (a long time ago) I thought that Bird notation has > for visible code, and < for invisible code. I was obviously mistaken there, looking at the Haskell98 Report it doesn't make mention of a invisible line.

I will remove that from Bird style.

I will also state that we should treat .lidr files as 'just' source files, here hidden code does not make sense. Thus, we should support markdown languages better. For instance, trying to write literate idris in CommonMark is not ideal, as we must assume that block quotes are code regions, and ignore actual source blocks. In these settings the notion of hidden code would make sense. For example, writing blog posts (or papers) in CommonMark (or Org Mode) in which one want's to hide unnecessary constructs.

...because it prevents some document processors (in my case Asciidoctor) to decide what to display and what to ignore.

The design of Text.Literate assumed that code regions were indicated by a line marker or paired code block markers i.e. representative of the markdown-like languages I have used. It did not occur to me to think how reST and asciidoc indicated code regions or that other languages did things wildly different: I was wrong. reST indicates code blocks with a leading marker but not end marker (well two EOL characters IIRC). This is similar to how asciidoc code paragraphs are indicated. Further, asciidoc supports 'listings; presented as code blocks indicated with start and end regions.

This is one area of improvement for Text.Literate but something that should not block it's inclusion, and would require a little bit of thinking.

One bug in the code I just found is that an empty literate line (i.e., just a '>' symbol) is rejected.

I would consider that a feature---the tokeniser expects line markers to be followed by a space. I will look at this later.

edwinb commented 4 years ago

@jfdm is this okay to merge or do you want to do more updates first? I have no opinions on the details, I'm just glad to see it being done! Ta!

jfdm commented 4 years ago

@edwinb merge away, I can fix things up later when I add the documentation.

jfdm commented 4 years ago

or give me until midday BST

jfdm commented 4 years ago

Done!

edwinb commented 4 years ago

Oops, sorry, I seem to have left this too long and some conflicts have happened and the CI has failed. Would you mind fixing them up, then I'll merge? Thanks.

jfdm commented 4 years ago

@edwinb Fixed.

edwinb commented 4 years ago

Great, thanks!

petithug commented 4 years ago

This patch breaks literate programming, please revert.

It requires a space after ">".
It does not enforce a blank line before and after a code block with bird mark.
It looks for bird marks in the middle of the document (?!)
It seems to be 3 times slower than the previous literate code (OTOH Idris2 pre-patch crashes with big files, so comparing is not reliable).

edwinb / Idris2-boot

Enhance literate mode to recognise multi-modes. #254