Better support for literate haskell

GoogleCodeExporter commented 8 years ago

It would be nice to be able to write literate haskell programs that could
easily be converted to various formats using pandoc.  It is currently
possible to write valid literate haskell in pandoc's markdown, using
delimited code blocks:

~~~~~ {.haskell}

> numbers = [1..]

~~~~~

But this approach has some problems. First, if the document contains regular blockquotes, GHC will treat them as haskell code. Second, it is inconvenient to have to repeat the tildes and the (optional) {.haskell} annotation. Third, blank lines are needed before and after the bird-track sections. The current SVN version of pandoc collapses this blank space, but that is not desirable behavior in general: what if one wants a code block with leading or trailing blank lines? (The blank-line requirement can be turned off in ghc using the flags '-optL -q', but it is not ideal to expect people to know or remember this.)

Here is a proposal. Add a new command-line flag, '--lhs', to indicate that the input file is literate haskell. (This can be turned on automatically if the extension is '.lhs'.) When '--lhs' is specified and the input format is markdown, bird-track sections are treated as code blocks (with class attribute "haskell"), not as block quotes. Indented code blocks are still treated as code blocks.
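To make the markdown half of the proposal concrete, here is a minimal sketch of how bird-track lines could be grouped into code blocks with class "haskell". The `Block` and `Kind` types and the `toBlocks` function are hypothetical stand-ins invented for illustration; they are not pandoc's actual types or parser.

```haskell
import Data.Char (isSpace)
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical stand-in for a document block; not pandoc's actual type.
data Block
  = Para [String]              -- ordinary markdown lines
  | CodeBlock String [String]  -- class attribute and code lines, bird tracks removed
  deriving (Eq, Show)

data Kind = Blank | Bird | Text
  deriving (Eq)

-- Classify one input line; a bird track is a '>' in the first column.
kind :: String -> Kind
kind line
  | all isSpace line   = Blank
  | take 1 line == ">" = Bird
  | otherwise          = Text

-- Drop the leading '>' and at most one following space.
stripBird :: String -> String
stripBird ('>' : ' ' : rest) = rest
stripBird ('>' : rest)       = rest
stripBird line               = line

-- Group consecutive lines of the same kind; blank lines only separate blocks.
toBlocks :: [String] -> [Block]
toBlocks ls =
  [ mk k grp
  | grp@(l : _) <- groupBy ((==) `on` kind) ls
  , let k = kind l
  , k /= Blank
  ]
  where
    mk Bird grp = CodeBlock "haskell" (map stripBird grp)
    mk _    grp = Para grp
```

Because the bird tracks are stripped when the block is stored, a writer is free to re-emit them or not, depending on the output format.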

When '--lhs' is specified and the input is latex, special lhs-specific constructions are recognized (e.g. \begin{code}...\end{code}).

'--lhs' could have an effect on output as well. Obviously, for markdown output it would output haskell code blocks in bird tracks. (What about HTML output? Probably bird tracks should be used there, too, so people could cut and paste from the HTML and have valid lhs.)


Original issue reported on code.google.com by `fiddloso...@gmail.com` on 7 Oct 2008 at 12:00

GoogleCodeExporter commented 8 years ago

New proposal:

add to parser state:   LiterateHaskellMode  :: Bool    -- default : False
add to writer options: LiterateHaskell      :: Bool    -- default : False

When parsing markdown, if
<!-- literate haskell --> or
<!-- start literate haskell --> or
<!-- begin literate haskell -->
is encountered in the source, turn on LiterateHaskellMode.  Turn it off if
<!-- stop literate haskell --> or
<!-- end literate haskell -->
is encountered.
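As a rough illustration of the pragma idea (a sketch only, not what was eventually implemented), the mode can be threaded through the input as a Bool that the comment lines flip. The names `pragma` and `annotate` are made up for this example, and the exact-match comparison after trimming whitespace is an assumption.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Remove leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Just True / Just False if the line is an on/off pragma, Nothing otherwise.
pragma :: String -> Maybe Bool
pragma line
  | l `elem` onPragmas  = Just True
  | l `elem` offPragmas = Just False
  | otherwise           = Nothing
  where
    l = trim line
    onPragmas =
      [ "<!-- literate haskell -->"
      , "<!-- start literate haskell -->"
      , "<!-- begin literate haskell -->"
      ]
    offPragmas =
      [ "<!-- stop literate haskell -->"
      , "<!-- end literate haskell -->"
      ]

-- Pair every non-pragma line with the mode in force when it is read.
-- The mode starts out False, matching the proposed default.
annotate :: [String] -> [(Bool, String)]
annotate = go False
  where
    go _ [] = []
    go mode (l : ls) =
      case pragma l of
        Just newMode -> go newMode ls
        Nothing      -> (mode, l) : go mode ls
```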

When LiterateHaskellMode is on, bird-track sections in markdown will be treated
as code blocks with class "literatehaskell", not as quote blocks.

When parsing LaTeX, LiterateHaskellMode should activate proper handling of
\begin{code}..\end{code}, \ignore{}, etc.
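For the LaTeX side, a minimal sketch of pulling out the code environments might look like the following. The function name `codeEnvs` is hypothetical, and the sketch assumes each delimiter sits alone on its own line; it ignores \ignore{} and everything else a real reader would need.

```haskell
-- Collect the bodies of \begin{code}...\end{code} environments.
-- Assumes each delimiter occupies a whole line by itself.
codeEnvs :: [String] -> [[String]]
codeEnvs [] = []
codeEnvs (l : ls)
  | l == "\\begin{code}" =
      let (body, rest) = break (== "\\end{code}") ls
      in body : codeEnvs (drop 1 rest)  -- drop 1 skips the \end{code} line
  | otherwise = codeEnvs ls
```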

When parsing HTML in LiterateHaskellMode, parse <pre> blocks with bird tracks
as code blocks with class "literatehaskell".

On the output side:

If LiterateHaskell is on, print bird-tracks in markdown code blocks with class
"literatehaskell" (and begin the document, after the title block, with
<!-- literate haskell -->).  Also print bird-tracks in HTML and use
\begin{code}...\end{code} in LaTeX.
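The output rules can be sketched as a function from a target format to a rendering of the code lines. `Target` and `renderCode` are names invented for this example; HTML is left out because it would also need entity escaping of the '>' characters.

```haskell
-- Hypothetical output targets for this sketch.
data Target = Markdown | LaTeX

-- Render the lines of one haskell code block for the given target.
renderCode :: Target -> [String] -> [String]
renderCode Markdown ls = map ("> " ++) ls
renderCode LaTeX    ls = ["\\begin{code}"] ++ ls ++ ["\\end{code}"]
```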

How do these options get set?

if input file extension is ".lhs", turn on LiterateHaskellMode.
if output file extension is ".lhs", turn on LiterateHaskell in writer options.

Also add command-line options '--lhs-in' and '--lhs-out'.  '--lhs' could be
equivalent to both '--lhs-in' and '--lhs-out'.
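A small sketch of how the two settings could be derived from file extensions and flags. The `LhsFlag` and `LhsSettings` types and the `resolve` function are hypothetical; only the field names echo the parser state and writer option proposed above.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical representation of the proposed command-line options.
data LhsFlag = LhsIn | LhsOut | LhsBoth  -- --lhs-in, --lhs-out, --lhs
  deriving (Eq, Show)

data LhsSettings = LhsSettings
  { literateHaskellMode :: Bool  -- parser state
  , literateHaskell     :: Bool  -- writer option
  } deriving (Eq, Show)

hasLhsExt :: FilePath -> Bool
hasLhsExt path = ".lhs" `isSuffixOf` path

-- Either a matching flag or a ".lhs" extension turns a setting on.
resolve :: [LhsFlag] -> FilePath -> FilePath -> LhsSettings
resolve flags inFile outFile =
  LhsSettings
    { literateHaskellMode = hasLhsExt inFile  || any (`elem` [LhsIn, LhsBoth]) flags
    , literateHaskell     = hasLhsExt outFile || any (`elem` [LhsOut, LhsBoth]) flags
    }
```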

Note:  I've reverted the "ignore blank line at beginning of delimited code
block" feature mentioned in the last comment.  These changes should render that
kludge unnecessary.

Original comment by fiddloso...@gmail.com on 2 Nov 2008 at 4:59

GoogleCodeExporter commented 8 years ago

Correction: We should use class "haskell" instead of "literatehaskell", since
the bird tracks won't be explicitly included in the stored source.

Original comment by fiddloso...@gmail.com on 2 Nov 2008 at 5:01

GoogleCodeExporter commented 8 years ago

Original comment by fiddloso...@gmail.com on 2 Nov 2008 at 5:02

Added labels: Priority-High
Removed labels: Priority-Medium

GoogleCodeExporter commented 8 years ago

In parsing markdown with LiterateHaskellMode, only treat bird tracks as haskell
when the bird tracks appear in column 1.
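One way to read the column-1 rule, as a sketch: a '>' in the first column is haskell code when the mode is on, while a '>' indented by one to three spaces stays an ordinary markdown blockquote. `LineKind` and `classify` are hypothetical names; the three-space limit is markdown's usual bound before a line counts as indented code.

```haskell
data LineKind = HaskellCode | QuoteLine | Other
  deriving (Eq, Show)

-- Classify a line given whether literate haskell mode is on.
classify :: Bool -> String -> LineKind
classify lhsMode line
  | lhsMode && startsWithBird line            = HaskellCode
  | length indent <= 3 && startsWithBird rest = QuoteLine
  | otherwise                                 = Other
  where
    (indent, rest)   = span (== ' ') line
    startsWithBird s = take 1 s == ">"
```

This keeps regular blockquotes available in a literate document, at the cost of requiring them to be indented by at least one space.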

Original comment by fiddloso...@gmail.com on 2 Nov 2008 at 5:30

GoogleCodeExporter commented 8 years ago

I'd like to add my support for this change. I can help with the implementation
if that's desired.

Original comment by JeanPhil...@gmail.com on 13 Nov 2008 at 5:43

GoogleCodeExporter commented 8 years ago

I've implemented most of this now (except for the inline comment pragmas, which
I'm now having doubts about).  But I've run into a serious problem that may block
the whole idea.  To see the problem:

$ cat test.lhs
# This is a test

> foo = reverse . words

$ ghci test.lhs
GHCi, version 6.10.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer ... linking ... done.
Loading package base ... linking ... done.

test.lhs:1:2: lexical error at character 'T'

Why does this happen?  Can anyone enlighten me here?

What this means is that standard atx style headers can't be used in
markdown-formatted literate haskell.  A partial solution would be to use setext
style headers, but we only have these for the first two header levels.  Also, I
wonder whether there are other problematic symbols besides '#'?

Original comment by fiddloso...@gmail.com on 30 Nov 2008 at 2:26

GoogleCodeExporter commented 8 years ago

Bottom line: I think this is a bug in ghc.

#! can be used to start comments in lhs files.
Indeed if you add a ! right after the # it is accepted.
I guess there is special handling for the beginning of lines
that chokes on '# Text'.

I'm not sure the following is used for lhs, but here it is anyway:

http://darcs.haskell.org/ghc-6.10/ghc/compiler/parser/Lexer.x

(search for <bol>)

Original comment by JeanPhil...@gmail.com on 30 Nov 2008 at 9:01

GoogleCodeExporter commented 8 years ago

Mystery solved: Bertram Felgenhauer points out (on haskell-cafe):

> I believe this is an artifact of ghc trying to parse cpp style line
> number information:
> 
> >>> foo.lhs >>>
> # 123 "foo.foo"
> 
> > t = <>
> <<<
> 
> will print this error:
>    foo.foo:124:6: parse error on input `<>'

So, it's not a bug, but a feature we'll have to work around.

Proposal:  when writing markdown in --lhs-out mode, use setext style headers for
first and second level headers, and plain text for the rest.
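A minimal sketch of that header rule, so that no output line starts with '#'. The function name `lhsHeader` is invented for this example and is not taken from pandoc's writer.

```haskell
-- Render a header of the given level without using atx ('#') syntax.
lhsHeader :: Int -> String -> [String]
lhsHeader 1 title = [title, replicate (length title) '=']  -- setext level 1
lhsHeader 2 title = [title, replicate (length title) '-']  -- setext level 2
lhsHeader _ title = [title]                                -- plain text otherwise
```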

Original comment by fiddloso...@gmail.com on 1 Dec 2008 at 10:36

GoogleCodeExporter commented 8 years ago

Support for literate haskell has been added as of r1512.
Instead of using --lhs-in and --lhs-out, I use +lhs suffixes on the input and
output formats.  So, for example:

pandoc --from markdown+lhs --to latex+lhs

I'm closing this issue, but the new functionality probably needs more testing.
Any problems/suggestions should be posted in a new issue report.

Original comment by fiddloso...@gmail.com on 2 Dec 2008 at 10:48

Changed state: Fixed

demydd / pandoc

Better support for literate haskell #89