Open tomcl opened 8 years ago
Just an update:
I'm writing some code which I will at some point integrate with the main parser as an "optionally switch it in" pass, so that anyone not using it will not be affected by potential syntax changes. I need to fix parsing in table cells, which at the moment works out column delimiters before parsing inline code etc., and so is broken by '|' characters.
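To illustrate the table-cell problem, here is a minimal sketch (in Python, not the actual FSharp.Formatting code) of splitting a row on '|' while skipping inline code spans and backslash escapes. The function name `split_table_row` is hypothetical, and the backtick handling is a simplification of real code-span rules.

```python
def split_table_row(row: str) -> list[str]:
    """Split a pipe-table row into cells, ignoring '|' inside inline
    code spans and directly after a backslash escape."""
    cells, current = [], []
    i, n = 0, len(row)
    while i < n:
        ch = row[i]
        if ch == "\\" and i + 1 < n:
            # Keep the escape and the escaped character together.
            current.append(row[i:i + 2])
            i += 2
        elif ch == "`":
            # Measure the opening backtick run and look for a matching run.
            j = i
            while j < n and row[j] == "`":
                j += 1
            fence = row[i:j]
            close = row.find(fence, j)
            if close == -1:
                # No closing run: treat the backticks as literal text.
                current.append(fence)
                i = j
            else:
                # Copy the whole code span, including any '|' inside it.
                end = close + len(fence)
                current.append(row[i:end])
                i = end
        elif ch == "|":
            cells.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    cells.append("".join(current))
    cells = [c.strip() for c in cells]
    # Drop the empty cells produced by optional leading/trailing pipes.
    if cells and cells[0] == "":
        cells = cells[1:]
    if cells and cells[-1] == "":
        cells = cells[:-1]
    return cells
```

The point is only the ordering: code spans and escapes are recognised before column delimiters, rather than after.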
The current strategy is to have a "core" markdown parser (FSharp.Markdown) representing CommonMark (well, we don't fully conform to the spec at the moment, but that's the idea).
Other features are actually built on top (like handling F# snippets) by walking or extending the "default" tree (FSharp.Literate defines its own tree by re-using some parts). So if those features can be implemented by transforming an existing tree then this is definitely the way to go (and perfect for composing things). If you encounter problems with this approach feel free to start discussing details, as extending the tree is definitely possible.
Thanks @matthid for the answer - I agree with everything!
I think for additional features that cannot be implemented by transforming the parsed Markdown, it would make sense to add them to the core and perhaps enable/disable them with some switch.
But I guess that even with footnotes, you can come up with a syntax that will not require this. For example, you could write them as:
Some text[!footnote](Blah blah blah) some more text
...which will parse as a link, but you can then look for links with a special !footnote text.
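As a rough sketch of that idea (in Python, with made-up span types; `Text`, `Link`, `FootnoteRef` and `extract_footnotes` are all hypothetical and are not the FSharp.Markdown tree), a post-parse pass could replace the specially labelled links with numbered references and collect the bodies:

```python
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Link:
    label: str   # the text between [ ]
    target: str  # the text between ( )


@dataclass
class FootnoteRef:
    number: int


def extract_footnotes(spans):
    """Replace Link spans labelled '!footnote' with numbered references.

    Returns the new span list and the collected footnote bodies."""
    out, notes = [], []
    for span in spans:
        if isinstance(span, Link) and span.label == "!footnote":
            notes.append(span.target)
            out.append(FootnoteRef(len(notes)))
        else:
            out.append(span)
    return out, notes


spans = [
    Text("Some text"),
    Link("!footnote", "Blah blah blah"),
    Text(" some more text"),
]
new_spans, notes = extract_footnotes(spans)
```

Ordinary links pass through untouched, so a document that never uses the special label is unaffected.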
Oh, and just for the record, it is amazing to hear that you'll be teaching F# at your university :-).
Any chance your materials will be publicly available? I'm sure many people in the community would be very interested. You could send a PR to add them to the F# Foundation site: http://fsharp.org/teaching/index.html
Hi Tomas,
Thanks for this.
We will see how it goes. The target audience is a bit atypical, they have learnt C/C++ procedural and done a very basic data structures and algorithms course, but have no intro to OOP. They are electrical engineers and hence good mathematicians and (Imperial College) pretty bright, but have not done much programming - just two modules on our course.
It was a narrow decision between F# and Haskell for this purpose - the actual module is called "High Level Programming" and after an overview of FP and OOP I teach a very largely pure functional subset of F#. I don't see the point in teaching OOP because those who want it can pick it up vocationally. The difficult bits of OOP like generic contravariance/covariance get covered anyway in F# with the dreaded 'value restriction'!
For me, the key concepts I want to get across (in lectures backed by a lot of practical work) are how rich static type systems with inference can make programs better, and how immutability is worthwhile even when it seems not to be. The way that types get simpler in an immutable world is one example of this. These are deep and interesting concepts. I'll cover OOP by noting the expression problem, and how neither OOP nor FP have a decent solution. I'll cover state monads using C.E. syntax. So there is enough complexity for a 3rd year EE module. At the same time I'll have them doing as much programming as possible with individual work followed by a large group project, and teach things like testing, modularity, etc.
F# wins over Haskell because of good tooling and nice syntax, and (possibly) real world relevance. I'm hoping the complexity of .NET interoperability, and the slight hackiness of F#, can be pushed to one side since for my target audience this is not relevant when they do the course. The weakness of the F# type system will not matter in practical terms for them.
I've a question for you. Do you really think the F# one-pass compiler strategy with its definition order-dependence within a module is good? I don't find it particularly unpleasant, and it allows better type inference as you type, but it is an unusual design decision. (The choice to disallow cycles across modules is another matter and can more easily be justified as a genuine good).
Best wishes, Tom
PS - I've still got 12 months to go so materials are all very preliminary at the moment - opening these is possible though I'd need to think about it.
Thanks for all the information - it is great news :+1:
This sounds like a really important lecture! I think there are a lot of people doing science who learn C/C++ and then use it for all their projects without being aware of alternatives. F# should give them a good idea that there are other options (no matter if they end up using F#, Haskell or something completely different later).
As for the one-pass strategy - I think the requirement of no cycles is mostly a good thing. People in the F# community learned to like it and there are good reasons - here is a nice write-up. I think it is a good default, though having an "escape option" to be used rarely might be useful.
As for the ordering of files itself, I do not really find it a big issue, but for very large projects (like the F# compiler), it would be nice if you did not need it (and the compiler would figure out the order on its own or from some directives). Though in all projects that I significantly contributed to, getting the file ordering right was not such a big deal, so I don't feel strongly about it.
OK - so my comments on the best way to add extensions. I propose:
Use a subset of direct and indirect link syntax in markdown:
body [body][indirect-link-name]
Here, currently, both body and the link are parsed using bracket matching with a '\' standard markdown escape. This is not quite right - because inline code or math should escape the escape. For current usage that is not an issue (though undesirable - a math expression could be displayed with a hyperlink).
So the direct link parser needs to be upgraded a little - easy to do and this is desirable anyway. I've just done something similar to make tables work properly
The links that are escaped to become extension syntax would be: ^!.... - anywhere when parsing plain text
There is some possibility of a change in spec that affects existing markdown, because in principle any character could be the start of a displayed hyperlink, but this seems pretty safe. Maybe someone has a better combination? Maybe people think a single character (say ^) would be safe enough? If the extensions are not enabled by default this issue is not a problem and a single character would be fine.
The existing code will then generate a parse tree that can be transformed by the extension code _before_ it is converted to html or latex.
There is an issue about extensions that cannot be expressed as simple markdown.
Specifically, one extension is to allow numbered paragraphs with numbers generated in order throughout the document - suitable for sequential numbered questions etc. Pandoc does this. The problem is that unextended markdown has no way to denote a single numbered list item with arbitrary number. So to implement this we would need a true extension to the parse tree that is recognised by latex and html output processors. The output processor changes would be relatively small, since it is a change to existing list definitions easily possible in both html and latex.
My strategy would be to work on these extensions myself in a fork with a view to merging it back in when they are properly done. Since the extensions are not 100% commonmark compatible (however you define escapes they might be present in a real document) the extensions could be enabled by a processor switch.
Comments?
The first part sounds good to me!
As for the numbered paragraphs - can you point me to some example of how this works in Pandoc? I don't think I've seen these before...
(@) first question
other stuff....
(@) second question
other stuff...
(@) third question
The (@) paragraphs form an ascending numbered list
The use case is when you have a worksheet with interleaved examples and questions.
I see!
This looks like something that could still be done by running the original parser and then looking for paragraphs that start with (@). You would just need to add your own MarkdownEmbedParagraphs to the AST which represents a block of paragraphs with a number....
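A minimal sketch of that post-processing step, in Python and over plain paragraph strings rather than the real AST (the name `number_example_paragraphs` is hypothetical): paragraphs starting with (@) get the next number in document order, everything else is passed through.

```python
import re

# Matches the "(@)" marker at the start of a paragraph, plus trailing spaces.
EXAMPLE_MARKER = re.compile(r"^\(@\)\s*")


def number_example_paragraphs(paragraphs):
    """Return (number, text) pairs; number is None for ordinary paragraphs."""
    numbered, counter = [], 0
    for text in paragraphs:
        if EXAMPLE_MARKER.match(text):
            counter += 1
            numbered.append((counter, EXAMPLE_MARKER.sub("", text, count=1)))
        else:
            numbered.append((None, text))
    return numbered


result = number_example_paragraphs([
    "(@) first question",
    "other stuff....",
    "(@) second question",
])
```

In a real implementation the numbered entries would become whatever custom node carries the number through to the output writers.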
Yes, the parsing is no problem.
The issue is that the output (a numbered list item with a specified number) is different from anything the current writers can generate. A numbered paragraph would be easy, but a numbered indented paragraph with hanging indent for the number is not possible.
So this is the most difficult extension to implement.
Just a follow-up. I've worked out that everything I want can be implemented using HTML tags - easily. So I've added a custom macro expansion outside of FSharp.Formatting to generate the correct tags and all is good without changing the formatting code. That seems the cleanest route for anything complex.
However - in doing this I've found that the markdown in-paragraph formatting breaks (is not processed) after an initial inserted html tag unless separated by a new markdown paragraph. This is actually inconvenient because of the unwanted markup that is then generated.
Generally speaking, we should try to do what CommonMark does. I played a bit with this in the playground and it seems to be doing something clever. Can you look at CommonMark and figure out what their specification around this is?
OK - so in fact what happens now for HTML blocks is roughly as in CommonMark. Block characters are escaped until the block ends - which is at the next markdown paragraph boundary.
My problem with the CommonMark spec is that I'm using CSS counters in a list to make numbered paragraphs, and emitting the number with CSS content styling before the list element.
<ul class="question"> <li class="question">
content of list element
</li> </ul>
I can work round this by making the pre-processing insert the question number and not using CSS counters for this purpose.
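For what it's worth, a rough sketch of that kind of pre-processing (in Python; `expand_question_macros`, the `div` wrapper and the `data-number` attribute are illustrative assumptions, not what my macro expansion actually emits) would number the (@) lines itself and wrap each one in a tag, with blank lines so the content is still treated as its own markdown paragraph:

```python
def expand_question_macros(source: str) -> str:
    """Rewrite lines starting with '(@)' into numbered HTML-wrapped blocks.

    The number is written into the text, so no CSS counter is needed."""
    out, counter = [], 0
    for line in source.splitlines():
        if line.startswith("(@)"):
            counter += 1
            body = line[len("(@)"):].strip()
            out.extend([
                f'<div class="question" data-number="{counter}">',
                "",  # blank line: assumed to end the HTML block
                f"**{counter}.** {body}",
                "",
                "</div>",
            ])
        else:
            out.append(line)
    return "\n".join(out)


expanded = expand_question_macros("(@) first\nother stuff\n(@) second")
```

The trade-off is the one discussed above: the numbering is robust, but the blank lines mean the content becomes a separate paragraph.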
The only solution I can see here would be an escape as I suggested, which would be harmless, but maybe better I give up trying to use CSS styling.
One issue is whether it is worth a special escape to make CSS styling as above with content work properly?
Another issue is whether it is worth an escape to make hand HTML annotations more readable?
<tag> some stuff I want to pick up HTML style and be processed as markdown </tag>
must be written:
<tag>
some stuff I want to pick up HTML style and be processed as markdown </tag>
an escape would allow:
<tag> <<>> some stuff I want to pick up HTML style and be processed as markdown </tag>
One issue about documentation for newbies running Windows and not used to the command line.
If I download the repo to a clean system with F# installed it would be good to have simple instructions on how to build for newbies. It is I think:
Start cmd from start menu:
> cd <root directory of fsharp.formatting>
> build clean
> build
I'm not certain whether there are any other prerequisites not downloaded automatically by the build process. Anyway it would be good to document this. I know it seems obvious - but in windows double-clicking build will not work as one might hope.
I've just started using FSharp.Formatting to process markdown files defining a set of tutorial worksheets for a new F# lecture course at my university next year. It is great having the compiler-generated tooltips and coloring. I have enough time to get the infrastructure right, so I'm wondering how best to do the other things I need.
This issue is because I don't want to reinvent the wheel and would be happy to add to the core code if that was merited (though I doubt it will be). I guess this is a meta-discussion about how to compose non-standard features.
Previously (another language) I was using Pandoc markdown and transforming this to PDF and HTML with Pandoc, where the HTML output was important, and PDF an added bonus but maybe not perfectly formatted.
The features I need and (I think) have not got are:
So my questions are for these two features:
Best wishes, Tom