giuspen / cherrytree

cherrytree
https://www.giuspen.net/cherrytree/

Implement markdown parsing and formatting #858

Open ForeverRainbow opened 4 years ago

ForeverRainbow commented 4 years ago

Hi. This is an attempt to provide a place for further discussion of the markdown parsing mentioned in #828 (and many other issues). It is something I would like as well, as I am used to using markdown for formatting my notes and general writing.

Right now I think the implementation of markdown falls into two categories: importing/exporting, and formatting. The import/export category is pretty self-explanatory: add the ability to import and export markdown files to cherrytree. The formatting category is more complex. If cherrytree does get the ability to interpret markdown (for example, to write _italics_), then how much of the "standard" should it support, and what (if any) conflicts would it cause with the existing shorthand language?

Issue links: #218

txe commented 4 years ago

Yeah, actually, I wanted to ask about this too :)

  1. Which standard to support? Should we support several standards?
  2. How much? I don't use markdown, so you can come up with anything reasonable. And we can outline which parts don't make sense in cherrytree, e.g. lists.
  3. About conflicts: I don't see any so far. And there will definitely be an option to turn off markdown.

So, based on GitHub markdown, there are:

  - Headers - yes (one guy asked about it a long time ago)
  - Emphasis - yes
  - Lists - no
  - Links - yes
  - Images - yes
  - Code and Syntax Highlighting - yes
  - Tables - no
  - Blockquotes - no
  - Inline HTML - no
  - Horizontal Rule - no
  - Line Breaks - no
  - YouTube Videos - yes (everyone likes videos with cats 🐱)

txe commented 4 years ago

Other issues

That's why I want to see editors with real-time markdown support. Discord is quite a good example; GitHub is not, because it has a viewer.

ForeverRainbow commented 4 years ago

Escaping markdown is relatively simple: just check if there is a \ before the symbol. Most editors I know about don't do real-time parsing of the markdown, since the person writing it can generally understand what it means and it's very simple (at least that's my guess). Possibly the configuration could have an option to always trigger when it detects a valid expression (like header text), never trigger, or trigger on some action (e.g. hitting enter at the end of a line).
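The escape check described above can be sketched as follows. This is a minimal illustration in Python, not cherrytree's actual code; the function name `is_escaped` is hypothetical. It adds one assumption beyond "check if there is a \ before the symbol": a backslash can itself be escaped, so the sketch counts the run of backslashes and treats the symbol as escaped only when that count is odd.

```python
def is_escaped(text: str, index: int) -> bool:
    """Return True if the character at `index` is escaped by a backslash.

    Assumption: `\\*` means a literal backslash followed by an active `*`,
    so only an odd number of preceding backslashes escapes the symbol.
    """
    backslashes = 0
    i = index - 1
    # Walk left over the run of consecutive backslashes before the symbol.
    while i >= 0 and text[i] == "\\":
        backslashes += 1
        i -= 1
    return backslashes % 2 == 1
```

A parser would call this before treating `*`, `_`, `#` and similar symbols as markup, and skip any symbol for which it returns True.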

txe commented 4 years ago

Escaping markdown is relatively simple - just check if there is a \ before the symbol

Yeah, but users may complain about adding an 'unnecessary' \ :) We can just support Ctrl+Z, so it is easy to revert a change that just happened without typing additional symbols. And we can keep checking markdown only in the current line, so users wouldn't have to press Ctrl+Z again and again.

It looks like everything is clear. Anything else to discuss?

ForeverRainbow commented 4 years ago

I think there should at least be the option to escape it with \; personally I would complain about the 'unnecessary' ctrl-z xD.

I should be able to implement a basic markdown parser by extending the existing CtImportHandler and possibly the Zim one (_tokonize is pretty generic for all basic language parsing). The only issue is likely to be multiple styles at once.

txe commented 4 years ago

Is it hard to improve the parser?

ForeverRainbow commented 4 years ago

Multiple styles at once shouldn't be too hard. It's more about closing the tag when non-formatted text is added than anything else, but it should be fixable with some state flags.
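To illustrate the "state flags" idea, here is a rough Python sketch (not the parser from the pull request; `parse_runs` and the style names are hypothetical). It keeps a set of currently active styles, and every time a marker toggles a flag it first closes the current run of text, so each run carries exactly the styles that were active while it was typed.

```python
def parse_runs(text):
    """Split `text` into (content, styles) runs using toggle flags.

    Assumptions: `**` toggles bold, a single `*` or `_` toggles italic,
    and markers are never escaped. Unclosed markers simply stay active
    until the end of the text.
    """
    runs = []
    active = set()   # state flags: which styles are currently open
    buffer = []      # characters of the run being collected

    def flush():
        # Close the current run with a snapshot of the active styles.
        if buffer:
            runs.append(("".join(buffer), frozenset(active)))
            buffer.clear()

    i = 0
    while i < len(text):
        if text.startswith("**", i):
            flush()
            active ^= {"bold"}      # toggle the bold flag
            i += 2
        elif text[i] in "*_":
            flush()
            active ^= {"italic"}    # toggle the italic flag
            i += 1
        else:
            buffer.append(text[i])
            i += 1
    flush()
    return runs
```

For example, `parse_runs("plain **bold *both* bold** end")` yields five runs, with the middle one carrying both bold and italic. A real parser would also need to leave underscores inside words such as `snake_case` alone.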

ForeverRainbow commented 4 years ago

@txe I am currently in the process of implementing this, but got something weird. When importing a node, the icon is the VS Code icon instead of the cherry. Do you have any idea why this could be happening?

Ok, I fixed this; it turns out I forgot to set the formatting to rich text. I am assuming this is an issue with selecting icons for a node without any "language" option.

txe commented 4 years ago

CtTreeStore::_get_node_icon

ForeverRainbow commented 4 years ago

I have a working importer for Markdown, should I make a pull request? I am concerned that it adds features to the C++ port which do not exist in the python version.

txe commented 4 years ago

Hi, sure, you can do it

DiagonalArg commented 4 years ago

Is it possible to just use pandoc, the "general markup converter," for the conversion? [1][2] That covers a number of markdown flavors, including pandoc's own, Strict, CommonMark, GitHub Flavored Markdown (GFM), MultiMarkdown (MMD) and Markdown Extra (PHP Extra). [1]

From the man page:

Pandoc's enhanced version of Markdown includes syntax for footnotes, tables, flexible ordered lists, definition lists, fenced code blocks, superscripts and subscripts, strikeout, metadata blocks, automatic tables of contents, embedded LaTeX math, citations, and Markdown inside HTML block elements. (These enhancements ... can be disabled using the markdown_strict input or output format.)

And, for those of us who do mathematics, please don't forget LaTeX. It's covered by pandoc. Or, you could also look at libraries, which include MathJax [1][2] and KaTeX [1][2]. If choosing a library, I gather the latter may be the better choice.

DiagonalArg commented 4 years ago

This might be a feature request, but if there were a scripting interface (is there one?), then we might do something like pipe the raw markdown to pandoc like so:

| pandoc -s -f markdown -t context -o my.pdf

while running in the background this inotify-like utility, which will reload the PDF once it changes:

ls my.pdf | entr -r mupdf my.pdf

Of course that doesn't have to be pdf. It could be epub, odt, or anything else pandoc can do.

ForeverRainbow commented 4 years ago

@DiagonalArg I admit I have zero experience with pandoc. The problem I can see at the outset is that we need to be able to convert it to the XML which cherrytree uses for rich text. As far as I can see, the closest we could get using pandoc is converting to HTML and then running that through the HTML importer.
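The "convert to HTML, then reuse the HTML importer" route could look roughly like this from the calling side. This is a sketch in Python rather than cherrytree's C++, and the helper names are hypothetical; the only pandoc features it relies on are the `-f`/`-t` format flags and reading the input from standard input. The resulting HTML string would then be handed to whatever HTML import code already exists.

```python
import shutil
import subprocess


def build_pandoc_command(source_format="markdown", target_format="html"):
    """Build the pandoc argument list; input is expected on stdin."""
    return ["pandoc", "-f", source_format, "-t", target_format]


def markdown_to_html(markdown_text):
    """Convert markdown to an HTML fragment by invoking an installed pandoc."""
    if shutil.which("pandoc") is None:
        # pandoc is an optional external dependency in this sketch.
        raise RuntimeError("pandoc is not installed or not on PATH")
    result = subprocess.run(
        build_pandoc_command(),
        input=markdown_text,
        capture_output=True,
        encoding="utf-8",   # pandoc reads and writes UTF-8
        check=True,         # raise if pandoc exits with an error
    )
    return result.stdout
```

Keeping the command construction separate from the call makes it easy to switch the input flavor, for example `build_pandoc_command("gfm")` for GitHub Flavored Markdown.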

Edit: Looking at the documentation, it seems relatively easy to add custom writers, except... it's written in Haskell. I actually recently started learning Haskell, but I cannot pretend to be able to write a full application in it. Additionally, even if it were able to output the correct XML, it would be a separate binary which cherrytree would have to invoke, which could cause nightmarish issues for cross-platform building (or we would have to call Haskell from C++, which is possible but isn't really supported afaik).

DiagonalArg commented 4 years ago

@ForeverRainbow - Try this and tell me what you think.

(Just seeing your edit. (1) According to [1], pandoc runs on both Unix-like systems, which would include OSX (it's available in Homebrew), and Windows. (2) Pandoc has plugins for conversions to as-yet-unsupported formats, and the language for those is Lua, not Haskell. (3) Since it's a plugin system, you wouldn't have to compile pandoc for Cherrytree users; you would just have to offer the pandoc plugin with Cherrytree. Pandoc would be a dependency, but it would only be necessary if people want certain types of conversions.)

First, note that various other projects supply interfaces between their project and pandoc. Here, for example, is Sublime's. Here is another for Atom. There are many more, as this is a well-known and heavily used tool. It's the Swiss Army knife of document conversion. "Pandoc is a free and open-source document converter, widely used as a writing tool (especially by scholars) and as a basis for publishing workflows. It was created by John MacFarlane, a philosophy professor at the University of California, Berkeley." [1]

Second, note that pandoc is Free software (GPL v.2), which uses an intermediate format between all formats it supports, and that "Plug-ins for custom formats can also be written in Lua..." [1] That means, you can add your own (from or to) Cherrytree XML converter.

That would give you a pandoc plugin for Cherrytree which would allow enormous breadth of output formats. You would no longer have to code your own, piecemeal. It would also solve the problem which is dear to my heart, of being able to use some version of Markdown in combination with LaTeX, which pandoc supports.

[1]

ForeverRainbow commented 4 years ago

@DiagonalArg I do like the idea of having pandoc support, maybe as an extra "enable additional formatting options (requires pandoc to be installed on your system)". I think it would take significant looking into, but I agree it would be much better than trying to roll our own formatters. Maybe open a separate issue for it.

A Lua filter looks promising, since all that would be needed is a Lua file somewhere in cherrytree's source. What I meant about a separate binary was the case where a separate Haskell program was needed to implement cherrytree's custom formatting.

ForeverRainbow commented 4 years ago

@txe So currently the built-in markdown parser in cherrytree supports very basic markdown. If we want to go further, I think we are going to need a tokenizer library. I am wondering whether it is worth going further, considering that pandoc already supports markdown.

I would, however, like to be able to use markdown live formatting in cherrytree documents, which isn't going to be possible with pandoc afaik.

A tokenizer library would also ease the implementation of other importers, although do any of them apart from Zim use non-XML files?

giuspen commented 4 years ago

@ForeverRainbow can you better describe what functionality this tokeniser has to provide? Could you write some code yourself that does that, or is it really so complex that you definitely need an external library?

txe commented 4 years ago

There are import formats which are not XML-based. On the other hand, using the old algorithms for them may be the best way to deal with them. They are already tested and need less time to implement.

Cherrytree itself supports only a small number of markdown styles, so the fact that the built-in parser is basic is understandable.

ForeverRainbow commented 4 years ago

Yep, I agree; I do not think the markdown needs to be incredibly complex.

@giuspen The tokenizer is mostly about being able to tokenise reliably. The current tokenizer I am using has a few problems (most notably, it's not greedy), which I am actually fixing right now.

The advantage of a tokenizer library is that it is going to be more efficient and handle corner cases better. The "really complex" part is handling the corner cases and weirdness which happen when humans write things. That shouldn't be an issue with the current markdown, but probably would be with things like tables thrown in (lists within tables, anyone?).
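To make the "greedy" point concrete, here is a small Python sketch, assuming "greedy" means longest-match: at each position the tokenizer tries the longest marker first, so `**` is emitted as one token rather than two separate `*` tokens. The token list and the `tokenize` function are illustrative and are not taken from cherrytree.

```python
TOKENS = ["```", "**", "~~", "*", "_", "`", "# "]


def tokenize(text, tokens=TOKENS):
    """Split `text` into ("token", s) and ("text", s) pairs, longest match first."""
    # Sorting by length makes the match greedy: "**" wins over "*".
    ordered = sorted(tokens, key=len, reverse=True)
    out = []
    buffer = []
    i = 0
    while i < len(text):
        match = next((t for t in ordered if text.startswith(t, i)), None)
        if match is None:
            buffer.append(text[i])
            i += 1
            continue
        if buffer:
            # Emit the plain text collected before this marker.
            out.append(("text", "".join(buffer)))
            buffer = []
        out.append(("token", match))
        i += len(match)
    if buffer:
        out.append(("text", "".join(buffer)))
    return out
```

A non-greedy version that tried `*` before `**` would report bold text as two nested italic markers, which is the kind of problem a longest-match rule avoids.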

giuspen commented 4 years ago

@ForeverRainbow I would ask you to give the glib utility https://developer.gnome.org/glib/stable/glib-Lexical-Scanner.html a chance first, with an open mind, and if it really doesn't serve the purpose we can try to convince @txe to use boost

ForeverRainbow commented 4 years ago

@giuspen Thanks for that link! I shall definitely look at it and I really should start checking whether glib does what I want already for things... xD

giuspen commented 4 years ago

Thanks @ForeverRainbow ! 😉

ForeverRainbow commented 4 years ago

Update on the status of this: #917 has (I think) implemented all the features I think cherrytree markdown needs. It does not support quotations because, afaik, there is no equivalent formatting within cherrytree.

DiagonalArg commented 4 years ago

@ForeverRainbow Wouldn't a quotation just be a block indent?

ForeverRainbow commented 4 years ago

@DiagonalArg That would be one way of doing it, I guess, but since cherrytree doesn't have any way of saying that's a quote, it's ambiguous as to why these lines are indented. Better, imho, to just leave them as they are.

SadE54 commented 4 years ago

Is it working in 0.99.2? I just tried to create a rich text node and write markdown tags, but nothing is rendered :-/

ForeverRainbow commented 4 years ago

@SadE54 Apologies, but the markdown formatting is broken in that version. It was fixed in #954 and I am currently improving its design (see comments by @txe there). Markdown importing does work, however.

ForeverRainbow commented 4 years ago

@txe To continue our conversation from #954, I actually realised it's very easy: just calculate the size of the tags and change the text between them. I am wondering whether the tags should be left in place with just the text between them changing (like a preview, as on MS Teams, Discord, etc.), or whether the tags should be removed as soon as formatting is applied (so what we have right now, except formatting on valid tags rather than on enter).
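Both options come down to the same offset arithmetic, sketched here in Python on a plain string. The function names are hypothetical and this is not the code from #954; a real implementation would work on text buffer iterators rather than string indices. `find_tagged_span` covers the "leave the tags, format what is between them" variant, and `strip_tags` covers the "remove the tags once formatting is applied" variant.

```python
def find_tagged_span(line, open_tag, close_tag):
    """Return (start, content_start, content_end, end) offsets, or None.

    `start`/`end` delimit the whole expression including tags;
    `content_start`/`content_end` delimit the text between the tags.
    """
    start = line.find(open_tag)
    if start == -1:
        return None
    content_start = start + len(open_tag)          # skip the opening tag
    content_end = line.find(close_tag, content_start)
    if content_end == -1 or content_end == content_start:
        return None                                 # unclosed or empty
    return start, content_start, content_end, content_end + len(close_tag)


def strip_tags(line, open_tag, close_tag):
    """Remove the tags and return (new_line, (fmt_start, fmt_end))."""
    span = find_tagged_span(line, open_tag, close_tag)
    if span is None:
        return line, None
    start, content_start, content_end, end = span
    content = line[content_start:content_end]
    new_line = line[:start] + content + line[end:]
    # After removal the formatted range begins where the open tag was.
    return new_line, (start, start + len(content))
```

For `"say **hi** now"` the content offsets are 6 to 8 with the tags kept, and 4 to 6 once the tags are removed.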

txe commented 4 years ago

If applying instantly looks natural, then it can be done

ForeverRainbow commented 4 years ago

I am going to mess around with it a bit and see. Right now I have it just replacing the text between the tags, and it does cause quite a bit of "noise" in the text.

Alfystar commented 1 year ago

Hi everyone! 😄 I'm trying to enable Markdown real-time rendering, but I don't understand how to enable it for a node... Could someone help me? I'm currently on version 0.99.53, Windows 10.

giuspen commented 1 year ago

@Alfystar the implementation was causing crashes, so I disabled it and haven't looked back at it yet. It will happen sometime in the future.

Alfystar commented 1 year ago

😟 I understand, but I really hope that the support can return to its full potential. A few years have passed and many projects for taking notes in markdown have been born, including open source ones. Maybe you could take the renderers from them...?

giuspen commented 1 year ago

What the auto markup replacement was doing was recognising, as you type, some patterns and replacing them with the cherrytree equivalent. Cherrytree does something similar with the symbol auto replacements documented in the dialog below (from the special characters tab of the preferences dialog):

[image]

If you create a list of the most important/common/standard auto replacements that you expect (in order of importance), I can look at gradually supporting at least those. Every replacement must specify the cherrytree tag it is expected to apply.
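Such a list of replacements could be expressed as a table of pattern and tag pairs that is checked against the text typed so far. The sketch below is in Python and is purely illustrative: the rule set, the tag names ("bold", "italic", "monospace") and the function `check_auto_replacement` are assumptions, not cherrytree's actual tags or code. Each pattern is anchored at the end of the typed text, so a rule fires only at the moment the closing marker is typed.

```python
import re

# (pattern, tag) pairs in order of priority; tag names are hypothetical.
RULES = [
    (re.compile(r"\*\*(.+?)\*\*$"), "bold"),
    (re.compile(r"(?<!\*)\*([^*]+)\*$"), "italic"),
    (re.compile(r"`([^`]+)`$"), "monospace"),
]


def check_auto_replacement(typed_so_far):
    """Return (start_offset, content, tag) if the text just typed completes a pattern."""
    for pattern, tag in RULES:
        match = pattern.search(typed_so_far)
        if match:
            # start_offset is where the opening marker begins in the line.
            return match.start(), match.group(1), tag
    return None
```

Calling it on the current line after every keystroke returns None until a pattern is complete, for example while the line is still `half **typed`.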

Alfystar commented 1 year ago

Thank you so much for your opening! I will make the list as soon as I have a moment to do the work.

I think the final goal for these features would be to have a switch between markdown and rendered text, so that notes written in CherryTree can, once ready, be used on other pages such as a GitHub README.md.

giuspen commented 1 year ago

The easier way to get markdown output is to implement, in parallel to the auto replacement, an export to markdown that supports the tags that you are going to document.

Sinkmanu commented 9 months ago

Hi @giuspen, I have a question about the development of the "markdown auto replacement"; sorry if this is not the right issue to post it in. I have been reviewing the source code to see where the markdown auto replacement is implemented, e.g. lists with asterisk (*) and minus (-), and I found the preprocessor macro "MD_AUTO_REPLACEMENT". I defined "MD_AUTO_REPLACEMENT" in config, but the compilation fails with errors. Is this under development? Or is there a working version?

I have made some modifications to the sources so that they compile (the errors were undeclared functions), and now I see the "Enable Markdown Auto Replacement (Experimental)" option in the Preferences dialog, and apparently it works. (I have verified some markdown tags.)

Debug output:


[2023-12-07 10:50:06.768] [   ] [debug] Creating new tag: md-formatting-0
[2023-12-07 10:50:07.003] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:07.544] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:07.544] [   ] [debug] TOKEN MATCHER: FOUND OPEN
[2023-12-07 10:50:07.735] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:07.881] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:08.050] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:08.128] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:08.196] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:08.353] [   ] [debug] Applying tag: md-formatting-0
[2023-12-07 10:50:08.353] [   ] [debug] Finished, open: <# > contents: <Foobar> close: <
> 
[2023-12-07 10:50:08.353] [   ] [debug] TOKEN: # 
[2023-12-07 10:50:08.353] [   ] [debug] TOKEN: ```

giuspen commented 9 months ago

Hi @Sinkmanu the reason why I macroed that out is that it was causing crashes, it was not my code and I didn't have time to debug it. You can play with that, personally for now I'm not planning to work on it.