athensresearch / athens

Athens is no longer maintained. Athens was an open-source, collaborative knowledge graph, backed by YC W21.
https://athensresearch.github.io/athens

Parser #94

Closed tangjeff0 closed 3 years ago

tangjeff0 commented 4 years ago

This feature is for parsing block-level syntax – e.g. • here is [[a link]] – and rendering it to the correct HTML via Hiccup.

We use the instaparse library.

This file may be useful for nested links: https://github.com/athensresearch/athens/blob/master/data/nested-link.hiccup
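
As a rough illustration of the goal (not the instaparse grammar itself), here is a minimal Python sketch that turns block text with possibly nested `[[links]]` into a Hiccup-like nested list. The function name `parse_block` and the `"span"` / `"page-link"` tags are hypothetical, chosen only for this example; the real output shape is whatever the linked nested-link.hiccup file and the renderer define.

```python
def parse_block(text):
    """Parse `[[page links]]` (possibly nested) into a Hiccup-like nested list.

    Hypothetical sketch: tag names are assumptions, not Athens's real output.
    """
    stack = [[]]  # stack[0] collects the top-level children

    def add_text(s):
        # Merge adjacent text so we get "here is " rather than single chars.
        top = stack[-1]
        if top and isinstance(top[-1], str):
            top[-1] += s
        else:
            top.append(s)

    i = 0
    while i < len(text):
        if text.startswith("[[", i):
            stack.append([])  # open a new link frame
            i += 2
        elif text.startswith("]]", i) and len(stack) > 1:
            children = stack.pop()  # close the innermost open link
            stack[-1].append(["page-link", *children])
            i += 2
        else:
            add_text(text[i])
            i += 1

    # Any "[[" that was never closed falls back to literal text.
    while len(stack) > 1:
        children = stack.pop()
        add_text("[[")
        for child in children:
            if isinstance(child, str):
                add_text(child)
            else:
                stack[-1].append(child)

    return ["span", *stack[0]]
```

For example, `parse_block("here is [[a link]]")` gives `["span", "here is ", ["page-link", "a link"]]`, and a nested link becomes a `"page-link"` node containing another `"page-link"` node.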

5

25

44

roryokane commented 4 years ago

My current knowledge of what needs to be done:

Exploratory tasks

Straightforward tasks

Implement known syntaxes

I think these tasks should not be attempted until most of the exploratory tasks are done, because some of those exploratory tasks would require writing new code for every rule, and those tasks would be more tedious if more rules had to be accounted for at that time.

For each of these syntaxes:

  1. Research Roam’s handling of edge cases by writing sample text in Roam
  2. Decide on Athens’s desired handling, possibly taking cues from other markup syntaxes like CommonMark
  3. Implement the parsing and initial transformation in parser.cljc. Remember to update any other parser rules that allow the new syntax to be nested inside it.
  4. Write tests for that rule in parser_test.cljc.
  5. Implement rendering of that markup as Hiccup in parse_renderer.cljs.

tangjeff0 commented 4 years ago

Thanks for collecting all these thoughts together @roryokane.

Before I respond to your comments, my first question is: what are your thoughts on putting test cases or examples in Devcards?

We need not actually do unit tests, and of course, the parser should still be cljc, but it would be easier for me to reason about example cases if I could visually identify things like markup and links in addition to the string inputs and hiccup outputs.

In general, I'm having some difficulty making mental models for these parsing problems, especially for the two ideas regarding nested links: remembering vs reconstructing source text. I believe your explanations are good, but parsing is just kind of an abstract subject to me. End-user output would be more user-friendly, and I think it would encourage others to start thinking about these tricky parsing problems!

Lastly, I think ultimately the designers will need to design things like nested and bolded links, bolded and highlighted text, and transclusions of nested, bolded, highlighted, italicized text! It might be useful to use Devcards in the development of the parser, not just at the end 😄


> I see that Roam is missing a feature regarding this: it doesn’t highlight which link you will click when you hover. Can that be implemented with the Hiccup output above, or will we need a different structure?

In Roam's case, the inner link is contained within a link class. Inner links can thus be distinguished by custom CSS. So if we have a similar DOM and CSS class setup, our designers can handle this UX.

Can't distinguish with default CSS: [screenshot, 2020-06-08]

But with custom CSS you can easily detect the inner link: [screenshot, 2020-06-08]


> should we implement Roam-compatible syntax now and write a converter later, or are there few enough Roam users that we can start with Markdown-compatible syntax?

We can always worry about converting and accreting later. Basically using the same logic you pose here:

> I think these tasks should not be attempted until most of the exploratory tasks are done, because some of those exploratory tasks would require writing new code for every rule, and those tasks would be more tedious if more rules had to be accounted for at that time.


> In Markdown, the text on the same line as the first ``` is a language identifier, but in Roam, it’s the first line of the code. Implement Roam syntax to start with since it’s slightly easier, but what’s our long-term plan?

Short-term, maybe all we do is syntax highlighting, either Roam-style or Markdown-style.

Long-term, I know I would like a REPL experience, where Athens replaces Emacs/VS Code and we can run arbitrary code in code blocks. We already have sci and datascript boxes :). But this is kind of v2 stuff.
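
To make the Roam versus Markdown difference on the first fence line concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the name `split_code_block` and the `style` flag are hypothetical, and it handles only a single block that both starts and ends with a triple-backtick fence.

```python
FENCE = "`" * 3  # built this way so the example contains no literal fence


def split_code_block(block, style="roam"):
    """Return (language, code) for a fenced block. Hypothetical helper.

    style="markdown": text on the opening fence line is a language identifier.
    style="roam":     text on the opening fence line is the first line of code.
    """
    if len(block) < 2 * len(FENCE) or not (
        block.startswith(FENCE) and block.endswith(FENCE)
    ):
        raise ValueError("not a fenced code block")
    body = block[len(FENCE):-len(FENCE)]
    if style == "markdown":
        first_line, _, rest = body.partition("\n")
        return (first_line.strip() or None), rest
    return None, body
```

With the same input, the Markdown reading yields a language of `clojure` and the code after it, while the Roam reading keeps `clojure` as the first line of the code itself.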


> header:: syntax

This is easy to implement on the parser side, but not on the Datascript side, so maybe not a huge priority.

tangjeff0 commented 4 years ago

Added basic devcards in #151, viewable at https://athensresearch.github.io/athens/cards.html#!/athens.devcards.parser

roryokane commented 4 years ago

I found this page that attempts to document Roam’s syntax: roam-tricks – Roam Shortcodes. Some of the syntaxes described there I hadn’t been aware of:

I also learned of RoamTutorial – Secret & Advanced Features, which doesn’t reveal any syntaxes the other page doesn’t also include, but does go into more detail on how they are supposed to be rendered.

tangjeff0 commented 4 years ago

Alias is pretty cool. {{[[TODO]]}} and {{[[DONE]]}} are pretty important. :hiccup we can probably support quite easily with https://athensresearch.github.io/athens/cards.html#!/athens.devcards.sci_boxes by @tomisme !

tangjeff0 commented 4 years ago

Writing this so I don't forget: recursive block refs lead to stack overflow and app freezes.
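
One possible guard, sketched in Python rather than in the app's own code: track which blocks are already being expanded and leave a reference unexpanded when it would revisit one of them. The `((uid))` reference syntax, the `blocks` dictionary, and the name `expand_block_refs` are assumptions made for this example, not the actual Athens data model.

```python
import re

# Assumed reference syntax for this sketch: ((uid))
BLOCK_REF = re.compile(r"\(\(([^()]+)\)\)")


def expand_block_refs(uid, blocks, seen=frozenset()):
    """Return the text of block `uid` with refs expanded, stopping at cycles.

    `blocks` maps uid -> raw block text (hypothetical stand-in for the database).
    `seen` holds the uids on the current expansion path.
    """
    if uid in seen:
        return f"(({uid}))"  # cycle detected: keep the ref as plain text
    seen = seen | {uid}

    def replace(match):
        ref = match.group(1)
        if ref not in blocks:
            return match.group(0)  # unknown uid: leave untouched
        return expand_block_refs(ref, blocks, seen)

    return BLOCK_REF.sub(replace, blocks[uid])
```

With two blocks that reference each other, expansion now terminates after one pass around the loop instead of recursing until the stack overflows.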

tangjeff0 commented 4 years ago

We also want to auto-create pages with # and #[[]] @HaojiXu

thesophiaxu commented 4 years ago

Related: https://github.com/athensresearch/athens/issues/44#issuecomment-636139109

tangjeff0 commented 4 years ago

Also, are pages for inner links auto-created? I.e., [[nested [[links]]]] should create [[nested [[links]]]] as well as [[links]].

thesophiaxu commented 4 years ago

> Also, are pages for inner links auto-created? I.e., [[nested [[links]]]] should create [[nested [[links]]]] as well as [[links]].

@tangjeff0 Yes, they are auto-created in an earlier commit.
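
For reference, the behaviour discussed above (creating a page for the outer link and for every inner link) can be sketched as a single pass that collects every link title. This is a Python illustration, not the committed implementation; `linked_page_titles` is a hypothetical name.

```python
def linked_page_titles(text):
    """Collect the title of every [[link]] in `text`, including nested ones.

    Inner titles are returned before the outer title that contains them.
    """
    titles = []
    starts = []  # index just after each currently open "[["
    i = 0
    while i < len(text):
        if text.startswith("[[", i):
            starts.append(i + 2)
            i += 2
        elif text.startswith("]]", i) and starts:
            titles.append(text[starts.pop():i])
            i += 2
        else:
            i += 1
    return titles
```

For `[[nested [[links]]]]` this returns `["links", "nested [[links]]"]`, so both pages can be created from one parse of the block.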