200ok-ch / org-parser

org-parser is a parser for the Org mode markup language for Emacs.
GNU Affero General Public License v3.0

Parser will not parse styles within sections. #26

Closed: wildwestrom closed this issue 3 years ago

wildwestrom commented 3 years ago

Describe the bug: The parser will not parse styles within sections.

To Reproduce

  1. Example input:
    * hello world
    this is some text *plus bold text*.

    Pass it to:

    #(org-parser.parser/org %)
  2. Result:
    [:S
    [:head-line [:stars "*"] [:title "hello" "world"]]
    [:content-line "this is some text *plus bold text*."]]

Expected behavior: something like this:

[:S
 [:head-line [:stars "*"] [:title "hello" "world"]]
 [:content-line "this is some text"
  [:text-styled
   [:text-sty-bold
    [:text
     [:text-normal "plus bold text"]]]]
  "."]]

Is it supposed to work like this? How do we want the final tree to look in the end?

schoettl commented 3 years ago

Parsing just the second line works: https://github.com/200ok-ch/org-parser/blob/master/test/org_parser/parser_test.cljc#L628

Oh, here is the problem: https://github.com/200ok-ch/org-parser/blob/master/resources/org.ebnf#L14

Parsing lines as text is disabled. I don't know why at the moment...

schoettl commented 3 years ago

You can check whether it works for you if you change those lines in org.ebnf.

But for a PR, some tests need to be updated. These tests still expect a plain :content-line string as the parse result and nothing more.

wildwestrom commented 3 years ago

OK, I'm not familiar with how instaparse works at all. I can see that the ebnf file is full of Clojure data structures that get passed into instaparse.core/parser, but I have no idea what that little regex bit should be. Once I figure that out, I can rewrite the tests and make sure everything works.

schoettl commented 3 years ago

This is a good tutorial for instaparse: http://xahlee.info/clojure/clojure_instaparse.html

The ebnf file just defines the Org mode language as an EBNF grammar. It has nothing to do with Clojure.

content-line = #".*"
(* content-line = text *)

This is EBNF syntax (in instaparse's dialect, where #"..." denotes a regular expression). The first line parses a line as a plain string without any markup; that line would have to be deleted. The second line is a comment; you would have to uncomment it to parse text including markup/styles.

parser_test.cljc defines the tests for the org parser.

wildwestrom commented 3 years ago

Looks like three tests failed in total. Here's one of them:

Fail in drawer with a bit of content

expected: [:S [:drawer-begin-line [:drawer-name "PROPERTIES"]] [:content-line ":foo: bar"] [:drawer-end-line]]

  actual: [:S
           [:drawer-begin-line [:drawer-name "PROPERTIES"]]
           [:content-line [:text [:text-normal ":foo: bar\n:END:"]]]]
    diff: - [nil nil [nil ":foo: bar"] [:drawer-end-line]]
          + [nil nil [nil [:text [:text-normal ":foo: bar\n:END:"]]]]

The other two failing tests are very similar. Here is the only change so far:

 empty-line = "" | #"\s+"
 (* TODO same as title text below. *)
-content-line = #".*"
-(* content-line = text *)
+content-line = text

 <eol> = <#'\n|$'>
 (* TODO remove <> to enable use where the spaces are actual needed *)
...
schoettl commented 3 years ago

Thanks for trying... well, it seems that not only the tests have to be fixed. Drawer parsing doesn't work yet; I'm working on it right now in one of the open PRs.
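A minimal sketch of one likely cause of the drawer failure, using only `clojure.core` regex functions rather than instaparse itself. In JVM regexes `.` does not match a line terminator, so `#".*"` stops at the end of the line. A negated character class such as `[^*]` does match `\n`, so a terminal built that way can run on and swallow the `:END:` line, which is what the `actual` output above shows. `hypothetical-text-normal-re` and `hypothetical-line-bounded-re` are assumptions standing in for whatever `text-normal` really uses in org.ebnf. They are not copied from the grammar. Instaparse also applies regex terminals at the current parse position alongside the surrounding rules, so treat this as an illustration only.

```clojure
(ns example.eol-matching)

;; The rule being replaced in org.ebnf: content-line = #".*"
;; In JVM regexes "." excludes line terminators, so the match stops at "\n".
(def content-line-re #".*")

;; HYPOTHETICAL stand-in for a plain-text terminal such as text-normal.
;; A negated class like [^*] also matches "\n", so it can run past the
;; end of the line and swallow the drawer's ":END:" line.
(def hypothetical-text-normal-re #"[^*]+")

;; HYPOTHETICAL fix: exclude the newline explicitly from the class.
(def hypothetical-line-bounded-re #"[^*\n]+")

;; The drawer body from the failing test, as one string.
(def drawer-body ":foo: bar\n:END:")

(defn first-match
  "Returns the first substring of s matched by re, or nil."
  [re s]
  (re-find re s))

(comment
  (first-match content-line-re drawer-body)              ;; => ":foo: bar"
  (first-match hypothetical-text-normal-re drawer-body)  ;; => ":foo: bar\n:END:"
  (first-match hypothetical-line-bounded-re drawer-body) ;; => ":foo: bar"
  )
```

Keeping the newline out of line-level text terminals leaves the `<eol>` rule to consume line breaks. This would let the drawer rules still see `:END:` as a separate line.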

schoettl commented 3 years ago

Basic, non-nested styled text should now be supported.

Advanced styled text is hard to parse and not yet supported; see #12. The plan is to reuse Emacs org-mode's regexes for recognizing styles (rather than trying to encode them in EBNF).