200ok-ch / org-parser

org-parser is a parser for the Org mode markup language for Emacs.
GNU Affero General Public License v3.0

Parsing not always round-tripable #67

Open sh54 opened 1 year ago

sh54 commented 1 year ago

Assumptions

This assumes that one goal of the library is for a test like this to pass for any given Org document:

(let [org "* Headline"]
    (is (= org (write-str (read-str org)))))

The bug

Parsing certain Org mode documents drops spacing information, so the resulting data structure, when written back out, does not match the original document.

To Reproduce

Note the use of extra spacing:

*   TODO   [#A]   Buy raspberries   :purchase:   

Result from parse:

[:S
 [:headline
  [:stars "*"]
  [:keyword "TODO"]
  [:priority "A"]
  [:text [:text-normal "Buy raspberries   :purchase:   "]]]]

Only extra spacing around tags is still preserved.

Result from (comp transform parse):

{:headlines
 [{:headline
   {:level 1,
    :title [[:text-normal "Buy raspberries"]],
    :planning [],
    :tags ["purchase"]}}]}

Now spacing around tags is lost.

Expected behavior

The extra spaces between the stars, keyword, priority and title surely violate most people's style guides, but they seem to be perfectly valid Org. Extra spacing before the tags allows them to be right-aligned. Extraneous spacing should instead be removed by a separate formatting pass.

I feel that extra spaces should be preserved, ideally in a way that does not make the AST much harder to manipulate.

Suggested parse structure

[:S
 [:headline
  [:stars "*" [:s "   "]]
  [:keyword "TODO" [:s "   "]]
  [:priority "A" [:s "   "]]
  [:text [:text-normal "Buy raspberries   :purchase:   "]]]]
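
To illustrate how a writer could reproduce the source text from a tree of this shape, here is a minimal sketch. `node->str` is a hypothetical helper, not part of org-parser's API, and it assumes the three-space runs shown in the suggested structure and that `:priority` is the only node whose value needs re-wrapping (as `[#A]`).

```clojure
;; Hypothetical writer for the suggested parse tree (not org-parser's API).
;; Strings are emitted verbatim; vectors are [tag & children].
(defn node->str [node]
  (if (string? node)
    node
    (let [[tag & children] node]
      (case tag
        ;; [:priority "A" [:s "   "]] -> "[#A]" followed by the spacing node
        :priority (let [[p & more] children]
                    (str "[#" p "]" (apply str (map node->str more))))
        ;; every other node is just the concatenation of its children
        (apply str (map node->str children))))))

(def suggested-tree
  [:S
   [:headline
    [:stars "*" [:s "   "]]
    [:keyword "TODO" [:s "   "]]
    [:priority "A" [:s "   "]]
    [:text [:text-normal "Buy raspberries   :purchase:   "]]]])

(node->str suggested-tree)
;; => "*   TODO   [#A]   Buy raspberries   :purchase:   "
```

Because the spacing lives in its own `[:s ...]` child, code that only cares about the values can ignore those nodes.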

Suggested transform structure

{:headlines
 [{:headline
   {:level 1,
    :level-post-spacing "   "
    :keyword "TODO" ;; keyword not present is another issue
    :keyword-post-spacing "   "
    :priority "A" ;; priority not present is another issue
    :priority-post-spacing "   "
    :title [[:text-normal "Buy raspberries"]],
    :title-post-spacing "   "
    :planning [],
    :tags ["purchase"]
    :tags-post-spacing "   "}}]}

The original document can be reproduced from either structure. If someone wants to fix the formatting by manipulating the data structure, that should be easy too.
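
As a rough sketch of both claims for the transform map: `headline->str` and `normalize-spacing` below are hypothetical helpers, not org-parser functions, and the `*-post-spacing` keys are the ones proposed in this issue rather than anything the library currently emits. The title is assumed to be a sequence of `[type text]` pairs, as in the output above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical: render the suggested headline map back to Org text,
;; honouring the proposed *-post-spacing keys.
(defn headline->str [h]
  (str (apply str (repeat (:level h) "*")) (:level-post-spacing h)
       (when-let [kw (:keyword h)] (str kw (:keyword-post-spacing h)))
       (when-let [p (:priority h)] (str "[#" p "]" (:priority-post-spacing h)))
       (apply str (map second (:title h))) (:title-post-spacing h)
       (when (seq (:tags h))
         (str ":" (str/join ":" (:tags h)) ":" (:tags-post-spacing h)))))

;; Hypothetical formatting pass: collapse every run to a single space
;; and drop trailing whitespace after the tags.
(defn normalize-spacing [h]
  (assoc h
         :level-post-spacing " "
         :keyword-post-spacing " "
         :priority-post-spacing " "
         :title-post-spacing " "
         :tags-post-spacing ""))

(def example-headline
  {:level 1
   :level-post-spacing "   "
   :keyword "TODO"
   :keyword-post-spacing "   "
   :priority "A"
   :priority-post-spacing "   "
   :title [[:text-normal "Buy raspberries"]]
   :title-post-spacing "   "
   :planning []
   :tags ["purchase"]
   :tags-post-spacing "   "})

(headline->str example-headline)
;; => "*   TODO   [#A]   Buy raspberries   :purchase:   "

(headline->str (normalize-spacing example-headline))
;; => "* TODO [#A] Buy raspberries :purchase:"
```

Keeping the spacing in separate keys means the values (`:keyword`, `:priority`, `:title`, `:tags`) stay as easy to read and update as they are today.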

schoettl commented 1 year ago

I would say that 100% round-trip accuracy is not org-parser's aim. Covering all the edge cases would be a huge effort, I think.

Currently, (write-str headline) renders the headline as Org mode would when you press the keys for the next TODO state or next priority in Emacs. I'm not sure we should follow the suggested path, because it makes everything more complicated and harder to maintain.