PoiScript / orgize

A Rust library for parsing org-mode files.
https://poiscript.github.io/orgize/
MIT License

Exposing text position of each element #7

Closed: calmofthestorm closed this issue 7 months ago

calmofthestorm commented 4 years ago

One thing that would be useful for a project I'm working on would be the ability to get the exact start and end of a given element in the original document text. I currently only need this for headlines, but could see other uses depending on how difficult it is to implement.

At the moment, I use my own very simple parser that breaks an org document into a tree of headlines and then use orgize to parse each headline as necessary. Another advantage of this is that it guarantees that parsing and then exporting a document produces an identical file, which is a useful feature for my use case.
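A splitter like the one described can be sketched in a few lines. This is a hypothetical illustration, not the commenter's actual parser: it assumes a headline is any line that starts with one or more `*` followed by a space. It records a byte range per section rather than copying text, and it returns a flat list with star levels from which a tree can be built. Because the ranges are contiguous, concatenating the slices reproduces the input exactly, which is the round-trip property mentioned above.

```rust
use std::ops::Range;

/// One headline section: the headline line plus its body, up to the next
/// headline. `level` is the number of leading stars (0 for any preamble
/// text before the first headline). Names here are hypothetical.
#[derive(Debug, PartialEq)]
struct Section {
    level: usize,
    range: Range<usize>, // byte offsets into the original document
}

/// Returns the star level if `line` looks like an org headline ("* ", "** ", ...).
fn headline_level(line: &str) -> Option<usize> {
    let stars = line.bytes().take_while(|&b| b == b'*').count();
    // `*` is ASCII, so slicing at `stars` is always on a char boundary.
    if stars > 0 && line[stars..].starts_with(' ') {
        Some(stars)
    } else {
        None
    }
}

/// Splits `doc` into contiguous sections, one per headline.
fn split_sections(doc: &str) -> Vec<Section> {
    let mut sections = Vec::new();
    let (mut start, mut level, mut offset) = (0, 0, 0);
    for line in doc.split_inclusive('\n') {
        if let Some(l) = headline_level(line) {
            // Close the previous section (skips an empty preamble).
            if offset > start {
                sections.push(Section { level, range: start..offset });
            }
            start = offset;
            level = l;
        }
        offset += line.len();
    }
    if offset > start {
        sections.push(Section { level, range: start..offset });
    }
    sections
}

fn main() {
    let doc = "#+TITLE: notes\n* Work\nsome text\n** Task\n* Home\n";
    let sections = split_sections(doc);
    for s in &sections {
        println!("level {}: {:?}", s.level, &doc[s.range.clone()]);
    }
    // Re-joining the untouched slices reproduces the input byte for byte.
    let rebuilt: String = sections.iter().map(|s| &doc[s.range.clone()]).collect();
    assert_eq!(rebuilt, doc);
}
```

Storing ranges instead of owned strings means the document text is kept once, and each section can be handed to a full parser such as orgize only when needed.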

I was mostly curious whether you had any thoughts on how hard these would be to implement in orgize and whether you had suggestions on how to do so. My current solution is basically functional but cumbersome and frustrating, and it also requires either storing the text multiple times or reparsing it frequently (pick your poison), so I'm considering alternatives, and one possibility would be to upstream the features I need.

PoiScript commented 4 years ago

It's definitely possible to include text position information. We can wrap the &str in a Context struct that behaves like a &str but also records its position in the input org string. I'll try to implement this in the next few days.
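The wrapper idea can be sketched as follows. This is a hypothetical illustration, not orgize's actual implementation: only the name `Context` comes from the comment, and the fields and the `begin`, `end` and `slice` methods are assumptions. Implementing `Deref<Target = str>` lets the wrapper be used wherever a `&str` is expected, while `slice` keeps the absolute offset up to date as the parser narrows in on smaller pieces.

```rust
use std::ops::{Deref, Range};

/// Hypothetical string slice that remembers where it sits in the original input.
#[derive(Clone, Copy, Debug)]
struct Context<'a> {
    text: &'a str,
    begin: usize, // byte offset of `text` within the original document
}

impl<'a> Context<'a> {
    /// Wraps the whole input; its position starts at offset 0.
    fn new(input: &'a str) -> Self {
        Context { text: input, begin: 0 }
    }

    fn begin(&self) -> usize {
        self.begin
    }

    fn end(&self) -> usize {
        self.begin + self.text.len()
    }

    /// Narrows to a byte range relative to this context, keeping the absolute offset.
    fn slice(&self, range: Range<usize>) -> Context<'a> {
        Context {
            text: &self.text[range.clone()],
            begin: self.begin + range.start,
        }
    }
}

/// Lets a `Context` be used like a `&str` (`find`, `len`, `starts_with`, ...).
impl<'a> Deref for Context<'a> {
    type Target = str;
    fn deref(&self) -> &str {
        self.text
    }
}

fn main() {
    let doc = "* TODO first\n* second\n";
    let root = Context::new(doc);
    // `find` and `len` are ordinary `str` methods, reached through `Deref`.
    let nl = root.find('\n').unwrap();
    let second = root.slice(nl + 1..root.len());
    println!("{:?} spans {}..{}", &*second, second.begin(), second.end());
    // The recorded span points back at exactly the same text in `doc`.
    assert_eq!(&doc[second.begin()..second.end()], &*second);
}
```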

calmofthestorm commented 4 years ago

Awesome. Now that I think about it, I believe this would also be sufficient to address my other concern, which is identical output text for any headline which has not been modified, by emitting the original text if the node is unchanged, and only generating the output from the parsed node if it has been changed. I assume this would need to be a custom formatter, but I think it would be easy enough for a user to write.

My main motivation there is minimizing the diff hell that is using an org repo in git across machines (as well as to limit the blast radius of any bugs or not-bugs-but-not-intended-behaviors due to mistaken human edits). Avoiding diffs is key. Another option would be to reformat everything every time on save, I suppose.
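The "emit the original text unless the node changed" formatter described above might look roughly like this. It is a sketch under assumptions: `Node`, its `edited` field and `render` are hypothetical names, and the nodes are assumed to be top-level elements whose ranges cover the document contiguously and in order, for example the headline sections produced by a splitter.

```rust
use std::ops::Range;

/// Hypothetical node: its span in the original text plus the regenerated
/// text if the user edited it after parsing.
struct Node {
    range: Range<usize>,
    edited: Option<String>, // None = unchanged since parsing
}

/// Writes the document back out, copying the original bytes for every
/// unchanged node and using regenerated text only for edited ones.
/// Assumes `nodes` are in order and their ranges tile `original`.
fn render(original: &str, nodes: &[Node]) -> String {
    let mut out = String::with_capacity(original.len());
    for node in nodes {
        match &node.edited {
            Some(text) => out.push_str(text),
            None => out.push_str(&original[node.range.clone()]),
        }
    }
    out
}

fn main() {
    let original = "* Work\nold body\n* Home\n";
    let nodes = vec![
        Node { range: 0..16, edited: Some("* Work\nnew body\n".to_string()) },
        Node { range: 16..23, edited: None },
    ];
    // Only the first headline changes; "* Home\n" is copied verbatim.
    print!("{}", render(original, &nodes));
}
```

With this design, a file that is parsed and written back without edits is byte-identical to the input, so git only sees diffs in the headlines that were actually touched.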

PoiScript commented 7 months ago

Starting with v0.10, each parsed element has both begin() and end() methods, which return the element's position in the original document.