PoiScript / orgize

A Rust library for parsing org-mode files.
https://poiscript.github.io/orgize/
MIT License
277 stars, 34 forks

Issue in Start Events #75

Closed (shoehn closed this issue 3 months ago)

shoehn commented 3 months ago

Hello, I am trying to parse a custom DSL based on org-mode with this library, and I am confused about how events work. My document is a simple org-mode file; here is a minimal example:

* Challenge

Here is a short description that I need to capture.

It might have several paragraphs.

** There are some Hints

Which also have paragraphs

** And they can be as many as needed

With one

or more paragraphs.

When I am in a start event for a title, I can already read the text (in the raw element). That should not be the case in my understanding. When a title starts it does not have the text parsed; that is only when it is finished.

The problem is that I need to collect the text from the paragraphs into an accumulator and write it out when a new title starts. But the library has already sent a start and end event for the text in the title, and I do not see a way to recognise the point where a new title starts (and then get the collected text, i.e. the two lines in the minimal example above) so that I can correctly handle text that is not a title.

What is the idea and the way in orgize to handle such a basic use case?

Thanks and regards, Sebastian

PoiScript commented 3 months ago

> That should not be the case in my understanding. When a title starts it does not have the text parsed; that is only when it is finished.

Parsing is already complete once Org::parse returns, so an Event is really a "traverse event", not a "parse event".
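To make the distinction concrete, here is a minimal std-only sketch. It does not use orgize: `Node`, `Event`, `traverse`, and `walk_example` are hypothetical stand-ins invented for illustration. The tree is fully built before the walk begins, which is why a heading's title is already readable when its enter event fires.

```rust
// Toy model, not orgize's real types: a tree that is fully built up front.
#[derive(Debug)]
enum Node {
    Headline { title: String, children: Vec<Node> },
    Paragraph(String),
}

// Events emitted while walking the finished tree.
#[derive(Debug, PartialEq)]
enum Event {
    Enter(String),
    Leave(String),
    Text(String),
}

// Depth-first walk: every node already exists, so `title` is available on Enter.
fn traverse(node: &Node, out: &mut Vec<Event>) {
    match node {
        Node::Headline { title, children } => {
            out.push(Event::Enter(title.clone()));
            for child in children {
                traverse(child, out);
            }
            out.push(Event::Leave(title.clone()));
        }
        Node::Paragraph(text) => out.push(Event::Text(text.clone())),
    }
}

// Builds a small tree first, then walks it and returns the recorded events.
fn walk_example() -> Vec<Event> {
    let tree = Node::Headline {
        title: "Challenge".to_string(),
        children: vec![
            Node::Paragraph("Here is a short description.".to_string()),
            Node::Headline {
                title: "There are some Hints".to_string(),
                children: vec![Node::Paragraph("Which also have paragraphs".to_string())],
            },
        ],
    };
    let mut events = Vec::new();
    traverse(&tree, &mut events);
    events
}
```

In this toy model, an accumulator would be flushed on each `Enter` for a headline, since that event marks the point where a new title begins.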

> The problem is that I need to collect the text from the paragraphs into an accumulator and write it out when a new title starts.

I think you can just extract the whole section, instead of collecting it from separate paragraphs:

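    // Imports assumed for orgize 0.10; `AstNode` is assumed to provide `syntax()`.
    use orgize::{export::{Container, Event}, rowan::ast::AstNode, Org};

    // One entry per section body, in document order.
    let mut paragraphs: Vec<String> = vec![];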

    let org = Org::parse(
        r#"* Challenge

Here is a short description that I need to capture.

It might have several paragraphs.

** There are some Hints

Which also have paragraphs

** And they can be as many as needed

With one

or more paragraphs."#,
    );

    org.traverse(&mut orgize::export::from_fn(|event| {
        if let Event::Enter(Container::Section(section)) = event {
            paragraphs.push(section.syntax().to_string())
        }
    }));

    assert_eq!(
        paragraphs[0],
        "\nHere is a short description that I need to capture.\n\nIt might have several paragraphs.\n\n"
    );

    assert_eq!(paragraphs[1], "\nWhich also have paragraphs\n\n");

    assert_eq!(paragraphs[2], "\nWith one\n\nor more paragraphs.");
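Since each section string can hold several paragraphs, a follow-up step might split it on blank lines. The helper below is a std-only sketch. `split_paragraphs` is a hypothetical name and not part of orgize. It assumes paragraphs are separated by one or more blank lines, and it joins the lines of a multi-line paragraph with a single space.

```rust
// Hypothetical helper (not part of orgize): split a section's raw text
// into trimmed, non-empty paragraphs.
// Assumption: paragraphs are separated by one or more blank lines.
fn split_paragraphs(section: &str) -> Vec<String> {
    let mut paragraphs = Vec::new();
    let mut current: Vec<&str> = Vec::new();

    for line in section.lines() {
        if line.trim().is_empty() {
            // A blank line closes the paragraph collected so far, if any.
            if !current.is_empty() {
                paragraphs.push(current.join(" "));
                current.clear();
            }
        } else {
            current.push(line.trim());
        }
    }

    // Flush the last paragraph when the text does not end with a blank line.
    if !current.is_empty() {
        paragraphs.push(current.join(" "));
    }
    paragraphs
}
```

Applied to the first section string asserted above, this would yield two entries, one per paragraph of the description.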

https://poiscript.github.io/orgize/ has a tool to visualize the internal syntax tree.