PoiScript / orgize

A Rust library for parsing org-mode files.
https://poiscript.github.io/orgize/
MIT License

Idea for collaboration #51

Open · gitonthescene opened this issue 3 years ago

gitonthescene commented 3 years ago

Hello, I found your project through the Worg tools list. First, sorry for the semi-spam nature of this issue. I had a notion for a project that the Org community might find useful, and I'm looking for feedback. Feel free to close this issue if it doesn't sound useful to you.

My idea is to start a list of org-mode snippets that can serve as a test bed for people developing tools. Keeping the examples in a separate collection would let others in the community benefit from the examples each project develops through its conversations with users.

Users could use these samples to try to construct minimal examples of issues they're having and/or contribute examples there which others could benefit from. Exactly how it will take shape is still up in the air.

These samples could also serve as a place to discuss ideas about how to develop the grammar itself. According to Worg, the syntax specification is still a draft.

There's not much there at the moment, mostly because I don't want to commit too early to a particular shape before it's clear what will be useful. I'll add more examples as I go.

If you like the concept and/or want to contribute and/or just want to offer feedback, I'd very much appreciate it.

Again, sorry for the spam.

calmofthestorm commented 3 years ago

IMO not spam at all; thank you for sharing an interesting and completely topical project. The issues on this repo (both open and closed) cover a number of edge cases that @PoiScript and I have discussed.

I like the idea of a corpus of ambiguous, tricky, and edge-case snippets that can be used for testing, analogous to the Big List of Naughty Strings. It would be a helpful artifact.

Ideally there would be one totally unambiguous Org grammar that all parsers would respect. Unfortunately, at this point I think that ship has sailed. That does not mean it cannot be done, only that it would be very hard. I wish you well but am not optimistic :-)

The next best thing is clear and precise documentation of edge case behavior. @PoiScript and I have disagreed on how a few edge cases should be interpreted, and that's fine. You could imagine that a parser could come with a test suite which clearly specifies its expected behavior for each of the edge cases in the corpus, thus preventing regressions, unexpected changes, etc, even if different parsers disagree. You could also imagine a linter tool that would warn about ambiguous cases in org files.
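A minimal sketch of such a per-parser test suite, written in Rust since orgize is a Rust library. Everything in it is hypothetical: the `Case` and `Expectation` types, the case names, and `count_headlines` (a toy stand-in for a real parser that treats a line as a headline only when its leading stars are followed by a space) are not part of orgize or of any existing corpus. The point it illustrates is that the corpus is shared, while each parser declares its own expected result for each case.

```rust
/// One snippet in a shared corpus of tricky Org input (hypothetical format).
struct Case {
    name: &'static str,
    input: &'static str,
}

/// What one particular parser promises for a corpus case. Different parsers
/// may ship different expectations for the same case.
struct Expectation {
    case: &'static str,
    headline_count: usize,
}

const CORPUS: &[Case] = &[
    Case { name: "bold-at-line-start", input: "*bold* text\n" },
    Case { name: "nested-headlines", input: "* a\n** b\n" },
    Case { name: "indented-star", input: "* a\n * b\n" },
];

/// Toy parser (assumption): a line is a headline if it starts with one or
/// more '*' immediately followed by a space.
fn count_headlines(input: &str) -> usize {
    input
        .lines()
        .filter(|line| {
            let stars = line.bytes().take_while(|&b| b == b'*').count();
            stars > 0 && line[stars..].starts_with(' ')
        })
        .count()
}

/// Run every declared expectation against the corpus; return the mismatches.
fn check(expectations: &[Expectation]) -> Vec<String> {
    let mut failures = Vec::new();
    for exp in expectations {
        let case = CORPUS
            .iter()
            .find(|c| c.name == exp.case)
            .expect("expectation refers to an unknown corpus case");
        let got = count_headlines(case.input);
        if got != exp.headline_count {
            failures.push(format!(
                "{}: expected {}, got {}",
                case.name, exp.headline_count, got
            ));
        }
    }
    failures
}

fn main() {
    // This parser's declared behaviour for each corpus case.
    let expectations = [
        Expectation { case: "bold-at-line-start", headline_count: 0 },
        Expectation { case: "nested-headlines", headline_count: 2 },
        Expectation { case: "indented-star", headline_count: 1 },
    ];
    let failures = check(&expectations);
    for f in &failures {
        println!("FAIL {f}");
    }
    println!(
        "{}/{} cases match",
        expectations.len() - failures.len(),
        expectations.len()
    );
}
```

Two parsers that disagree on a case would simply ship different `Expectation` tables against the same `CORPUS`, so a disagreement is documented rather than silent.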

I think it's frustrating to try to create a completely unambiguous parser for a language without an unambiguous formal specification. The approach I am currently taking is a bit different, and while it suffices for my needs, it is less satisfying. I would like to specify precisely a minimal grammar that unambiguously describes a hierarchy of headlines and the "body" of each (possibly plus a few other things, such as the properties drawer and planning lines). For the most part I am using "whatever Org mode does" as the source of truth there, but IIRC there are edge cases where different Org mode commands do different things.

This would then allow one to develop a parser which can unambiguously parse an Org file into a tree with the invariant that emit(parse(org)) == org for all strings org.

This parser would then be very safe: if you parse a large file, you could be assured that only the headlines you modify would change[1], which limits the scope of damage and unintended side effects and prevents mass changes that propagate far from the edit site. I think of Org files as trees, and I dislike anything that unintentionally changes the tree structure much more strongly than an unintended change confined to a single headline/body.
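A minimal sketch of the invariant emit(parse(org)) == org and of edit locality, under a deliberately naive assumption: a headline is any line starting with one or more `*` followed by a space, and every other line belongs to the preceding headline's body (or to a preamble before the first headline). The `Document`/`Section` types and the `parse`/`emit` functions are hypothetical and are not orgize's API. Because every byte is stored verbatim, the round trip holds trivially here; the hard part described above is choosing headline rules that actually match what Org mode does.

```rust
/// A headline plus everything up to the next headline, stored verbatim.
#[derive(Debug, Clone, PartialEq)]
struct Section {
    level: usize,     // number of leading '*'
    headline: String, // the headline line, including its line ending
    body: String,     // raw text up to the next headline
}

#[derive(Debug, Clone, PartialEq)]
struct Document {
    preamble: String,       // text before the first headline
    sections: Vec<Section>, // document order; nesting follows from `level`
}

/// Toy rule (assumption): one or more '*' followed by a space.
fn headline_level(line: &str) -> Option<usize> {
    let stars = line.bytes().take_while(|&b| b == b'*').count();
    (stars > 0 && line[stars..].starts_with(' ')).then_some(stars)
}

fn parse(org: &str) -> Document {
    let mut doc = Document { preamble: String::new(), sections: Vec::new() };
    // `split_inclusive` keeps each '\n', so no bytes are dropped, including
    // leading blank lines and a missing or present final newline.
    for line in org.split_inclusive('\n') {
        if let Some(level) = headline_level(line) {
            doc.sections.push(Section { level, headline: line.to_string(), body: String::new() });
        } else if let Some(last) = doc.sections.last_mut() {
            last.body.push_str(line);
        } else {
            doc.preamble.push_str(line);
        }
    }
    doc
}

/// Concatenate the stored pieces back into the original text.
fn emit(doc: &Document) -> String {
    let mut out = doc.preamble.clone();
    for s in &doc.sections {
        out.push_str(&s.headline);
        out.push_str(&s.body);
    }
    out
}

fn main() {
    let org = "#+TITLE: demo\n* A\ntext\n** B\n*bold* body\n* C";
    let mut doc = parse(org);
    assert_eq!(emit(&doc), org); // emit(parse(org)) == org
    let levels: Vec<usize> = doc.sections.iter().map(|s| s.level).collect();
    assert_eq!(levels, vec![1, 2, 1]);

    // Edit only the body of "** B"; every other section is emitted unchanged.
    doc.sections[1].body = "new body\n".to_string();
    let edited = emit(&doc);
    assert_eq!(edited, "#+TITLE: demo\n* A\ntext\n** B\nnew body\n* C");
    println!("{edited}");
}
```

Nesting can be recovered from `level`, so a proper tree could be built on top of this flat list without giving up the byte-for-byte round trip.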

I lack time at the moment, but I would like to implement the unambiguous headline parser some day because I'm not aware of anything like it. I think that there are many parsers (including orgize) that would be sufficient for my needs for actually parsing the content of a headline, since I'm much more flexible on what behavior I would accept there.

The main thing I would suggest is going over the tests in this repository and in other parsers and collecting their unit-test strings. I also suggest reading the issues (open and closed) on this repository, as I have raised a number of edge cases in them.

[1] I believe there are two ambiguous cases that can't be avoided in all cases: the document's terminal newline and its initial newline.

gitonthescene commented 3 years ago

What if we had an lsp server that runs alongside of org to “lint” the file to make sure it conforms to a stricter grammar? This wouldn’t constrain org-mode itself but would offer a target for programmers to other platforms to shoot for.

I’ve tried to start a discussion on the mailing list.
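One way to picture such a linter is as a function from file text to LSP-style diagnostics (zero-based line numbers, as in an LSP `Position`). The sketch below is hypothetical and is not the API of orgwise or any other tool; its checks are illustrative choices: the two newline cases from footnote [1] above, plus a line consisting only of stars, which we assume tools might disagree on.

```rust
/// An LSP-style diagnostic (hypothetical; a real server would use LSP types).
#[derive(Debug, PartialEq)]
struct Diagnostic {
    line: usize, // zero-based, like LSP `Position.line`
    message: &'static str,
}

/// Flag constructs that a stricter grammar might reject (illustrative rules).
fn lint(org: &str) -> Vec<Diagnostic> {
    let mut diags = Vec::new();
    if org.starts_with('\n') {
        diags.push(Diagnostic { line: 0, message: "file starts with a newline" });
    }
    for (i, line) in org.lines().enumerate() {
        let stars = line.bytes().take_while(|&b| b == b'*').count();
        // Only stars plus optional whitespace: empty headline or plain text?
        if stars > 0 && line[stars..].trim().is_empty() {
            diags.push(Diagnostic { line: i, message: "stars with no headline title" });
        }
    }
    if !org.is_empty() && !org.ends_with('\n') {
        // A non-empty text without a final '\n' has at least one line.
        let last = org.lines().count() - 1;
        diags.push(Diagnostic { line: last, message: "missing final newline" });
    }
    diags
}

fn main() {
    for d in lint("* A\n** \ntext") {
        println!("line {}: {}", d.line, d.message);
    }
}
```

A language server would run a function like this on each change and publish the results, leaving org-mode itself untouched while still giving other tools a stricter target.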

appetrosyan commented 5 months ago

Hi, I'm the current maintainer of org-rs. The projects are of different types, but I wonder if we could collaborate on creating a reference set of test cases that could be used for comparing compliant implementations.

gitonthescene commented 5 months ago

@appetrosyan - I'm not sure who that was directed to, but I got a start on something but haven't looked at it for quite a while.

PoiScript commented 5 months ago

@gitonthescene I apologize that I missed this issue earlier.

Maintaining a test suite is absolutely a good idea, considering that org-mode doesn't have a strict syntax specification. We already have tons of test cases in orgize, and I'm happy to add more from https://github.com/gitonthescene/org-mode-samples.

> What if we had an lsp server that runs alongside of org to “lint” the file to make sure it conforms to a stricter grammar?

I've recently been working on a language server for org-mode: https://github.com/PoiScript/orgwise. It currently supports some common features such as formatting, outline, references, and snippets.

I haven't started adding the diagnostic (linting) feature yet. If you have some ideas on how it could be implemented, feel free to leave a comment there.

appetrosyan commented 4 months ago

> I'm not sure who that was directed to, but I got a start on something but haven't looked at it for quite a while.

This sounds like exactly what we need.