durableOne / orgmunge

MIT License
75 stars 4 forks source link

Optionally specify todos.json contents via parameters #11

Closed crdoconnor closed 1 year ago

crdoconnor commented 1 year ago

Hi,

Would it be possible to allow something like:

from orgmunge import Org

parsed = Org(
    string_contents,
    from_string=False,
    todos={
        "todo_states": {
            "my_todo": "TODO",
            "my_next": "NEXT",
            "my_wait": "WAIT"
        },
        "done_states": {
            "my_cncl": "CNCL",
            "my_done": "DONE"
        }
    }
)

Rather than from a file?

One of the reasons being that I'd prefer a different default without involving a separate JSON file needing to be deployed on the user's file system. Another being that I might want to set the states from a separate org file. Either way being able to specify these from the constructor gives a lot more flexibility.

Let me know if you'd be open to this and if a PR would be welcome.

durableOne commented 1 year ago

@Nalisarc raised the same issue in https://github.com/durableOne/orgmunge/issues/3 I closed it back then because I could see no way to make lexer tokens dynamic. But now it hits me that there's no need to: any "TODO" keywords can probably be lexed as just any old text and the class methods that operate after the parse can handle the classification of the todo status (and even raise exceptions if the file has a todo keyword that's not among the known ones). It should be much easier to pass the todo states as part of the Org class constructor as you have shown or have the Org class look for them in an external file (including the one being parsed) otherwise.

Since this requires a non-trivial rewrite of some parts of the parser and the underlying classes, I want to create some regressions first to make sure we won't break anything. I've been toying with the idea of fudging my personal agenda file beyond recognition and using that as the main test case but feel free to contribute other test cases for regression testing.

And yes, of course a PR would be welcome if you want to work on this feature as well.

Nalisarc commented 1 year ago

Yeah, I spent a few weeks trying to get that to work but it was a tricky problem. Part of the problem was that PLY needs to know what the TODOs are to generate out one of the files that gets imported. (parsertab.py) Theoretically we might be able to generate it on the fly but then you run into trouble with the import system.

I haven't cracked it, but if you have an idea I'd be happy to write you some tests

durableOne commented 1 year ago

I think the approach I outlined above should work: define a todo keyword, not as its own token but as just "TEXT" and then do categorization of the todo keywords in the post-processing done by the "Headline" class. I'll start working on it but like I said, I think we need some good regressions first: full-fledged Org files that cover most productions in the parse tree.

durableOne commented 1 year ago

Added a regression test in https://github.com/durableOne/orgmunge/commit/c55ae5e322a7c1900fea773910d886f8c6b07085. Will start working on this feature now. Note that you need pytest-regressions to run the test.

The regression file (./tests/files/regr.org) is based on my personal agenda. All of my information is replaced with random text using... orgmunge. That's our first use case :)

I made a mini repo with the code I used to redact my agenda file (redactOrg) and added a "Use Cases" section to the README so we can collect different things people build on top of orgmunge.

crdoconnor commented 1 year ago

Hi @durableOne how is this feature going? I'm keen to try it out :)

durableOne commented 1 year ago

Sorry @crdoconnor this is on my list. I just had life things happening over the past couple of weeks. I hope to get some work done on it this week.

durableOne commented 1 year ago

Working on this in the dev branch now. I have started work trying to get the parser to treat todo keywords as regular text. Currently failing on single-word headlines because this word is interpreted as a "todo keyword" and the parser finds no title. Hoping to put in a fix tomorrow.

durableOne commented 1 year ago

@crdoconnor , @Nalisarc Please take a look at https://github.com/durableOne/orgmunge/commit/5d31b6f20008b336a6a8b1f605208da0eeb96553 (dev branch)

It turns out I didn't know PLY well enough: you can build a lexer on the fly by embedding it in a class, and define the tokens based on constructor arguments. So I've had to rewrite the lexer and parser and refactor some of the other code to follow this alternate model. Now it's much more flexible. When using Org(), you can now pass in a todos keyword argument. Example:

parsed_file = Org('/path/to/file', todos={'todo_states': {'todo': 'TODO'}, 'done_states': {'done': 'DONE'}})

If this keyword argument is left out, it falls back to the previous behavior of looking for the 'todos.json' file.

This is currently passing all tests (including an additional one I created). I just need to add the ability for it to parse the TODO keywords from the file/string passed in if they're defined there. @crdoconnor I hope this is functional enough for you to start playing with it.

crdoconnor commented 1 year ago

Amazing. I'll check it out. Thanks!

Nalisarc commented 1 year ago

Awesome work!

durableOne commented 1 year ago

@crdoconnor, @Nalisarc

I believe this issue can now be closed. The latest commit adds the ability to parse out todo keywords from the input file/string.

I'm not merging this into main until I get some feedback from you.

crdoconnor commented 1 year ago

Sorry I havent looked yet Ive had some life emergencies recently. I will try it out this weekend for sure though. If it works I will be swapping my package over from orgparse to this.

Thanks once again.

Nalisarc commented 1 year ago

I'm having a bit of trouble getting your new test to work, pytest is telling me: "UnicodeEncodeError: 'charmap' codec can't encode character '\uf0ea' in position 438: character maps to "

Probably user error on my part. Otherwise everything looks good to merge.

crdoconnor commented 1 year ago

@durableOne it's working well for me, thanks!

crdoconnor commented 1 year ago

I've switched my app over from orgparse to orgmunge. My tests are all passing now while using the orgmunge dev branch. I didn't spot any bugs while doing this, just some minor differences in behavior. I reckon it's good to go.