LuisMayo / objection_engine

Library that turns comment chains into ace attorney scenes, used in several bots
MIT License

Improved script parser #110

Closed Meorge closed 7 months ago

Meorge commented 1 year ago

This pull request adds a method for rendering text scripts with commands as videos. An example can be seen in example_parse_tags.py. The script format is rather verbose, but it gives the writer a lot of control over what happens in the video. If necessary, the writer could put together some macros to condense common patterns of commands.

Commands are wrapped in square brackets like so: [wait 0.5]. A command can be anything listed here, as well as [br] which inserts a line break. (Currently, line breaks are not added automatically; the user must explicitly insert them.)

~Because I didn't know how to have two Git branches active at the same time~ As a bonus, this pull request also makes the command handling more resilient against malformed commands. Instead of crashing the engine, it will print out that a command was malformed and then skip it.
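To illustrate the "report and skip" behavior described above, here is a minimal sketch of a bracket-command tokenizer. It is not the engine's actual code: the regular expression, the `EXPECTED_ARGS` table, and the `tokenize` function are all hypothetical names made up for this example.

```python
import re

# Matches one bracketed command such as "[wait 0.5]" (pattern is an assumption).
COMMAND_RE = re.compile(r"\[([^\[\]]*)\]")

# Hypothetical table: command name -> number of arguments it expects.
EXPECTED_ARGS = {"wait": 1, "startblip": 1, "stopblip": 0,
                 "br": 0, "showarrow": 0, "sprite": 2}


def tokenize(line):
    """Split a script line into ("text", str) and ("command", name, args) tokens."""
    tokens = []
    pos = 0
    for match in COMMAND_RE.finditer(line):
        # Anything between the previous command and this one is plain text.
        if match.start() > pos:
            tokens.append(("text", line[pos:match.start()]))
        pos = match.end()
        parts = match.group(1).split()
        well_formed = (
            bool(parts)
            and parts[0] in EXPECTED_ARGS
            and len(parts) - 1 == EXPECTED_ARGS[parts[0]]
        )
        if not well_formed:
            # Report the problem and carry on instead of raising.
            print(f"Skipping malformed command: {match.group(0)!r}")
            continue
        tokens.append(("command", parts[0], parts[1:]))
    if pos < len(line):
        tokens.append(("text", line[pos:]))
    return tokens


tokens = tokenize("Hi[wait 0.5][wait][br]there")
```

Here `[wait]` is missing its argument, so it is reported and dropped while the rest of the line is still processed.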

Things to consider before merging:

Splitting pages across multiple lines of script text

Currently, each line is made into a single DialoguePage object, and with each DialoguePage object, the text box in the rendered video is cleared. This means that, if it has a lot of commands for it, a single line of script could get to be very long. For example, this has to be a single line in order to avoid text being cleared:

    [startblip male]For one thing,[stopblip][sprite left assets/characters/phoenix/phoenix-document-idle.gif][wait 0.2][sprite left assets/characters/phoenix/phoenix-document-talk.gif][startblip male] there's no such[br]time as 9:68 AM.[stopblip][sprite left assets/characters/phoenix/phoenix-document-idle.gif][wait 0.4][sprite left assets/characters/phoenix/phoenix-document-talk.gif][startblip male] And September[br]43rd doesn't exist either![sprite left assets/characters/phoenix/phoenix-document-idle.gif][stopblip][showarrow][wait 2]

We could introduce some kind of "box terminator" indicator that would allow one DialogueBox's worth of content to be split up across multiple lines. So the above single line could be made more readable, like so:

    [startblip male]For one thing,
    [stopblip][sprite left assets/characters/phoenix/phoenix-document-idle.gif][wait 0.2][sprite left assets/characters/phoenix/phoenix-document-talk.gif]
    [startblip male] there's no such[br]time as 9:68 AM.
    [stopblip][sprite left assets/characters/phoenix/phoenix-document-idle.gif][wait 0.4][sprite left assets/characters/phoenix/phoenix-document-talk.gif]
    [startblip male] And September[br]43rd doesn't exist either!
    [sprite left assets/characters/phoenix/phoenix-document-idle.gif][stopblip][showarrow][wait 2]
    [cutbox]

It's still a lot, but at least it's a bit easier on the eyes! (As for the verbose sprite paths, I thought it would make sense to keep them long like that, so that people could easily direct the engine to sprites at other, arbitrary locations if they so wanted.)
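As a rough sketch of how the proposed box terminator could work, the function below joins consecutive script lines until it sees `[cutbox]`. The name `group_boxes` and its `terminator` parameter are hypothetical, and this is only one possible reading of the proposal, not something that exists in the engine.

```python
def group_boxes(script_lines, terminator="[cutbox]"):
    """Join consecutive script lines into one string per dialogue box.

    A line consisting only of the terminator closes the current box.
    Newlines are dropped, but leading spaces inside a line are kept,
    so " there's no such" still starts with a space after joining.
    """
    boxes = []
    current = []
    for raw in script_lines:
        line = raw.rstrip("\n")
        if line.strip() == terminator:
            boxes.append("".join(current))
            current = []
        else:
            current.append(line)
    # Treat a trailing, unterminated group as a final box.
    if current:
        boxes.append("".join(current))
    return boxes


boxes = group_boxes([
    "[startblip male]For one thing,\n",
    "[stopblip][wait 0.2]\n",
    "[startblip male] there's no such\n",
    "[cutbox]\n",
    "Next box",
])
```

Each string in `boxes` could then be handed to the existing single-line parser unchanged.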

Line breaking

I like allowing for explicit line-breaking using the [br] command, but it would also be nice to allow automatic line breaking. As we've found, though, automatic line breaking rules are deceptively complex, and I wasn't able to find a Python library that would do this for us. For now, I thought maybe we could add a parameter to the script parser that lets the user choose from the following line break options:

Maybe in the future, we could work on a library for line breaking that would work with more languages? But that is probably a project for the farther future...
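For reference, the standard library's `textwrap` module can wrap on whitespace by character count. That is only a rough stand-in for real line breaking here, since it ignores rendered glyph widths and does not handle languages that don't separate words with spaces. A minimal sketch, where `wrap_with_br` and `max_chars` are hypothetical names:

```python
import textwrap


def wrap_with_br(text, max_chars=30):
    """Insert [br] commands so that no line exceeds max_chars characters.

    This counts characters, not pixels, so it is only an approximation
    of what fits in the text box.
    """
    return "[br]".join(textwrap.wrap(text, width=max_chars))


wrapped = wrap_with_br("For one thing, there's no such time as 9:68 AM.", max_chars=20)
```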

LuisMayo commented 1 year ago

This seems really good not gonna lie

The script format seems consistent, remember to allow escaped brackets in case the text has brackets.

This seems good. Thanks a lot

Meorge commented 1 year ago

Thanks! I'll have to test the escaping functionality - I'd hope that would be implemented by Python's regex parser, but I'm not certain. As for separating the boxes, I think it might make sense to use an XML/HTML sort of format, where the contents of each individual box are within like <box></box> tags. I'm working on a different project at the moment (as well as IRL/school things) but when I return to work on this, that'll probably be the other big thing I investigate.
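On the escaping question: Python's `re` module does not do this automatically, so the pattern itself has to account for escapes. Below is a minimal sketch that treats `\[`, `\]`, and `\\` as literal characters. The backslash escape syntax and the `split_escaped` helper are assumptions for illustration, not something the PR defines.

```python
import re

# First alternative: a backslash-escaped bracket or backslash (kept as literal text).
# Second alternative: a normal bracketed command.
TOKEN_RE = re.compile(r"\\([\[\]\\])|\[([^\[\]]*)\]")


def split_escaped(line):
    """Return (visible_text, commands) for one script line."""
    text_parts = []
    commands = []
    pos = 0
    for match in TOKEN_RE.finditer(line):
        text_parts.append(line[pos:match.start()])
        pos = match.end()
        if match.group(1) is not None:
            # Escaped character: emit it without the backslash.
            text_parts.append(match.group(1))
        else:
            commands.append(match.group(2))
    text_parts.append(line[pos:])
    return "".join(text_parts), commands


text, commands = split_escaped(r"See exhibit \[A\][wait 0.5] now")
```

Because the escape alternative comes first and the scan runs left to right, an escaped `\[` is consumed before the command pattern can start matching at the bracket.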

Meorge commented 1 year ago

Okay, so a bit of an update to this! I looked more into Python's XML-parsing capabilities and found that it does appear to be possible to parse HTML-style documents with it. So, I wrote an XML version of the example script for this feature. A piece of it looks like this:

    <box>
        <blip action="start" gender="male"/>
        For one thing,
        <blip action="stop"/>
        <sprite loc="left" src="assets/characters/phoenix/phoenix-document-idle.gif"/>
        <wait duration="0.2"/>
        <sprite loc="left" src="assets/characters/phoenix/phoenix-document-talk.gif"/>
        <blip action="start" gender="male"/>
        there's no such<br/>
        time as 9:68 AM.
        <blip action="stop"/>
        <sprite loc="left" src="assets/characters/phoenix/phoenix-document-idle.gif"/>
        <wait duration="0.4"/>
        <sprite loc="left" src="assets/characters/phoenix/phoenix-document-talk.gif"/>
        <blip action="start" gender="male"/>
        And September<br/>
        43rd doesn't exist either!
        <sprite loc="left" src="assets/characters/phoenix/phoenix-document-idle.gif"/>
        <blip action="stop"/>
        <arrow action="show"/>
        <wait duration="2"/>
    </box>

In my opinion, this is nice because it makes each box much easier to read (no super-long lines), and it uses an already-recognized format instead of something we're putting together with regular expressions. However, it does make scripts a lot longer - the original script was 111 lines, and this XML version is 618 lines. (Again, though, every line is easy to read on the computer screen.)

One thing I'll really want to figure out for this is how to add color tags - perhaps instead of a DOM parser, I'll use a SAX parser. But if this XML format looks good to you, I'll continue refining it and hopefully have a pull request ready to merge before too much longer!
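For context, here is a minimal sketch of reading one `<box>` with the standard library's `xml.etree.ElementTree`. The `flatten_box` function and its output shape are hypothetical. It only handles a flat box: text nested inside a child element (as a color tag would need) is not visited, which is one reason a recursive walk or a SAX-style parser might be needed.

```python
import xml.etree.ElementTree as ET


def flatten_box(box):
    """Turn one <box> element into an ordered list of text and command items."""
    items = []

    def add_text(text):
        # ElementTree stores text before the first child in .text and text
        # after each child in that child's .tail; skip whitespace-only chunks.
        if text and text.strip():
            items.append(("text", text.strip()))

    add_text(box.text)
    for child in box:
        items.append(("command", child.tag, dict(child.attrib)))
        add_text(child.tail)
    return items


SAMPLE = """<box>
    <blip action="start" gender="male"/>
    For one thing,
    <blip action="stop"/>
    <wait duration="0.2"/>
</box>"""

items = flatten_box(ET.fromstring(SAMPLE))
```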

Meorge commented 1 year ago

The XML parser is working pretty well now! There's one issue I've run into: it currently strips whitespace from the left and right of text strings to avoid a lot of unwanted whitespace. To allow users to include whitespace between tags, I wanted to use HTML's &nbsp; entity. However, this isn't in regular XML, so I have to make a DTD for it and/or specify what character &nbsp; should be replaced with. The issue here is that I don't know of a replacement character that wouldn't also be removed by the parser. I don't think it'd be very good to just choose some random Unicode character and replace it with that...
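A small sketch of why the obvious replacement gets removed, assuming the trimming is done with Python's `str.strip()` (the thread doesn't say how it is actually implemented). The no-break space U+00A0, which is what `&nbsp;` stands for in HTML, counts as whitespace for `str.strip()` with no arguments, but it survives if the set of characters to trim is given explicitly.

```python
NBSP = "\u00a0"  # the character that HTML's &nbsp; entity stands for

padded = NBSP + "trial of" + NBSP

# str.strip() with no arguments removes all Unicode whitespace, NBSP included.
default_stripped = padded.strip()

# Passing an explicit character set limits trimming to those characters.
ASCII_WHITESPACE = " \t\r\n"
ascii_stripped = padded.strip(ASCII_WHITESPACE)
```

With the explicit set, the indentation spaces and newlines around tags would still be trimmed, while a deliberately placed no-break space would be kept.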

LuisMayo commented 8 months ago

Couldn't we use a special tag? For instance, <br> in HTML represents a line break, so <crlf> could represent leading/trailing whitespace.

Meorge commented 8 months ago

I've added support for the <sp/> tag to manually add whitespace after the start and end trimming has been done. The XML file is quite big, but just as an example:

    <page>
        <startblip gender="male"/>
        Court is now in session for the
        <br/>
        trial of<sp/><font color="red">Larry Butz</font>.
        <sprite position="judge" src="assets/characters/judge/judge-normal-idle.gif"/>
        <stopblip/>
        <showarrow/>
        <wait duration="2"/>
    </page>

The <sp/> allows for there to be a space between "trial of" and "Larry Butz" when the latter chunk of text is colored. Without it, it would read "trial ofLarry Butz".
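The trimming-then-`<sp/>` behavior can be sketched roughly as below, using `xml.etree.ElementTree`. The `render_text` function is hypothetical and ignores everything except visible text (the color attribute, sprites, and blips are not acted on), so it only shows how the spacing works out.

```python
import xml.etree.ElementTree as ET


def render_text(element):
    """Collect visible text: trim every chunk, then honour <sp/> and <br/>."""
    out = []

    def add(text):
        if text:
            out.append(text.strip())

    add(element.text)
    for child in element:
        if child.tag == "sp":
            out.append(" ")   # explicit space, added after trimming
        elif child.tag == "br":
            out.append("\n")  # explicit line break
        else:
            out.append(render_text(child))  # e.g. text inside <font>
        add(child.tail)
    return "".join(out)


PAGE = """<page>
    Court is now in session for the
    <br/>
    trial of<sp/><font color="red">Larry Butz</font>.
</page>"""

with_sp = render_text(ET.fromstring(PAGE))
without_sp = render_text(ET.fromstring(PAGE.replace("<sp/>", "")))
```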

This is what a video rendered with the whole XML file looks like:

https://github.com/LuisMayo/objection_engine/assets/9957987/f2d1a72c-9351-45f2-b0cd-8ae67b01d1cd

Since the gavel animation support has been merged, I should probably add that, as well as the testimony indicator... It'd be really nice if we had a systematic way to define commands and then have them take effect across the whole engine, but that's probably a task for a future PR.
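One possible shape for such a systematic command definition is a small registry filled in by a decorator, sketched below. None of these names (`COMMANDS`, `command`, `dispatch`, the `state` dictionary) exist in the engine; they are placeholders for the idea.

```python
# Hypothetical registry: command name -> handler function.
COMMANDS = {}


def command(name):
    """Decorator that registers a handler under the given command name."""
    def register(func):
        COMMANDS[name] = func
        return func
    return register


@command("wait")
def wait(state, duration):
    state["elapsed"] += float(duration)


@command("gavel")
def gavel(state):
    state["events"].append("gavel")


def dispatch(state, name, *args):
    """Run a registered command; report and skip unknown ones."""
    handler = COMMANDS.get(name)
    if handler is None:
        print(f"Unknown command: {name}")
        return False
    handler(state, *args)
    return True


state = {"elapsed": 0.0, "events": []}
dispatch(state, "wait", "0.5")
dispatch(state, "gavel")
```

With something like this, both the bracket format and the XML format could look commands up in the same table, so a new command would only need to be defined once.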

Meorge commented 8 months ago

Testimony and gavel slam commands are now usable from the XML interface!

https://github.com/LuisMayo/objection_engine/assets/9957987/1d5172ae-6285-4bba-af56-ac46dd72ce49

LuisMayo commented 8 months ago

Nice to know!

I'll review when I get the time!

LuisMayo commented 8 months ago

Hi!

A couple of questions.

Is the first proposed scripting method (the square brackets one) still supported? Is there any planned documentation? This MR is huge both in content and usefulness so some documentation may be warranted.

Thanks a lot, this is a lot of work

Meorge commented 8 months ago

> Is the first proposed scripting method (the square brackets one) still supported?

I was unsure, but it looks like it is. The file example_parse_tags.py contains an example that uses it and still works. The benefit I can see of the square bracket approach is that leading and trailing whitespace is preserved; the downside is that each text box has to be on a single line, whereas in the XML format you can space things out across multiple lines to preserve readability. With the XML format now existing, it might make sense to remove this one so that there aren't similar-but-subtly-different options.

> Is there any planned documentation? This MR is huge both in content and usefulness so some documentation may be warranted.

Documentation is a great idea, and I'm sorry it didn't occur to me before. I guess the break I took from working on the project must've caused it to slip my mind. My school semester is just about to start, so I don't know if I'll have time immediately to work on it, but I'll definitely add that to my list of personal project things to work on when I find the time for it.

Meorge commented 8 months ago

I tried writing some documentation for it. Will see if I can work more on it tomorrow, but overall it's quite similar to the other format stuff that already exists, so beyond the example script I'm not sure how much documentation there is to write for it. 😅 The most important things to note are probably the <br/> and <sp/> tags.

LuisMayo commented 7 months ago

OK, this looks good to merge now, AFAIK.