DRincs-Productions / pixi-vn

Create visual novels with a modern 2D rendering engine and your favorite JavaScript framework.
https://pixi-vn.web.app/
GNU General Public License v3.0

Renpy Parser #167

Open BlackRam-oss opened 3 months ago

BlackRam-oss commented 3 months ago

I found a library online that can parse a Ren'Py file and extract all of its information. I think it could be used to make Ren'Py files compatible with pixi-vn.

https://github.com/damyo-scientists/renpy-js developed by @horace-velmont

I noticed that the repo is based on the official Ren'Py parser, written in Python:

Renpy: parser.py https://www.renpy.org/static/wiki/parser.py

BlackRam-oss commented 3 months ago

Hi @blurymind, I'm not very experienced with parsers. Do you think it's possible to add support for Ren'Py files in pixi-vn, the way we are doing with ink?

blurymind commented 3 months ago

That would be a killer feature for sure... but I don't have a lot of experience with parsers. I did some bug fixing on bondagejs (a yarn file parser) in the past. That uses jison; I'm not sure if it's the best choice. Inkjs doesn't seem to have a parser dependency, but its mechanism is to first compile the ink file into a JSON file and then parse that - a different strategy than parsing the source directly.

It's a cool idea, because in theory it might help with directly converting Ren'Py games to pixi-vn games!

BlackRam-oss commented 3 months ago

Update:

I tested https://github.com/damyo-scientists/renpy-js: in basic cases it manages to extract something, but I'm afraid its checking isn't very thorough.

But I thought of another approach that could work even without a dedicated parser: you could use https://github.com/renpy/vscode-language-renpy, developed by @duckdoom4. The extension extracts information from the current file for syntax highlighting and error reporting; the same mechanism could be used to extract a model containing the file's information.

It doesn't seem like a simple process, and it would need the help of one of the developers, but it would have the advantage of being continuously kept up to date with Ren'Py.

duckdoom4 commented 3 months ago

Hi, I've started working on a full parser behind the scenes on different branches on that repo. One is on the 'parser' branch, and the other on the 'lsp-server' branch.

(With the lsp-server setup, it should be possible to interact with that from any other application as well.)

I'd love to help set up what you guys need, and get some help with that parser from you guys as well.

To give a quick overview of what is currently implemented: The tokenizer (lexer) is already complete and fully functional, so it's already possible to extract a whole lot of information from that.

The parser is in its early stages though. The parser is needed to get an AST (abstract syntax tree), which allows us to verify that a token sequence is valid Ren'Py grammar.
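
To make the two stages concrete, here is a toy illustration (not the actual extension code) using a simple say line:

```ts
// Toy illustration of the two stages described above; these are not the
// extension's real data structures. Input line of Ren'Py script:
//   e "Hello, world."

// Stage 1 - tokenizer: the characters become a flat, ordered token stream.
const tokens = [
  { type: "name", value: "e" },               // the speaking character
  { type: "string", value: "Hello, world." }, // the dialogue text
];

// Stage 2 - parser: check the stream against a grammar rule such as
//   say_statement ::= name? string
// and, if it matches, build the corresponding AST node.
const sayNode = { kind: "SayStatement", who: "e", what: "Hello, world." };
```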

BlackRam-oss commented 3 months ago

Yes, I noticed that branch. In fact, I planned to fork it and then create a library.

I can help, but it would be better to create an npm parser library external to the vscode extension project. That way I can install it in my project, and it's also easier to understand how it works.

duckdoom4 commented 3 months ago

Yeah, so that's the idea behind the lsp-server. You basically build a completely separate application (I chose to build it in typescript as an npm package). That lsp-server can then be used outside of vscode as well. The 'client' is the vscode extension. (Read this if you want to learn more).

In your case you'd implement the client side in your application using the Language Server Protocol setup by Microsoft. The server will then do all the parsing and you'd only have to request the data you want.
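
To sketch what that client side could look like in a Node.js host (using Microsoft's vscode-jsonrpc package; the server path and capabilities below are placeholders, not values from the repo):

```ts
// Minimal LSP client sketch over stdio using vscode-jsonrpc.
import { spawn } from "child_process";
import {
  createMessageConnection,
  StreamMessageReader,
  StreamMessageWriter,
} from "vscode-jsonrpc/node";

async function main() {
  // Start the language server as a child process (placeholder path).
  const server = spawn("node", ["./out/lsp-server.js", "--stdio"]);

  const connection = createMessageConnection(
    new StreamMessageReader(server.stdout!),
    new StreamMessageWriter(server.stdin!)
  );
  connection.listen();

  // Standard LSP handshake: 'initialize' request, 'initialized' notification.
  const result = await connection.sendRequest("initialize", {
    processId: process.pid,
    rootUri: null,
    capabilities: {},
  });
  connection.sendNotification("initialized", {});
  console.log("server capabilities:", result);
}

main();
```

After the handshake, the client only ever requests the extracted data it wants; all the parsing stays on the server side.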

BlackRam-oss commented 3 months ago

LSP is the best goal for both of us. I would also be interested in implementing a function that extracts a JSON file with the information of a Ren'Py script (with LSP it is certainly feasible). I have never created one, but I can certainly help.

However, I still think it would be better to start from an empty project with a (temporary) submodule that points to https://github.com/renpy/vscode-language-renpy/tree/parser (to obtain the elements that are currently in that repo). That way you can create a library to publish on npm, which can then be installed in the projects that need it. It would also make it much easier for me to understand, because the project would contain only the files necessary for the parser.

Either of us can create a new repo to start working on it (giving access to the others), then eventually move it to the renpy organization.

duckdoom4 commented 3 months ago

I understand you completely, and I also avoided implementing the lsp elements for that reason. It adds a lot of 'unnecessary' overhead. (In my case it has become a necessity, because that's the only way to do proper multi-threaded processing, allowing the server to process all files in the project without the client freezing.)

I will say that it's just me doing this for fun in my free time (which is already limited), and a completely separate library doesn't align with my (personal) current goals for the project.

That said, I'm happy to tell you what files to look at and how to use them. I already coded it in a modular way, so the tokenizer and parser sources don't use many vscode-specific types. The only thing I can think of right now is the text document API. I'm even willing to make a couple of extra abstractions if it turns out there are some dependencies that can't be transferred, so it can be used outside of vscode.

BlackRam-oss commented 3 months ago

OK, I can help with the vscode-language-renpy implementation, but I will need a more detailed explanation to understand how to test the LSP. Anyway, I will reuse the same code in my library, where I need this implementation.

duckdoom4 commented 3 months ago

The LSP is something I just started a couple of days ago. I'll need some time to link and move the previous implementation correctly. As far as I can tell, it won't take too much effort. Should be just a matter of calling the right API from the right events.

FYI, I'm in the middle of moving to a new place, so I will have very limited time until the 9th of August.

I think you'll be more interested in the parser branch right now. There is a single function called 'testParser()'. That will run the current implementation of the parser. What it does right now is run the tokenizer on the opened file, then run some simple parser rules and output some parser debug information to the console.

I was working on a formal grammar definition that you can find in the 'grammar' folder. That is very helpful for defining the grammar rules correctly in code. (The grammar file is about 60% complete, and it might be missing some of the new features.)

Quick overview of the functionality:

The tokenizer is fully mature and is used in production right now. It is able to output some 'unusual' tokens (it's sort of a parser itself). It is able to output meta tokens that basically give you a rudimentary AST; it's just not yet verified, which is what the parser is supposed to do. The tokens are listed in the order they appear in the file, and they form a tree-like structure. The main tokens are the individual components (e.g. keywords, variables, single characters); those won't overlap. The meta tokens will overlap and are what make up the tree (e.g. Comment, DefineStatement, SayStatement, etc.).
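
As an illustration of that structure (made-up shapes, not the extension's actual token types), take the line define e = Character("Eileen"):

```ts
// Main tokens: flat, in file order, never overlapping.
const mainTokens = [
  { type: "Keyword",    text: "define" },
  { type: "Variable",   text: "e" },
  { type: "Operator",   text: "=" },
  { type: "Function",   text: "Character" },
  { type: "OpenParen",  text: "(" },
  { type: "String",     text: '"Eileen"' },
  { type: "CloseParen", text: ")" },
];

// Meta tokens: may overlap, each spanning a range of main tokens. Together
// they form the rudimentary tree: here a single DefineStatement covers all
// seven main tokens above.
const metaTokens = [{ type: "DefineStatement", firstToken: 0, lastToken: 6 }];
```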

The tokenizer works equivalently to the one built into vscode, which they use for syntax highlighting (that one uses https://github.com/kkos/oniguruma). (My version is a completely custom implementation in TypeScript though.)

Sorry if that was too much info dumping, but let me know if you need more info :)

BlackRam-oss commented 3 months ago

OK, should I fork this, or can you add me to the project?

duckdoom4 commented 3 months ago

Unfortunately I can't add you as a dev myself. (I'm not the owner of the repo with admin rights). So for now you'll need to fork and make a PR.

BlackRam-oss commented 3 months ago

OK, I forked it and invited you so we can collaborate. Could you create a Document mock to replace activeEditor.document, pointing to a Ren'Py file inside the repo? That way I could run the parser without starting the vscode-language-renpy extension (also because I don't know how to start it).
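
Something like this minimal sketch might already be enough (assuming the tokenizer only needs getText and lineAt; any other vscode.TextDocument members it touches would have to be mocked too):

```ts
// Minimal stand-in for vscode's TextDocument, backed by a .rpy file on
// disk. Only a few members are mocked; extend as the tokenizer requires.
import * as fs from "fs";

export function createMockDocument(path: string) {
  const text = fs.readFileSync(path, "utf8");
  const lines = text.split(/\r?\n/);
  return {
    fileName: path,
    languageId: "renpy",
    version: 1,
    lineCount: lines.length,
    getText: () => text,
    lineAt: (line: number) => ({
      lineNumber: line,
      text: lines[line],
      isEmptyOrWhitespace: lines[line].trim().length === 0,
    }),
  };
}

// Usage: run the tokenizer/parser on a repo file without an open editor.
// The path below is hypothetical.
// const doc = createMockDocument("syntax-examples/test.rpy");
```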

duckdoom4 commented 3 months ago

Doing this right now is a bit difficult due to my limited time until the 9th. I can try to set that up ASAP, but can't promise it can be done before that.

Launching the extension is relatively simple though. (There should be instructions on the repo.)

You just open the repo in vscode, run npm install, and finally run the extension (through vscode's launch mechanism).

BlackRam-oss commented 3 months ago

OK, yes, I understand how to run it. Now I will spend some time understanding how the parser actually works.

duckdoom4 commented 3 months ago

Awesome, I've added a new section on the main page with the build instructions in a PR. (Also includes tips for usage btw).

I realized that it can be hard to find in the contributing link.

Let me know if you need any more help or if you'd like me to explain something. Chatting from my phone is something I can do while I'm not at my PC.

BlackRam-oss commented 3 months ago

Hi, I took a look at the parser. From what I understand, the relevant part is in the parser-test file, from line 32 to 43. What I don't quite understand is what happens inside statementParser.test.

Anyway, what interests me most is when the parser "extracts information" from the lines that make up a label. Has this step already been implemented? If not, could you tell me roughly where and how it should be implemented, and which functions would be useful?

duckdoom4 commented 3 months ago

Yes, you are correct. That's the core loop.

The statementParser.test() function is running the code here.

The way the parser is structured right now is that I have defined a bunch of 'rules', which build 'nodes'.

So to give an example, the line linked above is a rule that loops through a list of rules. One of those rules is the DefineStatementRule. This rule has a test and a parse function. If test returns true, this means the rule should be parsed.

The loop (a bit back up the callstack) then triggers the parse method. This parse method will parse tokens.

Again to give an example, the DefineStatementRule checks the current token. In this case it requires that the first token is the keyword 'define'. The renpy syntax allows an optional integer, then an assignment operation. And lastly it expects an end of line.

In the comments above the class you can see a bit of EBNF grammar definition. Please read the comment in the grammars/renpy.grammar.ebnf file to understand the syntax.
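
Here is a self-contained toy version of that test/parse pattern for the define statement (illustrative names and simplified grammar, not the real DefineStatementRule):

```ts
// Simplified token model for the sketch.
type Token = {
  type: "keyword" | "integer" | "name" | "operator" | "eol";
  value: string;
};

class TokenStream {
  private pos = 0;
  constructor(private tokens: Token[]) {}
  peek(): Token | undefined { return this.tokens[this.pos]; }
  next(): Token | undefined { return this.tokens[this.pos++]; }
}

class DefineStatementRule {
  // test(): if it returns true, the rule applies and parse() should run.
  test(s: TokenStream): boolean {
    const t = s.peek();
    return t?.type === "keyword" && t.value === "define";
  }

  // parse(): consume  "define" integer? name "=" expression EOL
  // and build a node from it.
  parse(s: TokenStream) {
    s.next(); // consume the 'define' keyword
    let offset: number | undefined;
    if (s.peek()?.type === "integer") {
      offset = Number(s.next()!.value); // the optional integer
    }
    const name = s.next();
    if (name?.type !== "name") throw new Error("expected a name after 'define'");
    const eq = s.next();
    if (eq?.type !== "operator" || eq.value !== "=") throw new Error("expected '='");
    const expr = s.next(); // simplified: a one-token expression
    if (!expr) throw new Error("expected an expression");
    if (s.next()?.type !== "eol") throw new Error("expected end of line");
    return { kind: "DefineStatement", offset, name: name.value, expr: expr.value };
  }
}
```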

duckdoom4 commented 3 months ago

It appears I have not yet made the rule for labels; however, I did already write the EBNF definition for labels, so it should be possible to implement the label rule with that definition.

I should note, though, that I know some edge cases have not been handled by the parser yet. For example, I don't think I handle comments yet. The error handling also needs to be improved, but if we start with some valid syntax while building the parser, we can deal with that later.

Also check out this file (and the others in that folder) for a complete list of all the tokens that the tokenizer can output. There is a command to display the tokens in a debug view, which allows you to hover over them. I have it bound to ctrl+alt+shift+T, but you might need to bind it manually.

duckdoom4 commented 3 months ago

Also, I hope to get some time before the 9th so we can work on it together, but I doubt I will. On the other hand, this gives you a chance to get familiar with the codebase, so I guess it's not that big of a deal for those couple of days, right?

BlackRam-oss commented 3 months ago

Yes, don't worry. It will take a while before I understand the parser well enough to implement something. Every now and then I will ask you some questions.

duckdoom4 commented 3 months ago

I had a couple of minutes during a break and did a quick test run. I noticed a couple of bugs, so I fixed them real quick and pushed the fixes to the origin branch.

I also added a simple parser_test.rpy file, which will run the define parser and then process the AST to get the defined 'e' symbol. (You can set some breakpoints to see what it does.) I did notice I didn't hook it up to the document-changed event, so currently it will only run on start.

duckdoom4 commented 3 months ago

Made a bunch of improvements again. It is now able to parse labels and has improved error handling.

BlackRam-oss commented 3 months ago

@duckdoom4 Today I took a look at the changes you made. But one thing isn't clear to me: after the parser reads a file, does it generate an object from which I can extract information?

I'm not sure what else needs to be implemented; however, if you tell me what is missing and roughly how you planned to do it, I can help you.

duckdoom4 commented 3 months ago

Yeah, so I made some more changes on the main branch. I've merged them into your branch now.

How it works internally

So if we look at the file parser-test.ts, the idea is that the tokenizer runs on the document. This converts the individual characters of a file into a sequence of tokens. (See the DocumentParser constructor.)

Then the parser uses this sequence to generate an AST (abstract syntax tree). This is a simplified form of the logic that makes up the entire application. (See this link for an example of a TypeScript AST; just paste in some ts code.)

We then process the AST (iterating the nodes) with ast.process(program); to create the 'RpyProgram' object, which you use to extract information.

As an example, the following line searches for a symbol named 'e' in the global program scope (note that Python doesn't really have local scopes, so I might remove this later): const sym = program.globalScope.resolve("e"); This returns a RpySymbol object which holds a definitionLocation and a list of references. The definitionLocation is the source (file) location where the symbol is defined, and the references are the source locations where the symbol is used. (Though I'm pretty sure this list isn't filled yet in the current implementation.)

--

What we can do next

So I hope that makes it a bit more clear how it works. Let's talk about what to do next.

My latest addition is label definition processing. I'd recommend looking at the changes I introduced to renpy-grammar-rules.ts since this commit. (I should edit this link to compare with current branch)

What I think could be a really nice introduction for you is attempting to implement the 'jump' statement from renpy. This one is very simple.

Define the grammar

First we need the grammar definition. (I already made it; you can find it here.) That definition describes how renpy parses the statement, so we can copy the logic almost one-to-one. (I also recommend looking at the official implementation to make sure we implement it correctly.)

Implement the parser

With the parser rule implemented (let me know if you still want some more details about that), we can now integrate it. That's also pretty straightforward. If you search in the ebnf file, you will also find out where it's supposed to be used (in this case, only as part of the statement group). So adding it to this list will mean it's processed correctly.

Add the AST

Lastly, we will need to create and implement the CallStatementNode in ast-nodes.ts, which needs the required fields and a process method override to add its LabelNameNode to the list of references of the label symbol. (And here we can also emit a compile error if the label wasn't defined yet.)
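
A rough sketch of what that could look like (stand-in types so it runs on its own; the real ast-nodes.ts and symbol classes differ in detail):

```ts
// Stand-in types; the real ones live in the extension's sources.
type SourceLocation = { file: string; line: number };

class RpySymbol {
  references: SourceLocation[] = [];
  constructor(public definitionLocation: SourceLocation) {}
}

class Scope {
  private symbols = new Map<string, RpySymbol>();
  define(name: string, sym: RpySymbol) { this.symbols.set(name, sym); }
  resolve(name: string): RpySymbol | undefined { return this.symbols.get(name); }
}

class RpyProgram {
  globalScope = new Scope();
  errors: string[] = [];
}

class LabelNameNode {
  constructor(public name: string, public location: SourceLocation) {}
}

// The node described above: its process() override records the jump as a
// reference on the label's symbol, or reports an error if the label is
// unknown at this point.
class CallStatementNode {
  constructor(public labelName: LabelNameNode) {}

  process(program: RpyProgram): void {
    const sym = program.globalScope.resolve(this.labelName.name);
    if (!sym) {
      program.errors.push(`jump to undefined label '${this.labelName.name}'`);
      return;
    }
    sym.references.push(this.labelName.location);
  }
}
```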

Result

If we did everything correctly, we should then be able to go back to the parser-test.ts file and use the RpyProgram object to resolve a label name to a RpySymbol and see references added to the list. (Provided we added that rpy code to the file that was processed.)

Let me know if you need more info or help with anything. Looking forward to your attempt! (And no worries if you don't quite get it yet, it took me weeks to wrap my head around compiler theory)

duckdoom4 commented 3 months ago

Just in case you tried it today or yesterday: I just pushed a fix to the parser branch (yours as well). Apparently I had broken it, but it's fixed now xd

BlackRam-oss commented 3 months ago

Hello, sorry for not writing to you; I've been very busy developing the integration with ink (another narrative language). I didn't expect ink to be so large.

As soon as I'm done with ink, I'll switch to Ren'Py.