bobbylight / RSyntaxTextArea

A syntax highlighting, code folding text editor for Java Swing applications.
BSD 3-Clause "New" or "Revised" License

Adding line break in TokenMaker #509

Open maa-x opened 1 year ago

maa-x commented 1 year ago

Describe the solution you'd like

I am trying to implement a "pretty printer" for HTML, whereby a line break is inserted after a tag is closed, as shown in this screenshot: (screenshot)

This is a screenshot taken from Burp Suite, which I believe uses RSyntaxTextArea (PortSwigger, the developer, has contributed to this repository and notably submitted #314).

Are there any workarounds?

N/A

Additional context

I am unsure whether this should be done within the TokenMaker or the SyntaxView, or perhaps even using folds.

bobbylight commented 1 year ago

Hi @MaxIsMyName! I'm trying to get back to RSTA after a multi-month hiatus. Can you clarify when you'd want to insert the newline? Is it after the user types the closing > of the closing tag, or do you want to insert the closing tag along with the newline when they type </? I had trouble with the latter since HTML5 has many self-closing tags, but what you're specifically trying to do might dictate a good way to go about it.

maa-x commented 1 year ago

Good morning @bobbylight, thank you for RSTA, it's been a fun library to play with.

I'm not actually intending to use it for writing in the text area. Instead, I want to display HTTP requests and responses and allow only limited editing: snipping lines and/or censoring specific items in the text area.

One thing I'd like is to be able to pretty print specific content types. For HTML for example, I would like to be able to display this:

<html><body><h1>Hi Bobby</h1></body></html>

As this:

<html>
    <body>
        <h1>Hi Bobby</h1>
    </body>
</html>
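As far as I understand RSTA's design (my reading, not something stated in this thread), a TokenMaker only splits text that is already in the document into tokens; it cannot add characters. So one possible approach for display-only use is to reformat the string before putting it into the text area. The sketch below is a hypothetical helper (`NaiveHtmlPrettyPrinter` is not part of RSTA) that assumes small, well-formed HTML and Java 11+ for `String.repeat`. It does not know about HTML5 void elements such as `<br>` or `<img>` written without `/>`, which would indent everything after them; that is the same self-closing-tag problem mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of RSTA: naive pretty printer for small, well-formed HTML. */
final class NaiveHtmlPrettyPrinter {
    // A part is either a whole tag ("<...>") or a run of text between tags.
    private static final Pattern PART = Pattern.compile("<[^>]*>|[^<]+");

    static String prettyPrint(String html, String indentUnit) {
        List<String> parts = new ArrayList<>();
        Matcher m = PART.matcher(html);
        while (m.find()) {
            String p = m.group().trim();
            if (!p.isEmpty()) {
                parts.add(p); // whitespace-only text between tags is dropped
            }
        }

        List<String> lines = new ArrayList<>();
        int depth = 0;
        for (int i = 0; i < parts.size(); i++) {
            String p = parts.get(i);
            if (p.startsWith("</")) {
                depth = Math.max(0, depth - 1); // closing tag: dedent before emitting
                lines.add(indentUnit.repeat(depth) + p);
            } else if (isOpeningTag(p)) {
                // Keep "<tag>text</tag>" on one line, like <h1>Hi Bobby</h1>.
                if (i + 2 < parts.size() && !parts.get(i + 1).startsWith("<")
                        && parts.get(i + 2).startsWith("</")) {
                    lines.add(indentUnit.repeat(depth) + p + parts.get(i + 1) + parts.get(i + 2));
                    i += 2;
                } else {
                    lines.add(indentUnit.repeat(depth) + p);
                    depth++; // children of this tag are indented one level deeper
                }
            } else {
                // Text, comments, doctypes and "/>" tags leave the depth unchanged.
                lines.add(indentUnit.repeat(depth) + p);
            }
        }
        return String.join("\n", lines);
    }

    private static boolean isOpeningTag(String p) {
        return p.startsWith("<") && !p.startsWith("</") && !p.startsWith("<!") && !p.endsWith("/>");
    }
}
```

Usage would be something like `textArea.setText(NaiveHtmlPrettyPrinter.prettyPrint(body, "    "))`, with the HTML TokenMaker then highlighting the already-formatted text. Since this changes the document's content, the original body would need to be kept separately if it must be preserved byte for byte.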

From what I have been able to figure out myself (I'm a slow learner), returning the firstToken seems to create a newline in the text area (which is correctly not counted in the line count). This makes sense to me, as getTokensForLine is saying that the line has ended. However, I must be doing something wrong.

For instance, if my rule captures both newlines and <<EOF>>, I get some odd behaviours: at times it crashes, other times it loops infinitely, and other times I end up with duplicated newlines.

To me, this seems like I'm misunderstanding the difference between an endToken, a nullToken and a regular token.

I think I'm not respecting RSTA's design principles with newlines, but it's not clear to me exactly what I should be doing here.
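To make the token-type question concrete, here is a simplified model of the per-line contract as I understand it. It is written for this discussion and does not use RSTA's actual classes; `LineLexerModel`, `Tok` and `tokenizeLine` are hypothetical names. The document asks for tokens one line at a time, passing that line's text (as far as I know, without its line terminator) plus the type of the last token on the previous line as `initialTokenType`. A line that ends in the default state is terminated with a token of type NULL (0), which is what `addNullToken()` does in RSTA's .flex lexers. A line that ends mid-construct, such as an unclosed comment, instead ends with a token whose type tells the next call where to resume; an end token plays that role. Either way the lexer only labels existing characters, so ending a line's token list never inserts a line break. One plausible cause of infinite loops (an assumption, not verified against the code in question) is an `<<EOF>>` action that does not `return`, since JFlex may run the EOF action again.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model (not RSTA's real API) of lexing one line at a time with carried state. */
final class LineLexerModel {
    static final int NULL = 0;              // like Token.NULL: line ended in the default state
    static final int TEXT = 1;
    static final int COMMENT_MULTILINE = 2; // also used as "resume inside a comment" state

    record Tok(int type, String text) {}

    /** Tokenizes ONE line (no newline); initialType is the last token type of the previous line. */
    static List<Tok> tokenizeLine(String line, int initialType) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        if (initialType == COMMENT_MULTILINE) {
            int close = line.indexOf("-->");
            if (close < 0) {
                out.add(new Tok(COMMENT_MULTILINE, line)); // still inside the comment
                return out; // last token's type carries the state to the next line
            }
            out.add(new Tok(COMMENT_MULTILINE, line.substring(0, close + 3)));
            i = close + 3;
        }
        while (i < line.length()) {
            int open = line.indexOf("<!--", i);
            if (open < 0) {
                out.add(new Tok(TEXT, line.substring(i)));
                break;
            }
            if (open > i) {
                out.add(new Tok(TEXT, line.substring(i, open)));
            }
            int close = line.indexOf("-->", open + 4);
            if (close < 0) {
                out.add(new Tok(COMMENT_MULTILINE, line.substring(open)));
                return out; // unterminated comment: state carries over, no NULL token
            }
            out.add(new Tok(COMMENT_MULTILINE, line.substring(open, close + 3)));
            i = close + 3;
        }
        out.add(new Tok(NULL, "")); // line ended normally (cf. addNullToken())
        return out;
    }

    /** What the caller would pass as initialType for the following line. */
    static int lastTokenType(List<Tok> toks) {
        return toks.isEmpty() ? NULL : toks.get(toks.size() - 1).type();
    }
}
```

Feeding `"a <!-- b"` and then `" c --> d"` through this model shows the handoff: the first line ends with a comment token, so the second call starts inside the comment and finishes with a NULL token.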

As a side note, and let me know if you would prefer to answer this in a separate issue to help others find it, I am curious about the best way to handle HTTP requests. Currently, in the getTokensForLine method of my lexer, if I detect that I'm in the body section, for example, I return the result of getTokensForLine from the TokenMaker appropriate for the content type. If my HTTP TokenMaker's getTokensForLine is called with an initialTokenType that is neither 0 nor one of its own, it also returns the other TokenMaker's tokens.

Does this fit with RSTA's design principles or is there a better way to do this?
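Delegating as described requires knowing where the body starts and what its media type is. A minimal sketch of that detection, assuming the whole message is available and already split into lines with CR/LF removed; `HttpMessageParts` and its methods are hypothetical, not part of RSTA. This could run once when a request or response is loaded, and the result could be used to choose the body's TokenMaker (e.g. via RSTA's TokenMakerFactory). Inside a single per-line call this information is not directly available and would have to be carried in the line's end state.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: locates the body and media type of an HTTP message split into lines. */
final class HttpMessageParts {
    /** Index of the first body line (the line after the first empty line), or -1 if none. */
    static int bodyStartLine(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).isEmpty()) {
                return i + 1;
            }
        }
        return -1;
    }

    /** Lowercased media type from the Content-Type header, without parameters; null if absent. */
    static String contentType(List<String> lines) {
        int bodyStart = bodyStartLine(lines);
        int headerEnd = bodyStart < 0 ? lines.size() : bodyStart - 1; // exclusive
        for (int i = 1; i < headerEnd; i++) { // line 0 is the request/status line
            String line = lines.get(i);
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Type")) {
                String value = line.substring(colon + 1);
                int semi = value.indexOf(';'); // drop parameters such as "; charset=UTF-8"
                if (semi >= 0) {
                    value = value.substring(0, semi);
                }
                return value.trim().toLowerCase(Locale.ROOT);
            }
        }
        return null;
    }
}
```

For a response with `Content-Type: text/html; charset=UTF-8`, this yields `text/html`, and body lines could then be routed to an HTML TokenMaker. A small mapping table from media types to TokenMakers would still be needed, since e.g. `application/json` and `application/xhtml+xml` need explicit entries.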

PS: I have created a skeleton file which creates RSTA's yyreset and zzRefill methods and does not initialise the zzBuffer. Creating the constructor without args is also impossible within the skeleton file, but could be done within the base token maker class I suppose. Please let me know if you would like me to submit a PR with the skeleton file (and potentially the empty constructor within the base token maker classes).

bobbylight commented 1 year ago

@MaxIsMyName - is there a PR you can point me to? I'd be cool with adding an HTTP Request TokenMaker to this library if you were OK with it as well.

And yeah, about the design - some parts of RSTA are very hacky, done in ways to e.g. avoid memory allocation, even though these days that is considered bad design in Java. The TokenMakers in particular are a little hacky, making it tough to be aware of context across the entire file (as you're well aware!).

Anyway, all that to say I'd hate to hazard a guess without looking at your code. It sounds like you're delegating to e.g. HtmlTokenMaker if you detect the body is text/html and you're in the proper part of the request? If so that's very cool. RSTA's built-in TokenMakers don't do a lot of delegating because of how complex it is to maintain state, so there is a lot of code duplication across the lexers, especially across the web languages.

One random thing that may not be an issue for you - be sure you're using JFlex 1.4.1 and not a later version. Last I tried it, there were changes to the generated lexer code in later versions that caused some unit tests to break that don't break with 1.4.1 - presumably because of the assumptions the library makes about the lexer's private fields. That could be a cause of odd instability in your own TokenMaker.

As for a skeleton - yes, that would be a fine addition. I'd prefer it self-contained and separate from other changes just for cleanliness. I've considered doing that for a long time, say pulling JFlex 1.4.1 from a repo and pointing it to a custom skeleton as part of RSTA's build process, since it's not something easy for folks to set up and manual build steps are bad.