godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License
1.15k stars 97 forks source link

Unify BBCode tokenizing/parsing within Godot #10339

Open RedMser opened 2 months ago

RedMser commented 2 months ago

Describe the project you are working on

Godot Editor

Describe the problem or limitation you are having in your project

While thinking about solutions for #10318, I've noticed that Godot has many ways a BBCode string is parsed or tokenized:

Each implementation has its own flavor of BBCode and supported tags and syntaxes. It makes changing the BBCode syntax or parsing it for a new purpose (e.g. conversion to different formats like markdown or HTML) very difficult.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Create a configurable BBCodeParser class which will be used to replace the individual parser implementations. Try to unify the different BBCode syntaxes as much as possible, still allowing configuration while being type-safe. From what I can tell, the intended syntax flavors are:

  1. RichTextLabel
  2. XML documentation (special tags like [Node2D] or [method blah])

I am not sure if XML documentation is supposed to be a superset of RichTextLabel. As it stands, many tags are not supported, and the new documentation syntax is only very loosely interpretable as BBCode (it should be [method=blah] or [method name=blah] to be correct - but we could generally allow things like [color green] in the new parser too).

I would expose the BBCodeParser to scripting as well, unless there is opposition due to the number of classes it would introduce (see code snippets below).

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Class definitions:

## Base class for BBCode parsing, this implements the basic syntax rules but knows no BBCode tags by default.
class BaseBBCodeParser extends Resource:
    var resource_local_to_scene: bool = true

    ## Tags that are known to the parser. Starts empty, can be manually extended.
    ## If a tag with an empty name is registered, its definition will be used as a fallback for any unknown tags.
    ## This can be used to implement documentation tags like [String] which interprets all unknown tags as self-closing.
    var registered_tags: Array[BBCodeTagDefinition]

    ## Can be called multiple times, parser keeps tag stack between calls.
    func push_bbcode(bbcode: String) -> BBCodeParserResult
    ## Automatically escapes BBCode tags, used for parsing plaintext.
    func push_text(text: String) -> BBCodeParserResult

## Registers BBCode tags known by the RichTextLabel.
class RichTextLabelBBCodeParser extends BaseBBCodeParser

## Registers cross-reference tags like [String] or [member abc], as well as any supported [b] or whatever tags.
class GodotDocBBCodeParser extends BaseBBCodeParser

class BBCodeTagDefinition extends Resource:
    var name: String
    ## Dictionary of dictionaries, structured like so:
    ## {
    ##    "parameter_name": {
    ##        "type": VariantType, # defaults to Variant
    ##        "required": bool, # defaults to false
    ##    }
    ## }
    ## Parameter name can be an empty string to allow an anonymous parameter like [color red] or [color=red]
    var parameters: Dictionary
    ## Enable this to allow tags like `[code][b][/code]` without escaping.
    var escape_contents: bool
    ## For tags like `[br]`.
    var is_self_closing: bool

class BBCodeTag extends RefCounted:
    var name: String
    ## Dictionary structured like so:
    ## {
    ##    "parameter_name": value,
    ## }
    ## Parameter name can be an empty string in the case of an anonymous parameter like [color red] or [color=red]
    var parameters: Dictionary

class BBCodeParserResult extends RefCounted:
    func get_error() -> BBCodeParserError

    ## Flat list of parsed items, each Dictionary is formatted like so:
    ## {
    ##    "text": String, # can be empty for self-closing tags
    ##    "styles": Array[BBCodeTag], # empty array in case of unstyled plain-text
    ## }
    func get_items() -> Array[Dictionary]

    ## Tree structure of the parsed BBCode. Returns all root-level elements.
    ## Each tree item is either a String or a BBCodeTreeItem.
    func get_tree() -> Array[Variant]

class BBCodeTreeItem extends BBCodeTag:
    ## Returns all items within the BBCode tag.
    ## Each tree item is either a String or a BBCodeTreeItem.
    func get_contents() -> Array[Variant]

enum BBCodeParserError {
    ## No errors while parsing.
    OK,

    ## General parse errors like missing tag name, missing closing bracket, etc.
    SYNTAX,

    TAG_UNKNOWN,
    TAG_UNCLOSED,

    PARAMETER_UNKNOWN,
    PARAMETER_TYPE_MISMATCH,
    REQUIRED_PARAMETER_MISSING,
}

Example usage:

# Initialization.
var parser = RichTextLabelBBCodeParser.new()
var result = parser.push_bbcode("hello [color=red]world[/color]")
assert(result.get_error() == BBCodeParserError.OK)

# 1. Flat list API:
for item in result.get_items():
    prints(item.text, len(item.styles))
    for style in item.styles:
        prints("TAG", style.name, style.parameters)

# This would print:

# hello 0
# world 1
# TAG color { "": "red" }

# 2. Tree API:
var root = result.get_tree()
prints(root[0]) # hello
prints(root[1].name, root[1].parameters) # color { "": "red" }
prints(root[1].get_contents()) # ["world"]

RichTextLabel would for example instantiate a RichTextLabelBBCodeParser in its constructor and add any RichTextEffects to the tag definition list.

Stringifying the parser results back to BBCode could be part of the base parser API as well. But other conversion targets like Markdown or RST should be implemented by the caller of the parser.

Open discussion points:

If this enhancement will not be used often, can it be worked around with a few lines of script?

Yes but writing a new feature-complete BBCode parser is a lot of effort to maintain and prone to diverge from other implementations.

Is there a reason why this should be core and not an add-on in the asset library?

BBCode parsers are already in core. This is a refactor combined with exposing the new class to scripting for user's convenience.

dalexeev commented 2 months ago

I had a similar idea. We could add an abstract SyntaxParser and default implementations for BBCode and Godot Doc BBCode. Users could then implement their own parsers for RichTextLabel, for example to add Markdown support.

Notes ``` SyntaxParser (core) extends Resource, resource_local_to_scene = true BaseBBCodeParser (core) RichTextLabelBBCodeParser (scene) GodotDocBBCodeParser (editor) EditorHelpBBCodeParser (editor) and other parsers (C#, GDExtension bindings) SyntaxParser ============ Abstract syntax parser. virtual void _set_source_code(const String &p_source_code); virtual void _parse(); void set_source_code(const String &p_source_code); void parse(); BaseBBCodeParser ================ Abstract BBCode parser, like XMLParser. Doesn't know about specific RichTextLabel tags. virtual void _set_source_code(const String &p_source_code) override; virtual void _parse() override; virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params); virtual void _parse_text(const String &p_text); RichTextLabelBBCodeParser ========================= Outputs BBCode to RichTextLabel with push_*(), pop(), and add_*() methods. virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params) override; virtual void _parse_text(const String &p_text); void set_rich_text_label(RichTextLabel *p_rich_text_label); GodotDocBBCodeParser ==================== Abstract Godot Doc BBCode parser. Knows cross-reference tags, [param], etc. virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params) override; virtual void _parse_text(const String &p_text); EditorHelpBBCodeParser ====================== Outputs Godot Doc BBCode to Editor Help with push_*(), pop(), and add_*() methods. virtual void _parse_tag(const String &p_name, bool p_is_open, const Dictionary &p_params) override; virtual void _parse_text(const String &p_text); void set_rich_text_label(RichTextLabel *p_rich_text_label); ``` If the `RichTextLabel.syntax_parser` property is not set, `RichTextLabel` uses the default `RichTextLabelBBCodeParser`.
Calinou commented 2 months ago

This might be useful for proper BBCode-to-ANSI conversion as well (which is needed for print_rich() to fully support terminal output correctly).

RedMser commented 2 months ago

@dalexeev I like your proposal to make this more focused on inheritence, although the API would have to be well thought out to make it worth expanding to other domains on in the future. I'd personally remove the generic SyntaxParser base class. Too many domain specific demands, like parsing PackedByteArray or streaming data. Something like XML, JSON or GDScript is unlikely to be able to switch to a new base class like this one.

Also Resource makes sense, for re-using parsers and storing their configurations in tres files.

Not sure why you need to set the RTL instance, it should probably only be a one-directional relationship, and RTL updates properties on the parser instance to add custom richtexteffects.

I'll update my proposal soon accordingly. Then I'll ask for feedback from RTL maintainers, also since I saw a new PR for BBCode stringify appear.

RedMser commented 2 months ago

@bruvzg Hi, I'd like your feedback on this proposal if you have the time, since you're very deep in the code of RichTextLabel and BBCode.

I'm mainly looking for answers of the open questions near the end of the issue. But a look over the code API might make sense as well, since it would be great to expose to scripting / GDExtensions.