Shinmera / plump

Practically Lenient and Unimpressive Markup Parser for Common Lisp
https://shinmera.github.io/plump
zlib License
120 stars, 21 forks

Using keyword instead of string for tags and attributes? #39

Closed by kchanqvq 2 years ago

kchanqvq commented 2 years ago

It seems to me that plain strings are slightly suboptimal and less Lispy than symbols or keyword symbols. Is there a rationale for using strings?

Is it desirable to move from strings to symbols? If so, I could work on a patch. There might also be complications with backward compatibility...

Shinmera commented 2 years ago

It is not. Symbols are problematic because, once interned, they are not garbage collected unless they are manually uninterned, which is an issue when parsing untrusted data.
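To make the concern concrete, here is a minimal sketch in plain Common Lisp, not taken from plump. The package `untrusted-tokens` and the helpers `count-symbols` and `intern-tag-names` are hypothetical names invented for illustration. It shows that every distinct name interned from input stays reachable through the package until someone calls `unintern` on it.

```lisp
;; Hypothetical scratch package; it uses no other package, so it starts empty.
(defpackage #:untrusted-tokens (:use))

(defvar *token-package* (find-package '#:untrusted-tokens))

(defun count-symbols (&optional (package *token-package*))
  "Count the symbols currently accessible in PACKAGE."
  (let ((n 0))
    (do-symbols (s package n)
      (declare (ignorable s))
      (incf n))))

(defun intern-tag-names (names &optional (package *token-package*))
  "Intern every string in NAMES as a symbol in PACKAGE.
This is what a symbol-based parser would do for each tag or attribute name."
  (dolist (name names)
    (intern (string-upcase name) package)))

;; Each distinct name adds one symbol that the package keeps alive.
(intern-tag-names '("div" "x-a1" "x-a2" "x-a3"))
(count-symbols)

;; The symbol only becomes collectable after an explicit UNINTERN.
(unintern (find-symbol "X-A1" *token-package*) *token-package*)
(count-symbols)
```

A document that invents a fresh tag or attribute name on every element would grow the package without bound under this scheme, whereas strings are collected like any other object once the DOM is dropped.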

kchanqvq commented 2 years ago

A mix of symbols and strings may work? (Clarification: we don't intern anything when reading untrusted data, but reuse an existing interned symbol if there is one.) It sounds like overreach for an HTML parsing package, but the plump DOM model (as opposed to the parser part) is also used by various other projects, which frequently want to find elements by tag, id, attribute, etc. Most of the relevant tokens occur as string literals, and the code could intern them by simply switching to 'token or :token. If the DOM model could then support both symbols for "known tokens" and strings for "unknown tokens", it could save lots of string operations for those projects.
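The hybrid idea described here could be sketched roughly as follows, using only standard Common Lisp. `known-token` and `token=` are hypothetical helpers, not part of plump, and the sketch assumes known tokens live in the `KEYWORD` package under their upcased names. The key point is that `find-symbol` looks a name up without ever creating a symbol.

```lisp
(defun known-token (name &optional (package :keyword))
  "Return the symbol already interned for NAME in PACKAGE, or NAME itself
as a string when no such symbol exists. Never creates a new symbol."
  (multiple-value-bind (symbol status)
      (find-symbol (string-upcase name) package)
    ;; STATUS is NIL when the symbol is not present in PACKAGE.
    (if status symbol name)))

(defun token= (a b)
  "Compare two tokens, each of which may be a symbol or a string."
  (if (and (symbolp a) (symbolp b))
      (eq a b)                 ; fast path: both sides are known tokens
      (string-equal a b)))     ; STRING-EQUAL accepts symbols and strings

;; :class exists because the reader interned it when it read this form.
(token= (known-token "class") :class)

;; A name nobody wrote as a keyword stays a string, and nothing is interned.
(known-token "x-some-unseen-attribute")
```

Note that every consumer would now have to handle two token representations, and `string-equal` compares case-insensitively, which may or may not be what a given project wants.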

Shinmera commented 2 years ago

Have you actually determined that string operations are a performance problem for you, or is this just empty guessing?

kchanqvq commented 2 years ago

Performance is one thing; it also just doesn't feel as good to write "class" everywhere compared to 'class or :class, wink wink.

Shinmera commented 2 years ago

That is not a good reason to make any of those changes.

You'll get over that feeling before long.