Closed kchanqvq closed 2 years ago
It is not. Symbols are problematic because they are not automatically garbage collected unless manually uninterned, which is an issue when parsing untrusted data.
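To illustrate the point about garbage collection, here is a minimal sketch in standard Common Lisp. The package name SCRATCH-TOKENS and the symbol name are made up for the example and have nothing to do with Plump itself. An interned symbol stays reachable through its home package until someone calls `unintern` on it, so interning every token from untrusted input lets that input grow the package without bound.

```lisp
;; SCRATCH-TOKENS is a throwaway package created only for this demonstration.
(let ((pkg (make-package "SCRATCH-TOKENS" :use '())))
  ;; INTERN creates the symbol and registers it in the package.
  (intern "ATTACKER-CHOSEN-NAME" pkg)
  ;; The package now references the symbol, so it cannot be collected,
  ;; even if no other code holds on to it.
  (assert (find-symbol "ATTACKER-CHOSEN-NAME" pkg))
  ;; Only an explicit UNINTERN removes the package's reference.
  (unintern (find-symbol "ATTACKER-CHOSEN-NAME" pkg) pkg)
  (assert (null (find-symbol "ATTACKER-CHOSEN-NAME" pkg)))
  ;; Clean up so the sketch can be run again.
  (delete-package pkg))
```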
A mix of symbols and strings may work? (Clarification: we don't intern anything when reading untrusted data, but reuse an existing interned symbol if there is one.) It sounds like over-stretching for an HTML parsing package, but the plump DOM model (as opposed to the parser part) is also used by various other projects, which frequently want to find elements by tag, id, attribute, etc. Most of the relevant tokens occur as string literals, and the code could intern them by simply switching to 'token or :token. If the DOM model could then support both symbols for "known tokens" and strings for "unknown tokens", it could save lots of string operations for those projects.
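A rough sketch of what "reuse an existing symbol, never intern a new one" could look like, using only standard Common Lisp. The names `token-designator` and `token=` are hypothetical and not part of Plump's API. The sketch also assumes tokens are looked up in the KEYWORD package and upcased first, which suits case-insensitive HTML names but would be wrong for case-sensitive XML.

```lisp
;; Hypothetical helper, not part of Plump.
(defun token-designator (name)
  "Return the keyword named NAME if one is already interned, else NAME itself.
FIND-SYMBOL only looks the name up, so untrusted input never creates symbols."
  (multiple-value-bind (symbol status)
      (find-symbol (string-upcase name) :keyword)
    ;; STATUS is NIL when no such symbol exists in the package.
    (if status symbol name)))

;; Hypothetical helper, not part of Plump.
(defun token= (a b)
  "Compare two tokens, each of which may be a symbol or a string."
  (if (and (symbolp a) (symbolp b))
      (eq a b)              ; known tokens: a pointer comparison
      (string-equal a b)))  ; otherwise: case-insensitive name comparison
```

Client code that mentions :class somewhere in its source has already interned that keyword at read time, so `(token-designator "class")` would return the symbol, while a token that no code ever named stays a plain string.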
Have you actually determined that string operations are a performance problem for you, or is this just empty guessing?
Performance is one thing; it just doesn't feel as good to write "class" everywhere, compared to 'class or :class, wink wink.
That is not a good reason to make any of those changes.
You'll get over that feeling before long.
It seems to me that uninterned strings are just slightly suboptimal and less Lispy than using symbols or keyword symbols. Is there a rationale for using strings?
Is it desirable to move from strings to symbols? If so I could work on a patch. There might also be complications about backward compatibility...