Closed zygoloid closed 1 month ago
Note, in consideration for my comments on #4000, I would suggest considering the Define comments as they apply here.
Note, in consideration for my comments on #4000, I would suggest considering the Define comments as they apply here.
They don't apply here -- this PR doesn't change Define
now that the dependent PR has landed.
Had you considered instead tracking the TokenKind as part of the NodeKind definition? What are we getting by creating the separate verification path?
For me, listing the token kind at the right place in the struct makes the typed node definitions significantly more useful and readable: these structs can then be read directly as grammar productions. For example:
struct MemberAccessExpr {
static constexpr auto Kind = NodeKind::MemberAccessExpr.Define(
{.category = NodeCategory::Expr, .child_count = 2});
AnyExprId lhs;
Token<Lex::TokenKind::Period> token;
AnyMemberNameOrMemberExprId rhs;
};
corresponds to the grammar rule
member-access-expr = any-expr "." any-member-name-or-member-expr
which we can now read directly from the struct members.
Instead of having a Token<K>
wrapping a TokenIndex
, we could consider having separate TokenIndex
-like types for each kind of token index that has a known token kind, in the same way we do for NodeId
. I'm not sure that we would get enough benefit from having more strongly typed TokenIndex
es to justify the costs there, but I'll prototype it and see how it looks.
Instead of having a
Token<K>
wrapping aTokenIndex
, we could consider having separateTokenIndex
-like types for each kind of token index that has a known token kind, in the same way we do forNodeId
. I'm not sure that we would get enough benefit from having more strongly typedTokenIndex
es to justify the costs there, but I'll prototype it and see how it looks.
Here is a prototype of how that might look.
FWIW, for me, the prototype reads better to me: compared with NamedConstraintDefinitionStartId
, Lex::CloseCurlyBraceTokenIndex
also feels more consistent style-wise than Token<Lex::TokenKind::CloseCurlyBrace>
. The amount of code looks similar.
I agree we don't want to spend more compute on lots of strong-typing (though maybe for this area, where we're already doing it for parse nodes, it's actually okay to do it for tokens too?).
To note, for IfValid, maybe something like:
// A lightweight handle to a lexed token in a `TokenizedBuffer` whose kind is
// known to be `Kind`.
template <const TokenKind& K>
struct TokenIndexForKindBase : public TokenIndex {
// NOLINTNEXTLINE(readability-identifier-naming)
static const TokenKind& Kind;
constexpr explicit TokenIndexForKind(TokenIndex index) : TokenIndex(index) {}
};
template <const TokenKind& K>
const TokenKind& TokenIndexForKindBase<K>::Kind = K;
// Used when the kind is always `Kind`.
template <const TokenKind& K>
struct TokenIndexForKind : TokenIndexForKindBase<K> {
static constexpr bool IfValid = false;
};
// Used when the kind is only required to be `Kind` when the corresponding parse node doesn't have an error.
template <const TokenKind& K>
struct TokenIndexForKindIfValid : TokenIndexForKindBase<K> {
static constexpr bool IfValid = true;
};
#define CARBON_TOKEN(TokenName) \
using TokenName##TokenIndex = TokenIndexForKind<TokenKind::TokenName>;
using TokenName##TokenIndexIfValid = TokenIndexForKindIfValid<TokenKind::TokenName>;
#include "toolchain/lex/token_kind.def"
Also, I'd also suggest doing something with AnyToken
, not sure if you just ignored it as part of the prototype. Either renaming AnyTokenIndex
for consistency, or just using TokenIndex
directly to avoid the extra typedef.
How would you feel about proceeding with this PR with the Token<K>
approach, and adding the typed TokenIndex
es as a follow-up change? I think there's more cleanup to do before that'll be ready to go, and I'm worried about this backsliding.
Instead of tracking the token associated with a parse node in the
.def
file macro, track it on the typed node instead. List the token as a field inside the node structure to show the order of the token relative to the other components of the grammar production, and to allow the token index to be accessed when the node is extracted.Remove the corresponding information from the
.def
file, leaving behind just a list of parse node kinds in the majority of cases.This also removes the checking of the token kind associated with a parse node in the case where the parse node has errors. Previously we had a flag on the node kind to indicate whether we should check this, but per discord discussion, we have decided to remove this.