A performance regression was observed when parsing large strings after the merging of #3005.
Parsing time for the schema registry layers (which have large documentation strings) went from ~4-5ms to over 7ms.
This was traced back to the String parsing grammar. One notable difference between the PEGTL implementation and the original implementation was that, even though the PEGTL grammar permissively let any character be "escaped" by \, it explicitly listed the characters that would be considered valid for replacement by TfEscapeStringReplaceChar. This was unnecessary and appeared to slow down the parser. (The explicit rules also had a latent bug that would have surfaced had the grammar ever become less permissive: OctDigit was incorrectly specified.) These unnecessary checks and rules have now been removed.
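To illustrate why the explicit escape rules were redundant, here is a minimal hand-written sketch of permissive escape scanning. It is not the PEGTL grammar from this change; `FindClosingQuote` is a hypothetical helper, and the only assumption is that the scanner needs to find the end of the string and can leave escape interpretation to a later pass such as TfEscapeStringReplaceChar.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only (not part of the real grammar).
// Returns the index of the closing double quote of a string whose opening
// quote is at `start`, or std::string_view::npos if the string is unterminated.
std::size_t FindClosingQuote(std::string_view text, std::size_t start)
{
    for (std::size_t i = start + 1; i < text.size(); ++i) {
        if (text[i] == '\\') {
            // Permissive: skip whatever character follows the backslash
            // without checking whether it is a recognized escape. Deciding
            // what the escape means is left to a later replacement pass.
            ++i;
        } else if (text[i] == '"') {
            return i;
        }
    }
    return std::string_view::npos;
}
```

Because every character after a backslash is accepted, enumerating the "valid" escapes in the scanner cannot change which inputs match. It only adds work per escaped character.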
This change also includes some additional optimizations and simplifications to the grammar:
- Preferring PEGTL_NS::string (and the wrappers PEGTL_NS::two and PEGTL_NS::three) over sequences of individual PEGTL_NS::one characters.
- Replacing opt_must with opt in the number parsing grammar. An early version of the PEGTL grammar had required plus<digit>. to be followed by another digit (likely via plus<digit>). Once that became optional (i.e. plus became star), the must was spurious.
- Simplifying the comment parsing rules. Some complexity was originally added there to minimize backtracking when matching comments, but we suspect that PEGTL_NS::string matching will be efficient and simpler.
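The opt_must point can be sketched without PEGTL. The function below is a hypothetical, hand-written matcher for roughly `digit+ ('.' digit*)?`; the exact shape of the real number grammar is an assumption here. Once the digits after the dot are matched with star semantics (zero or more), nothing following the dot can fail, so there is no failure left for a "must" to turn into a hard error.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper, for illustration only (not the real grammar).
// Returns the length of the longest prefix of `text` that matches
// digit+ ( '.' digit* )?, or 0 if `text` does not start with a digit.
std::size_t MatchNumberPrefix(std::string_view text)
{
    auto isDigitAt = [&](std::size_t k) {
        return k < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[k])) != 0;
    };

    std::size_t i = 0;
    while (isDigitAt(i)) {
        ++i;  // plus<digit>: at least one digit is required
    }
    if (i == 0) {
        return 0;
    }

    // Optional fraction. With star semantics the digit loop below accepts
    // zero digits, so this branch cannot fail once the '.' has matched.
    if (i < text.size() && text[i] == '.') {
        ++i;
        while (isDigitAt(i)) {
            ++i;  // star<digit>: zero or more digits
        }
    }
    return i;
}
```

Under these assumptions, "12." and "12.5" both match in full, and "7.x" matches "7." and stops before the "x".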
Fixes Issue(s)
-