token() is the worst offender with 18s (number crunching)
parse_trlc_files() takes around 9s once you remove the lexing (which seems unavoidable)
and process() takes 4 seconds, which is entirely due to resolve_record_references (unavoidable, this is work that needs to happen sooner or later)
There are some immediate ideas:
[x] is_alpha, is_alnum, and is_digit could be replaced by functions closer to the builtins (but we need to take care of Unicode, so it's not as easy as just using the builtins)
[x] implement #47
[ ] implement #48
[ ] token() could be optimised in some other way
[ ] token() could be replaced by a hand-written C lexer (but this adds portability concerns)
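
As a rough illustration of the Unicode concern in the first item, here is a minimal sketch. It assumes the lexer wants ASCII-only classification, which is a guess and not something confirmed here. The set names are hypothetical, and the function bodies are not the actual TRLC implementation.

```python
import string

# Assumption: only ASCII letters and digits should count. The plain
# builtins are broader: "é".isalpha() and "²".isdigit() are both True.
ASCII_ALPHA = frozenset(string.ascii_letters)
ASCII_DIGIT = frozenset(string.digits)
ASCII_ALNUM = ASCII_ALPHA | ASCII_DIGIT


def is_alpha(char):
    # Set membership is a single hash lookup per character.
    return char in ASCII_ALPHA


def is_digit(char):
    return char in ASCII_DIGIT


def is_alnum(char):
    return char in ASCII_ALNUM
```

Whether this is actually faster than the current functions would need to be measured on tests-system/bulk; the point is only that the builtins cannot be dropped in unchanged.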
There is one more issue that could manifest on Windows with large repos: if you have millions of files (most of which are not trlc files) then the initial traversal for register_dir could take a lot of time.
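
One possible mitigation is to prune directories during the walk so that large irrelevant subtrees are never entered. The sketch below is only an illustration: find_trlc_files, WANTED_EXTENSIONS, and SKIPPED_DIRS are hypothetical names, the extension list and skip list are assumptions, and this is not how register_dir is currently written.

```python
import os

# Assumption: only files with these extensions are of interest.
WANTED_EXTENSIONS = (".rsl", ".trlc")
# Hypothetical skip list; a real one would probably be configurable.
SKIPPED_DIRS = frozenset({".git", "node_modules"})


def find_trlc_files(root):
    """Yield paths under root whose names end in a wanted extension."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if name.endswith(WANTED_EXTENSIONS):
                yield os.path.join(dirpath, name)
```

This does not help when the millions of files sit in directories that cannot be skipped; in that case every directory entry still has to be listed once.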
The worst offenders listed above are for tests-system/bulk; this is not unexpected.