Minor optimization possibility: don't process the generated builtin declarations header completely three times

Explanation quoted from the comment in WebCLValidator::run(), from PR #58:

/*
 * Feed the builtin declarations produced by the preprocessing stage
 * to be included when running the later stages. This fixes issues
 * stemming from builtin functions being assumed to return int by default, etc.
 *
 * Not all builtin functions are declared, but only those which the preprocessing
 * stage detects as possibly having been called by the code being validated.
 *
 * TODO: profile, most importantly to see if:
 *
 *  1) expanding the _CL_DECLARE... macros takes a significant amount of
 *     time in the usual cases. In this case, we could capture the preprocessed
 *     version of the header produced during the first matcher stage and reuse it
 *     in the remaining matcher and validation stages.
 *
 * 2) parsing the literal function declarations produced by macro expansion
 *    is still expensive. The preprocessing stage can only narrow down the set
 *    of functions to declare by textual matching; this can provide false positives
 *    which we could eliminate by looking at the actual function calls in the AST
 *    once parsed.
 */
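The second TODO item could be sketched roughly as follows. This is only an illustration, not code from the validator: `confirmedBuiltins` and both set parameters are hypothetical names, and it assumes that an earlier stage has already collected the names of the functions actually called in the AST, so that only the later stages would benefit from the narrowed set.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical helper, not part of webcl-validator.
// textualCandidates: builtin names the preprocessing stage matched textually
//                    (may contain false positives).
// calledInAst:       names of functions actually called, collected from an
//                    already parsed AST (assumed to be available).
// Returns only the candidates that the AST confirms as called.
std::set<std::string> confirmedBuiltins(
    const std::set<std::string> &textualCandidates,
    const std::set<std::string> &calledInAst)
{
    std::set<std::string> confirmed;
    // std::set iterates in sorted order, which set_intersection requires.
    std::set_intersection(textualCandidates.begin(), textualCandidates.end(),
                          calledInAst.begin(), calledInAst.end(),
                          std::inserter(confirmed, confirmed.begin()));
    return confirmed;
}
```

Only the names in the returned set would then need declarations in the header given to the later stages.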

The minimum work we can get away with is parsing the header in its current form once, in matcher stage 1; ideally we would eliminate the overhead entirely for the two remaining stages (matcher stage 2 and validation).
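A minimal sketch of the "expand once, reuse afterwards" idea from the first TODO item, assuming C++17. It does not use any Clang API: `ExpandedHeaderCache`, the `Expander` callback and `runThreeStages` are hypothetical stand-ins for the real macro expansion and the real stages. It also only avoids repeated expansion; each stage would still have to parse the expanded text.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper, not part of webcl-validator: memoizes the expanded
// builtin declarations header so the expensive expansion runs at most once.
// Assumes the same raw header is used for the whole validator run.
class ExpandedHeaderCache {
public:
    using Expander = std::function<std::string(const std::string &)>;

    explicit ExpandedHeaderCache(Expander expand) : expand_(std::move(expand)) {
        assert(expand_ && "an expander callback is required");
    }

    // Returns the expanded text; the expander runs only on the first call.
    const std::string &get(const std::string &rawHeader) {
        if (!expanded_) {
            expanded_ = expand_(rawHeader);
            ++expansions_;
        }
        return *expanded_;
    }

    // Number of times the expander was actually invoked.
    std::size_t expansions() const { return expansions_; }

private:
    Expander expand_;
    std::optional<std::string> expanded_;
    std::size_t expansions_ = 0;
};

// Stand-in for matcher 1, matcher 2 and validation: each stage asks the
// cache for the header text instead of expanding the macros again.
std::size_t runThreeStages(ExpandedHeaderCache &cache,
                           const std::string &rawHeader) {
    std::size_t totalBytesSeen = 0;
    for (int stage = 0; stage < 3; ++stage)
        totalBytesSeen += cache.get(rawHeader).size();
    return totalBytesSeen;
}
```

In this sketch, `cache.expansions()` stays at 1 after `runThreeStages` returns, however many stages consume the header.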

I experimented by feeding the actual header only to matcher 1 and an empty header to matcher 2 and validation. This produces incorrect output, but its performance should be similar to the ideal case. The test suite ran only about 10% faster, so this is not very fruitful in comparison to #60.
