Split YR_MAX_ATOM_LENGTH for string and re, and optimize the extract algorithm of regular atom to speed up the scan.

Split YR_MAX_ATOM_LENGTH to YR_MAX_ATOM_RE_LENGTH and YR_MAX_ATOM_LENGTH will speed up the scan. For string, its length could increase from 4 to more large e.g. 8, and this change can speed up the scan. For re, if we just increase the length of re atom, the number atoms may increases exponentially because of wildcards and the memory will explode , so we optimized re atom extract algorithm. On the one hand, increase string YR_MAX_ATOM_LENGTH to speed up the scan. On the other hand, control the selected re atom contains at most one wildcard. After finishing atom extract on line 1164 of atoms.c , we count the number of wildcards in the re atom. If the count above 1, we will further extract from atom, the following is the explain.

find best child atom in every length from the atom, the range of length is [1, YR_MAX_ATOM_RE_LENGTH]. Every best child atom must meet 2 requirements, first it is the best quality, second the number of wildcards is below 1.
select one highest quality atom that dost not contain any wildcard (no-wildcard atom) and save it in best_atom[0].
select one highest quality atom that contains half wildcard ( half-wildcard atom) and save in best_atom[1]. For example, {01 0? 03} contains half wildcard.
select one highest quality atom that contains one wildcard (one-wildcard) and save in best_atom[2].
To avoid no-wildcard atom and half-wildcard atom length is too short, we set YR_MIN_VALID_ATOM_LENGTH. If no-wildcard atom length greater than or equal to YR_MIN_VALID_ATOM_LENGTH, we will select it as the final best atom and stop the select algorithm. Or else if the half-wildcard atom length greater than or equal to YR_MIN_VALID_ATOM_LENGTH+1, we will select it as the final best atom and stop the select algorithm. Or else we will choose the longest atom as the best atom. By following the above optimization, the scanning speed was increased by 6 times in our test.

VirusTotal / yara

Split YR_MAX_ATOM_LENGTH for string and re, and optimize the extract algorithm of regular atom to speed up the scan. #2106