The last part of analyzing the scan results is automatically creating new rules to add to the existing
repository of rules, along with an attempt at semi-automated .yml file generation.
Rule Generation
The license detections are first grouped by location. Each detection carries the boundaries (start and end) of its matched text, and stitching the matched texts from all the detections in a group together gives the whole “query” text, which is almost always the Rule text to be added. So, while keeping track of the boundaries of the text where license detection takes place, we also stitch the matched texts together one by one, discarding those already contained in a larger text, to generate the final Rule text.
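The stitching step described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the Match class, its field names, and the use of line numbers as boundaries are all assumptions made for the example, and partially overlapping matches are simply kept whole.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record for one license detection in a location group.
    start_line: int
    end_line: int
    matched_text: str


def stitch_matches(matches):
    """Return (start, end, text) for a group of matches, or None if empty."""
    # Sort by start; for equal starts put the longer match first so that
    # shorter matches inside it are seen afterwards and can be dropped.
    ordered = sorted(matches, key=lambda m: (m.start_line, -m.end_line))

    kept = []
    covered_end = None
    for match in ordered:
        # A match that ends within the region already covered is fully
        # contained in a larger text, so it is discarded.
        if covered_end is not None and match.end_line <= covered_end:
            continue
        kept.append(match)
        covered_end = match.end_line

    if not kept:
        return None

    # Assumption: texts are joined with newlines. Partially overlapping
    # matches are not trimmed here, so shared lines would be repeated.
    text = "\n".join(m.matched_text for m in kept)
    return kept[0].start_line, kept[-1].end_line, text
```

For example, given matches on lines 1 to 3, 2 to 3, and 4 to 5, the second is dropped because it lies inside the first, and the result spans lines 1 to 5.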
.yml generation
The “license_expression” almost always has to be entered manually, as it is complicated and requires a lot of context. These tasks can be sped up significantly by using a GUI-based interactive review framework (like in the license tags part). This step also takes the existing rule names into account: they are numbered sequentially, so conflicts are avoided.
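One way the sequential naming could work is sketched below. The function names, the base_N naming pattern, and the single-field .yml stub are assumptions for illustration only; the license expression is passed in because, as noted above, it has to be supplied by a reviewer.

```python
import re


def next_rule_name(existing_names, base):
    """Pick the next free sequential name of the assumed form '<base>_<N>'."""
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)$")
    numbers = []
    for name in existing_names:
        found = pattern.match(name)
        if found:
            numbers.append(int(found.group(1)))
    # Start at 1 when no rule with this base exists yet.
    return f"{base}_{max(numbers, default=0) + 1}"


def yml_stub(license_expression):
    """Build minimal .yml content; the expression is entered manually."""
    return f"license_expression: {license_expression}\n"
```

With existing names mit_1 and mit_2, next_rule_name would give mit_3, so the new rule does not collide with a present one.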