aboutcode-org / scancode-analyzer

scancode-results-analyzer

Automatic Rule/.yml file Generation #15

Open AyanSinhaMahapatra opened 4 years ago

AyanSinhaMahapatra commented 4 years ago

The last part of analyzing the scan results is automatically creating new rules to add to the existing repository of rules, and even attempting semi-automated .yml file generation.

  1. Rule Generation

License detections are grouped by location, which ultimately means by the boundaries (start and end) of the matched text. Stitching together the matched texts from all the license detections in a group gives the whole “query” text, which is almost always the Rule text to be added. So, along with keeping track of the boundaries of the text where license detection takes place, we also stitch the matched texts together one by one, discarding those already present in a larger text, to generate the final Rule text.
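A minimal sketch of the stitching step, to make the idea concrete. It is not the analyzer's actual code: the `Match` class, its field names (`start_line`, `end_line`, `matched_text`) and the `stitch_rule_text` helper are assumptions made for illustration, and containment is checked with a plain substring test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One license match in a file (hypothetical shape, for illustration)."""
    start_line: int
    end_line: int
    matched_text: str


def stitch_rule_text(matches):
    """Return (start_line, end_line, rule_text) for one group of matches.

    The boundaries are the outermost start and end of the group. The text is
    built by appending matched texts in file order and skipping any text that
    is already contained in a larger one.
    """
    if not matches:
        raise ValueError("need at least one match to build a rule text")

    # File order; for equal starts, the longer span comes first.
    ordered = sorted(matches, key=lambda m: (m.start_line, -m.end_line))

    kept = []
    for match in ordered:
        text = match.matched_text.strip()
        if not text:
            continue
        # Skip texts already present in a larger, previously kept text.
        if any(text in larger for larger in kept):
            continue
        # Drop previously kept texts that this larger text now covers.
        kept = [k for k in kept if k not in text]
        kept.append(text)

    start = min(m.start_line for m in matches)
    end = max(m.end_line for m in matches)
    return start, end, "\n".join(kept)
```

For example, given three matches where the second is a fragment of the first, only the first and third texts end up in the stitched rule text, while the boundaries still span all three.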

  2. .yml generation

Almost always, the “license_expression” has to be entered manually, because choosing it is complicated and requires a lot of context. These tasks can be sped up significantly with a GUI-based interactive review framework (like the one in the license tags part). The framework also takes existing rule names into account: since rule names are numbered sequentially, a new rule can take the next free number, so conflicts are avoided.
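The sequential naming could be sketched roughly as below. This is an illustration under assumptions, not the analyzer's implementation: it assumes rule files are named `<base>_<number>.RULE` with a matching `<base>_<number>.yml`, and `next_rule_name` and `draft_yml` are hypothetical helpers. The license expression is still supplied by the reviewer.

```python
import re


def next_rule_name(existing_names, base):
    """Return the next free '<base>_<n>' name, given existing rule file names.

    Assumes files are named '<base>_<n>.RULE' or '<base>_<n>.yml'
    (an assumption for this sketch).
    """
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)\.(?:RULE|yml)$")
    numbers = []
    for name in existing_names:
        found = pattern.match(name)
        if found:
            numbers.append(int(found.group(1)))
    # Start at 1 when no rule with this base exists yet.
    return f"{base}_{max(numbers, default=0) + 1}"


def draft_yml(license_expression):
    """Return a draft .yml body holding the manually chosen expression."""
    if not license_expression.strip():
        raise ValueError("license_expression must be entered by the reviewer")
    return f"license_expression: {license_expression.strip()}\n"
```

A review tool could call `next_rule_name` with the names already in the rules directory to propose file names for the generated `.RULE` and `.yml` pair, leaving only the expression for the reviewer to fill in.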