aboutcode-org / commoncode

A library of common functions shared in many other AboutCode projects

Rethinking the relationship between Codebases and Resources #35

Open JonoYang opened 2 years ago

JonoYang commented 2 years ago

It is not easy to determine a root for a codebase when a scan contains many different codebases within it.

One way to resolve this is to treat the group of codebases as a Project, similar to what we do in scancode.io.

A Project would then have multiple "starting paths", one for the root of each codebase in the scan. The Project would track the leading path segment of each Resource as a "starting path".

For example, consider that we have the following Resource paths in a scan:

codebase1/a.c
codebase2/foo.c
codebase3/do.c

In this case, the Project would track "codebase1", "codebase2", and "codebase3" as starting paths.

If the input to a Project is a single file, then the starting path is just that file's name.
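
To make the idea concrete, here is a minimal sketch of how starting paths could be derived from Resource paths. The `starting_paths` function is hypothetical and is not part of commoncode or scancode.io; it assumes POSIX-style, "/"-separated Resource paths and simply collects the distinct leading segments in first-seen order.

```python
def starting_paths(resource_paths):
    """
    Return the distinct leading path segments of ``resource_paths``,
    in the order they are first seen.

    Hypothetical helper for illustration only; assumes "/"-separated paths.
    """
    roots = {}
    for path in resource_paths:
        # Drop leading/trailing slashes, then keep only the first segment.
        segment = path.strip("/").split("/", 1)[0]
        if segment:
            # A dict preserves insertion order and removes duplicates.
            roots.setdefault(segment, None)
    return list(roots)


# Multiple codebases in one scan, as in the example above.
multi = starting_paths(["codebase1/a.c", "codebase2/foo.c", "codebase3/do.c"])

# A single-file input: the starting path is the file name itself.
single = starting_paths(["foo.c"])
```

With the paths from the example, `multi` contains "codebase1", "codebase2", and "codebase3", and `single` contains only "foo.c". Where this bookkeeping would live (on the Codebase, or on a new Project object) is still open for discussion.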