Open pombredanne opened 6 years ago
I have some little scripts that dump declared licenses, origins, etc... let me know if that would be useful.
@craigez that would be great! Even if this is not in Python that will be useful to create a new manifest parser in https://github.com/nexB/scancode-toolkit/tree/develop/src/packagedcode
I actually insert it into the bitbake process to dump the information. This requires updating the configuration to include this build step.
@craigez that's fine I guess, especially since this is Python code. I am also curious about how you plug some code into the bitbake process BTW ... as this can be useful for tracing, license scanning, etc.
@pombredanne So, here's an example of the type of script I've used before to extract data for one-off investigations. It was based on the existing spdx.bbclass that uses fossology, but this just dumps out information without scanning: https://gist.github.com/craigez/607ec343a7a08f3b465c7307377ae84a
And here's a hacky way to get it to run: https://gist.github.com/craigez/066285753b69139b9d98b0fdd17ae56a
You can add other bitbake fields like 'license', etc... to get that information into the CSV
Mark and I were also looking at how to incorporate scanning, etc... https://github.com/mcharleb/meta-srcmap (this is also how you properly add a layer...)
@craigez Thank you++ for taking the time to put these together ... But this means this is used while actually running a build, correct?
That would be more of a job for TraceCode then IMHO.
ScanCode only does static analysis and manifest parsing... so I was more thinking about something that can collect information from .bb files found in a codebase at rest.
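As a rough illustration of what collecting information from a .bb file at rest could look like, here is a minimal sketch. It is not an existing ScanCode or bitbake API: the `parse_assignments` helper and its regex are assumptions made for this example, it only handles double-quoted assignments, and it ignores operator semantics (the last assignment simply wins).

```python
import re

# Hypothetical helper, not part of ScanCode or bitbake.
# Matches simple double-quoted assignments such as:
#   SUMMARY = "text"   or   LICENSE ?= "MIT"
ASSIGNMENT = re.compile(
    r'^\s*(?P<name>[A-Za-z_][A-Za-z0-9_-]*)\s*'
    r'(?P<op>\?\?=|\?=|:=|\+=|=\+|\.=|=\.|=)\s*'
    r'"(?P<value>.*)"\s*$'
)


def parse_assignments(text):
    """Return a dict of variable name -> raw value found in recipe text."""
    variables = {}
    # Join continuation lines that end with a backslash.
    joined = re.sub(r'\\\n', ' ', text)
    for line in joined.splitlines():
        if line.lstrip().startswith('#'):
            continue  # skip comments
        match = ASSIGNMENT.match(line)
        if match:
            # Collapse runs of whitespace left over from continuations.
            value = ' '.join(match.group('value').split())
            variables[match.group('name')] = value
    return variables
```

Directives such as inherit, include and require are skipped by this sketch, so anything defined in other files stays unknown.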
@pombredanne I wanted to do something like that too, but I couldn't find a pre-existing Python library to parse the files. The files really need to be interpreted anyway, since files include other files and there are variables defined during the build, etc... So it was easier/quicker for my use case to just add a build step and then use bitbake to run it (without compilation, linking steps, etc...)
Some extra pointers for a minimal implementation of a static analysis: variables in ${} would need to be resolved, which is something we also do for Maven POMs.
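To make the ${} resolution idea concrete, here is a minimal sketch under stated assumptions: `expand` is a hypothetical helper, the variables are already available as a plain dict, and only simple ${NAME} references are substituted. Inline Python expressions such as ${@...} and anything defined outside the file are left untouched.

```python
import re

# Hypothetical helper; only plain ${NAME} references are recognized.
REFERENCE = re.compile(r'\$\{([A-Za-z0-9_]+)\}')


def expand(value, variables, max_depth=10):
    """Substitute ${NAME} references using the given dict of variables.

    Unknown names are left as-is. The loop is bounded by max_depth so
    that circular definitions cannot run forever.
    """
    for _ in range(max_depth):
        expanded = REFERENCE.sub(
            lambda m: variables.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break  # nothing left to substitute
        value = expanded
    return value
```

For example, with PN, PV and a BP defined as "${PN}-${PV}" in the dict, a value containing ${BP} is expanded in two passes, while a reference to a name that is not in the dict stays in its ${} form.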
@priv-kweihmann FWIW, since I have your attention this ticket is about adding a parser for .bb file metadata so that we can collect them as part of a scancode "package" scan e.g. collect and normalize the bitbake recipe data found in a .bb file to the models we use here: https://github.com/nexB/scancode-toolkit/tree/develop/src/packagedcode and https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/models.py
Your linter likely provides a decent base for parsing (though I shall say that I feel it may have too many dependencies if I use only the parser ;) https://github.com/priv-kweihmann/oelint-adv/blob/master/requirements.txt )
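For illustration only, normalizing recipe variables could look roughly like the sketch below. `RecipePackage` is a simplified, hypothetical stand-in and not the actual packagedcode model; the field names and the mapping from SUMMARY, HOMEPAGE and LICENSE are assumptions, as is deriving name and version from the conventional name_version.bb file name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecipePackage:
    # Hypothetical, simplified record; NOT the real packagedcode model.
    name: Optional[str] = None
    version: Optional[str] = None
    description: Optional[str] = None
    homepage_url: Optional[str] = None
    declared_license: Optional[str] = None


def package_from_recipe(filename, variables):
    """Build a RecipePackage from a recipe file name and its variables."""
    # Recipe files are conventionally named <name>_<version>.bb
    stem = filename.rsplit('/', 1)[-1]
    if stem.endswith('.bb'):
        stem = stem[:-3]
    name, _, version = stem.partition('_')
    return RecipePackage(
        name=name or None,
        version=version or None,
        description=variables.get('SUMMARY') or variables.get('DESCRIPTION'),
        homepage_url=variables.get('HOMEPAGE'),
        declared_license=variables.get('LICENSE'),
    )
```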
FWIW if you're not obligated by licensing I would actually recommend to fork the original bitbake parser (which is GPL2) - but sure, I can live with using mine too - btw the dependencies shouldn't apply to the parser itself - they are just used for the rules.
@priv-kweihmann you wrote:
FWIW if you're not obligated by licensing I would actually recommend to fork the original bitbake parser (which is GPL2) -
ScanCode is Apache-licensed so that would be an issue. The alternative would be to use the original bitbake parser as a command-line utility (assuming it could emit JSON or a similar structured format).
But a quick test shows that the bitbake parser is unusable to parse a single recipe file.
I took http://cgit.openembedded.org/openembedded-core/plain/meta/recipes-core/dropbear/dropbear.inc and saved it as dropbear_2.1.bb:
$ python
Python 3.6.10 (default, Jun 13 2020, 08:53:46)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bb.data, bb.parse, bb.siggen
>>> d = bb.data.init()
>>> bb.parse.siggen = bb.siggen.init(d)
>>> f='dropbear_2.1.bb'
>>> recipe =bb.parse.handle(f,d)
Traceback (most recent call last):
  File "/tmp/bitbake/lib/bb/parse/parse_py/ConfHandler.py", line 93, in include_single_file
    bb.parse.handle(fn, data, True)
  File "/tmp/bitbake/lib/bb/parse/__init__.py", line 107, in handle
    return h['handle'](fn, data, include)
  File "/tmp/bitbake/lib/bb/parse/parse_py/BBHandler.py", line 117, in handle
    abs_fn = resolve_file(fn, d)
  File "/tmp/bitbake/lib/bb/parse/__init__.py", line 125, in resolve_file
    raise IOError(errno.ENOENT, "file %s not found in %s" % (fn, bbpath))
FileNotFoundError: [Errno 2] file classes/autotools.bbclass not found in None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/bitbake/lib/bb/parse/__init__.py", line 107, in handle
    return h['handle'](fn, data, include)
  File "/tmp/bitbake/lib/bb/parse/parse_py/BBHandler.py", line 127, in handle
    statements.eval(d)
  File "/tmp/bitbake/lib/bb/parse/ast.py", line 19, in eval
    statement.eval(data)
  File "/tmp/bitbake/lib/bb/parse/ast.py", line 274, in eval
    bb.parse.BBHandler.inherit(self.classes, self.filename, self.lineno, data)
  File "/tmp/bitbake/lib/bb/parse/parse_py/BBHandler.py", line 66, in inherit
    include(fn, file, lineno, d, "inherit")
  File "/tmp/bitbake/lib/bb/parse/parse_py/ConfHandler.py", line 70, in include
    include_single_file(parentfn, fn, lineno, data, error_out)
  File "/tmp/bitbake/lib/bb/parse/parse_py/ConfHandler.py", line 97, in include_single_file
    raise ParseError("Could not %s file %s" % (error_out, fn), parentfn, lineno)
bb.parse.ParseError: ParseError at dropbear_2.1.bb:37: Could not inherit file classes/autotools.bbclass
i.e. I would need to have a full installation with classes and more to make this parser usable, which is practically impossible at scale and in the practical context of scanning codebases.
The only way out with the regular bitbake parser would be to comment out all the "inherit" directives.
Then the output for the part of interest (e.g. variable declarations, which are not super hard to parse) does not even have the ${} variables resolved, so that is not super useful. That said, I am not sure how I could use your own parser.
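For reference, commenting out the "inherit" directives before handing a recipe to a parser could be done with a small text filter like this one. It is only a sketch: `comment_out_inherit` is a hypothetical helper, it works line by line, and it does not handle inherit statements continued over several lines with a backslash.

```python
import re

# Hypothetical helper: matches lines that start with the inherit keyword.
INHERIT = re.compile(r'^\s*inherit\b')


def comment_out_inherit(text):
    """Return recipe text with every inherit line turned into a comment."""
    kept = []
    for line in text.splitlines():
        if INHERIT.match(line):
            kept.append('# ' + line)
        else:
            kept.append(line)
    return '\n'.join(kept) + '\n'
```

The filtered text would still need to be written to a temporary file before being passed to a file-based parser.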
I see, in that case it surely won't make much sense to go with the official parser
@priv-kweihmann could I interest you in extracting the parsing parts from https://github.com/priv-kweihmann/oelint-adv as their own library?
Sure, when you write me a ticket
@priv-kweihmann sure thing,
Here is a snippet that I used FWIW
from oelint_adv.cls_stash import Stash
from oelint_adv.parser import get_items

class Args(object):
    # Minimal stand-in for the command line arguments passed to Stash
    def __init__(self):
        self.quiet = True

s = Stash(args=Args())
f = 'dropbear_2.1.bb'
items = get_items(s, f)
# Dump every attribute of every parsed item
for item in items:
    for key, value in item.GetAttributes().items():
        print(key, ':', value)
Line : 1
Raw : SUMMARY = "A lightweight SSH and SCP implementation"
Links : []
Origin : dropbear_2.1.bb
InFileLine : 1
IncludedFrom : []
VarName : SUMMARY
SubItem :
PkgSpec : []
SubItems : []
VarValue : "A lightweight SSH and SCP implementation"
VarOp : =
Flag :
VarValueStripped : A lightweight SSH and SCP implementation
Line : 2
Raw : HOMEPAGE = "http://matt.ucc.asn.au/dropbear/dropbear.html"
Links : []
Origin : dropbear_2.1.bb
InFileLine : 2
IncludedFrom : []
VarName : HOMEPAGE
SubItem :
PkgSpec : []
SubItems : []
VarValue : "http://matt.ucc.asn.au/dropbear/dropbear.html"
VarOp : =
Flag :
VarValueStripped : http://matt.ucc.asn.au/dropbear/dropbear.html
.....
@chinyeungli ping... Would this be something you can look into?
Although, imo, you are in a much better position than I am, I can try :p
Some extra links https://www.yoctoproject.org/docs/3.1.3/dev-manual/dev-manual.html#working-with-licenses
The Yocto Project generates a license manifest during image creation that is located in ${DEPLOY_DIR}/licenses/image_name-datestamp to assist with any audits.
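If such a manifest is available, reading it could be as simple as the sketch below. This assumes the manifest is made of blank-line-separated blocks of "KEY: value" lines (for instance PACKAGE NAME, PACKAGE VERSION, RECIPE NAME and LICENSE); that layout and the `parse_license_manifest` helper are assumptions for illustration and should be checked against a real manifest.

```python
def parse_license_manifest(text):
    """Parse blank-line-separated "KEY: value" blocks into a list of dicts.

    Hypothetical helper; the block layout is an assumption about the
    manifest format, not something confirmed in this thread.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current block, if any.
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(':')
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records
```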
See also:
The bitbake manifests commonly contain decent license and origin information.