aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

Collect data from Yocto/bitbake .bb manifest files #1243

Open pombredanne opened 6 years ago

pombredanne commented 6 years ago

The bitbake manifests commonly contain decent license and origin information.

pombredanne commented 6 years ago

See also https://gist.github.com/craigez/df6bcd275440ee0047d020511a3d4532#bitbake-recipes-and-yocto-distributions by @craigez

craigez commented 6 years ago

I have some little scripts that dump declared licenses, origins, etc... let me know if that would be useful.

pombredanne commented 6 years ago

@craigez that would be great! Even if this is not in Python, it will be useful for creating a new manifest parser in https://github.com/nexB/scancode-toolkit/tree/develop/src/packagedcode

craigez commented 6 years ago

I actually insert it into the bitbake process to dump the information. This requires updating the configuration to include this build step.

pombredanne commented 6 years ago

@craigez that's fine I guess, especially since this is Python code. I am also curious about how you plug some code into the bitbake process BTW ... as this can be useful for tracing, license scanning, etc.

craigez commented 6 years ago

@pombredanne So, here's an example of the type of script I've used before to extract data for one-off investigations. It was based on the existing spdx.bbclass that uses fossology, but this just dumps out information without scanning: https://gist.github.com/craigez/607ec343a7a08f3b465c7307377ae84a

And here's a hacky way to get it to run: https://gist.github.com/craigez/066285753b69139b9d98b0fdd17ae56a

You can add other bitbake fields like 'license', etc... to get that information into the CSV

Mark and I were also looking at how to incorporate scanning, etc... https://github.com/mcharleb/meta-srcmap (this is also how you properly add a layer...)

pombredanne commented 6 years ago

@craigez Thank you++ for taking the time to put these together ... But this means this is used while actually running a build, correct?

That would be more of a job for TraceCode then IMHO.

ScanCode only does static analysis and manifest parsing... so I was more thinking about something that can collect information from .bb files found in a codebase at rest.
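
As a rough illustration of what collecting data from a .bb file "at rest" could look like, here is a minimal sketch that only picks up simple double-quoted variable assignments. It is not the bitbake parser and not ScanCode code: the regular expression, the function names and the sample recipe text (including its LICENSE value, URL and patch name) are assumptions made up for this example. It ignores operator semantics (the last assignment simply wins), single-quoted values, include/require/inherit and functions.

```python
import re

# Simple assignments such as: LICENSE = "MIT"
# Operators recognized (but not interpreted): =, ?=, ??=, :=, +=, =+, .=, =.
ASSIGNMENT = re.compile(
    r'^\s*(?:export\s+)?'
    r'(?P<name>[A-Za-z0-9_${}\[\]-]+)\s*'
    r'(?P<op>\?\?=|\?=|:=|\+=|=\+|\.=|=\.|=)\s*'
    r'"(?P<value>.*)"\s*$'
)


def join_continuations(text):
    """Merge physical lines ending with a backslash into one logical line."""
    logical, pending = [], ''
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith('\\'):
            pending += stripped[:-1]
        else:
            logical.append(pending + line)
            pending = ''
    if pending:
        logical.append(pending)
    return logical


def parse_assignments(text):
    """Return {variable name: value} for simple quoted assignments."""
    variables = {}
    for line in join_continuations(text):
        if line.lstrip().startswith('#'):
            continue  # skip comment lines
        match = ASSIGNMENT.match(line)
        if match:
            # Collapse the whitespace left over from joined continuation lines.
            variables[match.group('name')] = ' '.join(match.group('value').split())
    return variables


# Made-up recipe text for illustration only.
recipe_text = '''\
SUMMARY = "A lightweight SSH and SCP implementation"
HOMEPAGE = "http://matt.ucc.asn.au/dropbear/dropbear.html"
LICENSE = "MIT"
inherit autotools
SRC_URI = "http://example.invalid/example-${PV}.tar.bz2 \\
           file://fix.patch"
'''

for name, value in parse_assignments(recipe_text).items():
    print(name, ':', value)
```

Note that the `inherit` line is skipped rather than followed, and `${PV}` is left unexpanded, which is the main limitation of any purely static approach.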

craigez commented 6 years ago

@pombredanne I wanted to do something like that too, but I couldn't find a pre-existing Python library to parse the files. They really need to be interpreted rather than just parsed anyway, as files include other files and there are variables that are only defined during the build, etc... So it was easier/quicker for my use case to just add a build step and then use bitbake to run it (without the compilation, linking steps, etc...)

pombredanne commented 4 years ago

Some extra pointers for some minimal implementation of a static analysis:

pombredanne commented 4 years ago

@priv-kweihmann FWIW, since I have your attention this ticket is about adding a parser for .bb file metadata so that we can collect them as part of a scancode "package" scan e.g. collect and normalize the bitbake recipe data found in a .bb file to the models we use here: https://github.com/nexB/scancode-toolkit/tree/develop/src/packagedcode and https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/models.py

Your linter likely provides a decent base for parsing (though I shall say that I feel it may have too many dependencies if I use only the parser ;) https://github.com/priv-kweihmann/oelint-adv/blob/master/requirements.txt )

pombredanne commented 4 years ago

@chinyeungli ping... Would this be something you can look into?

priv-kweihmann commented 4 years ago

@priv-kweihmann FWIW, since I have your attention this ticket is about adding a parser for .bb file metadata so that we can collect them as part of a scancode "package" scan e.g. collect and normalize the bitbake recipe data found in a .bb file to the models we use here: https://github.com/nexB/scancode-toolkit/tree/develop/src/packagedcode and https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/models.py

Your linter likely provides a decent base for parsing (though I shall say that I feel it may have too many dependencies if I use only the parser ;) https://github.com/priv-kweihmann/oelint-adv/blob/master/requirements.txt )

FWIW if you're not constrained by licensing I would actually recommend forking the original bitbake parser (which is GPL2) - but sure, I can live with using mine too - btw the dependencies shouldn't apply to the parser itself - they are just used for the rules.

pombredanne commented 4 years ago

@priv-kweihmann you wrote:

FWIW if you're not obligated by licensing I would actually recommend to fork the original bitbake parser (which is GPL2) -

ScanCode is Apache-licensed so that would be an issue. The alternative would be to use the original bitbake parser as a command-line utility (assuming it could output JSON or a similar structured format). But a quick test shows that the bitbake parser cannot parse a single recipe file on its own. I took http://cgit.openembedded.org/openembedded-core/plain/meta/recipes-core/dropbear/dropbear.inc and saved it as dropbear_2.1.bb:

$ python 
Python 3.6.10 (default, Jun 13 2020, 08:53:46) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from bb import parse
>>> d = bb.data.init()
>>> bb.parse.siggen = bb.siggen.init(d)
>>> f='dropbear_2.1.bb'
>>> recipe =bb.parse.handle(f,d)
Traceback (most recent call last):
  File "/tmp/bitbake/lib/bb/parse/parse_py/ConfHandler.py", line 93, in include_single_file
    bb.parse.handle(fn, data, True)
  File "/tmp/bitbake/lib/bb/parse/__init__.py", line 107, in handle
    return h['handle'](fn, data, include)
  File "/tmp/bitbake/lib/bb/parse/parse_py/BBHandler.py", line 117, in handle
    abs_fn = resolve_file(fn, d)
  File "/tmp/bitbake/lib/bb/parse/__init__.py", line 125, in resolve_file
    raise IOError(errno.ENOENT, "file %s not found in %s" % (fn, bbpath))
FileNotFoundError: [Errno 2] file classes/autotools.bbclass not found in None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/bitbake/lib/bb/parse/__init__.py", line 107, in handle
    return h['handle'](fn, data, include)
  File "/tmp/bitbake/lib/bb/parse/parse_py/BBHandler.py", line 127, in handle
    statements.eval(d)
  File "/tmp/bitbake/lib/bb/parse/ast.py", line 19, in eval
    statement.eval(data)
  File "/tmp/bitbake/lib/bb/parse/ast.py", line 274, in eval
    bb.parse.BBHandler.inherit(self.classes, self.filename, self.lineno, data)
  File "/tmp/bitbake/lib/bb/parse/parse_py/BBHandler.py", line 66, in inherit
    include(fn, file, lineno, d, "inherit")
  File "/tmp/bitbake/lib/bb/parse/parse_py/ConfHandler.py", line 70, in include
    include_single_file(parentfn, fn, lineno, data, error_out)
  File "/tmp/bitbake/lib/bb/parse/parse_py/ConfHandler.py", line 97, in include_single_file
    raise ParseError("Could not %s file %s" % (error_out, fn), parentfn, lineno)
bb.parse.ParseError: ParseError at dropbear_2.1.bb:37: Could not inherit file classes/autotools.bbclass

i.e. I would need a full installation with classes and more to make this parser usable, which is practically impossible at scale and in the context of scanning codebases.

pombredanne commented 4 years ago

The only way out with the regular bitbake parser would be to comment out all the "inherit" directives. But then the output for the part of interest (e.g. the variable declarations, which are not super hard to parse anyway) does not even have the ${} variables resolved, so that is not super useful. That said, I am not sure how I could use your own parser.
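
To make the point about unresolved ${} variables concrete, here is a small sketch of a naive expansion pass over already-collected variables. It is only an approximation and not how bitbake does it: the `expand` and `name_and_version` helpers and the `ARCHIVE` variable are hypothetical names for this example. It relies on the bitbake convention that a recipe file named `<name>_<version>.bb` provides PN and PV, leaves unknown references untouched, and does not handle `${@...}` inline Python or anything defined in inherited classes.

```python
import os
import re

# Plain ${NAME} references only; ${@python} expressions do not match.
VAR_REF = re.compile(r'\$\{([A-Za-z0-9_]+)\}')


def name_and_version(filename):
    """Split a recipe file name like 'dropbear_2.1.bb' into (PN, PV)."""
    stem = os.path.basename(filename)
    if stem.endswith('.bb'):
        stem = stem[:-3]
    pn, _, pv = stem.partition('_')
    return pn, pv


def expand(value, variables, max_depth=10):
    """Replace ${NAME} with variables[NAME], repeating for nested references.

    Unknown names are left as-is; max_depth bounds self-referencing values.
    """
    for _ in range(max_depth):
        expanded = VAR_REF.sub(
            lambda m: variables.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


pn, pv = name_and_version('dropbear_2.1.bb')
# ARCHIVE is a made-up variable used only to show nested expansion.
variables = {'PN': pn, 'PV': pv, 'ARCHIVE': '${PN}-${PV}.tar.bz2'}

print(expand('${ARCHIVE}', variables))
print(expand('${WORKDIR}/${PN}', variables))
```

Anything that comes from the build configuration (such as `${WORKDIR}` here) stays unresolved, which matches the limitation discussed above.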

priv-kweihmann commented 4 years ago

I see, in that case it surely won't make much sense to go with the official parser

pombredanne commented 4 years ago

@priv-kweihmann could I interest you in extracting the parsing parts from https://github.com/priv-kweihmann/oelint-adv as their own library?

priv-kweihmann commented 4 years ago

Sure, if you write me a ticket

pombredanne commented 4 years ago

@priv-kweihmann sure thing,

Here is a snippet that I used FWIW

from oelint_adv.cls_stash import Stash
from oelint_adv.parser import get_items


class Args:
    # Minimal stand-in for the args object passed to Stash; only `quiet` is set.
    def __init__(self):
        self.quiet = True


s = Stash(args=Args())
f = 'dropbear_2.1.bb'
items = get_items(s, f)

# Dump every attribute of every parsed item.
for item in items:
    for key, value in item.GetAttributes().items():
        print(key, ':', value)

Line : 1
Raw : SUMMARY = "A lightweight SSH and SCP implementation"

Links : []
Origin : dropbear_2.1.bb
InFileLine : 1
IncludedFrom : []
VarName : SUMMARY
SubItem : 
PkgSpec : []
SubItems : []
VarValue : "A lightweight SSH and SCP implementation"
VarOp :  = 
Flag : 
VarValueStripped : A lightweight SSH and SCP implementation
Line : 2
Raw : HOMEPAGE = "http://matt.ucc.asn.au/dropbear/dropbear.html"

Links : []
Origin : dropbear_2.1.bb
InFileLine : 2
IncludedFrom : []
VarName : HOMEPAGE
SubItem : 
PkgSpec : []
SubItems : []
VarValue : "http://matt.ucc.asn.au/dropbear/dropbear.html"
VarOp :  = 
Flag : 
VarValueStripped : http://matt.ucc.asn.au/dropbear/dropbear.html
.....

pombredanne commented 4 years ago

https://github.com/priv-kweihmann/oelint-adv/issues/183 created
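
For context, the attribute dump above could be reduced to a few package-level fields along these lines. This is only a sketch: it works on plain dictionaries shaped like the printed output (using the `VarName` and `VarValueStripped` keys shown there), and the target field names (`description`, `homepage_url`, `declared_license`) and the `to_package_data` helper are assumptions for illustration, not the actual packagedcode model API.

```python
# Hypothetical mapping from bitbake variable names to package-like fields.
FIELD_BY_VARIABLE = {
    'SUMMARY': 'description',
    'HOMEPAGE': 'homepage_url',
    'LICENSE': 'declared_license',
}


def to_package_data(attribute_dicts):
    """Build a flat dict of package fields from per-item attribute dicts."""
    package = {}
    for attributes in attribute_dicts:
        field = FIELD_BY_VARIABLE.get(attributes.get('VarName'))
        if field:
            package[field] = attributes.get('VarValueStripped')
    return package


# Two items copied from the output above, reduced to the keys used here.
sample_items = [
    {'VarName': 'SUMMARY',
     'VarValueStripped': 'A lightweight SSH and SCP implementation'},
    {'VarName': 'HOMEPAGE',
     'VarValueStripped': 'http://matt.ucc.asn.au/dropbear/dropbear.html'},
]

print(to_package_data(sample_items))
```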

chinyeungli commented 4 years ago

@chinyeungli ping... Would this be something you can look into?

Although, imo, you are in a much better position than I am, I can try :p

pombredanne commented 4 years ago

Some extra links https://www.yoctoproject.org/docs/3.1.3/dev-manual/dev-manual.html#working-with-licenses

The Yocto Project generates a license manifest during image creation that is located in ${DEPLOY_DIR}/licenses/image_name-datestamp to assist with any audits.
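
Since that generated manifest is a plain file that can be found in a codebase at rest, it could be collected too. Below is a minimal sketch that assumes the manifest consists of blank-line-separated blocks of `KEY: value` lines (such as PACKAGE NAME, PACKAGE VERSION, RECIPE NAME and LICENSE); that layout, the sample content and the `parse_license_manifest` helper are assumptions for this example and should be checked against a real manifest.

```python
def parse_license_manifest(text):
    """Return one dict per blank-line-separated block of 'KEY: value' lines."""
    entries, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current block, if any.
            if current:
                entries.append(current)
                current = {}
            continue
        key, separator, value = line.partition(':')
        if separator:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


# Made-up manifest content for illustration only.
sample_manifest = '''\
PACKAGE NAME: example-package
PACKAGE VERSION: 1.0
RECIPE NAME: example-recipe
LICENSE: MIT

PACKAGE NAME: other-package
PACKAGE VERSION: 2.3
RECIPE NAME: other-recipe
LICENSE: Apache-2.0
'''

for entry in parse_license_manifest(sample_manifest):
    print(entry['PACKAGE NAME'], ':', entry['LICENSE'])
```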

pombredanne commented 2 years ago

See also: