aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/
2.13k stars 550 forks

Detect RPM specfiles #171

Open rakeshbalusa opened 8 years ago

rakeshbalusa commented 8 years ago

Currently scancode-toolkit scans packages or any kind of code without targeting metadata specific to the type of package (for example: RPM). For any given RPM, the .spec file has most of the metadata, such as architecture, source RPM, URL, etc., and it is the most reliable metadata for that RPM. So we need separate scanning scenarios for RPMs. Every time scancode encounters an RPM, it should provide scan results using these scenarios.

pombredanne commented 8 years ago

@rakeshbalusa actually the .spec file is rather hard to parse short of running the tools to build an RPM proper. The more reliable source of metadata is the RPM headers in an .rpm file. The .spec file itself is rarely available anyway when you deal with built RPMs.

rakeshbalusa commented 8 years ago

@pombredanne I guess we have to use the header info instead of the .spec file in the case of an RPM.

balusarakesh commented 8 years ago

@pombredanne I think the line at https://github.com/nexB/scancode-toolkit/blob/develop/src/packagedcode/recognize.py#L57 should be if location.endswith(tuple(package.extensions)):. I mean we should look for extensions, not metafiles, but this will be a problem for NPM packages, since their metafile is package.json, which we must look for. I'm not sure how to solve this issue.
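One possible way to reconcile the two cases is to check both attributes: match on the file extension when the package type declares extensions, and fall back to an exact metafile name otherwise. The sketch below is an assumption-laden illustration, not the code in recognize.py; `PackageType` and `matches` are hypothetical names, and only `extensions` and the idea of metafiles come from the comment above.

```python
import os
from dataclasses import dataclass


@dataclass
class PackageType:
    """Hypothetical stand-in for a package class with recognition hints."""
    name: str
    extensions: tuple = ()
    metafiles: tuple = ()


def matches(location, package):
    """Return True if the file at location looks like the given package type."""
    filename = os.path.basename(location).lower()
    # Archive-style packages (e.g. RPM) are recognized by extension.
    if package.extensions and filename.endswith(tuple(package.extensions)):
        return True
    # Manifest-style packages (e.g. NPM) are recognized by exact metafile name.
    return filename in package.metafiles
```

With `PackageType("rpm", extensions=(".rpm",))` and `PackageType("npm", metafiles=("package.json",))`, both a path ending in `.rpm` and a path ending in `package.json` are recognized by their respective types.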

pombredanne commented 7 years ago

We are scanning RPMs all right, and there is no plan for now to scan .spec files, though I am renaming this issue to cover that.