I would like to add the ability to use existing SPDX documents as a kind of portable cache.
Overview
As DoSOCS runs on a package, it would look for existing SPDX documents in the package and in any "sub" packages. If it finds one, it could try to parse that SPDX document and pull in its information without having to scan the files it covers. If it doesn't find an SPDX document, it will proceed as normal. A single package could contain several SPDX documents, each in a different sub-package.
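A minimal sketch of the discovery step, assuming SPDX documents can be recognized by file suffix. The suffix list and the function names `find_spdx_documents` and `plan_package` are hypothetical, not existing DoSOCS code:

```python
from pathlib import Path

# Assumption: suffixes used to recognize SPDX documents. SPDX does not force
# these names, so this list would need to match real-world packages.
SPDX_SUFFIXES = (".spdx", ".spdx.rdf", ".spdx.rdf.xml", ".spdx.json")


def find_spdx_documents(package_root):
    """Return every candidate SPDX document under package_root, sorted by path."""
    root = Path(package_root)
    found = [p for p in root.rglob("*")
             if p.is_file() and p.name.lower().endswith(SPDX_SUFFIXES)]
    return sorted(found)


def plan_package(package_root):
    """Reuse existing SPDX documents if any are found, otherwise scan as normal."""
    docs = find_spdx_documents(package_root)
    if docs:
        return ("reuse", docs)
    return ("scan", [])
```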
Challenges
Parsing SPDX documents
Should we identify that the information was pulled in from an existing document?
We would have to create a parser for each document format (tag-value, RDF, JSON).
I think there may already be some Python parsing functionality out there that we could reuse.
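Before choosing a parser, DoSOCS has to tell the formats apart. A rough sketch using only the Python standard library; `detect_spdx_format` is a hypothetical helper, and its checks (a top-level `spdxVersion` key for JSON, an `rdf:RDF` root element for RDF, an `SPDXVersion:` line for tag-value) are heuristics, not full validation:

```python
import json
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_spdx_format(text):
    """Guess the serialization: 'json', 'rdf' or 'tag'; None if unrecognized."""
    stripped = text.lstrip()
    if stripped.startswith("{"):
        try:
            data = json.loads(stripped)
        except ValueError:
            return None
        # SPDX JSON documents carry a top-level "spdxVersion" key.
        return "json" if isinstance(data, dict) and "spdxVersion" in data else None
    if stripped.startswith("<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError:
            return None
        # SPDX RDF documents use rdf:RDF as the root element.
        return "rdf" if root.tag == f"{{{RDF_NS}}}RDF" else None
    # Tag-value documents contain an "SPDXVersion: ..." line.
    for line in stripped.splitlines():
        if line.startswith("SPDXVersion:"):
            return "tag"
    return None
```

Each recognized format could then be handed to its own parser; when nothing is recognized, falling back to a normal scan seems the safest choice.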
Thoughts
For now, I am thinking it will parse only the following items from the document:
File-level information
Individual license information
Possibly creator and reviewer information
We will assume that each SPDX document sits at the top of the directory tree it applies to: if a sub-package lives in a subfolder, its SPDX document will be at the root of that subfolder.
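Under that assumption, each file belongs to the SPDX document in its nearest enclosing directory. A sketch of that lookup; `governing_document` and `partition_files` are hypothetical names, and paths are treated as POSIX-style strings:

```python
from pathlib import PurePosixPath


def governing_document(file_path, doc_paths):
    """Return the SPDX document whose directory is the closest ancestor of
    file_path, or None if no document covers it."""
    file_path = PurePosixPath(file_path)
    best = None
    for doc in map(PurePosixPath, doc_paths):
        doc_dir = doc.parent
        if doc_dir in file_path.parents:
            # Prefer the deepest directory, i.e. the innermost sub-package.
            if best is None or len(doc_dir.parts) > len(best.parent.parts):
                best = doc
    return best


def partition_files(file_paths, doc_paths):
    """Split files into those covered by a document and those to scan normally."""
    covered, to_scan = {}, []
    for f in file_paths:
        doc = governing_document(f, doc_paths)
        if doc is None:
            to_scan.append(f)
        else:
            covered.setdefault(str(doc), []).append(f)
    return covered, to_scan
```

Files that fall under a document would still need to be checked against that document's file list, since a stale document might not mention newly added files; this sketch leaves that check out.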