clearlydefined / crawler

A service that crawls projects and packages for information relevant to ClearlyDefined
MIT License

explore scancode.io as a replacement for including scancode-toolkit as a library #527

Open elrayle opened 7 months ago

elrayle commented 7 months ago

Description

Currently, nexB/scancode-toolkit is built into the clearlydefined/crawler Docker image and used as a library. As a result, a new crawler image has to be built whenever we want to upgrade to a later version of scancode-toolkit. Long-term maintenance of the ClearlyDefined ecosystem would be easier if scancode ran as a separate service from its own Docker image.

We cannot run scancode-toolkit as a separate Docker service because it only provides command-line access and does not have an API. Another nexB tool, scancode.io, may have an API that would let it run scancode-toolkit as a Docker service.

Toward that end, explore nexB/scancode.io to determine whether it has an API for running the scancode process and whether it publishes a Docker image that allows scancode to run as a separate service. If there isn't a Docker image, we can help create one and publish it to ghcr.io.

Related Work