clearlydefined / crawler

A service that crawls projects and packages for information relevant to ClearlyDefined
MIT License

Files with license info not being detected #406

Open ariel11 opened 3 years ago

ariel11 commented 3 years ago

Here's an example of a file with the MIT license (https://clearlydefined.io/file/2893c762b244363021756d2bfa004c1402eb641a7b02a4cea8bd37ebddcf68c1) where ClearlyDefined did not flag it as a file with license info and therefore did not include "MIT" in the "discovered" field.

image

It's unclear why scancode wasn't run on this file.

The license info could also be obtained by following the "Source" link to the commit, where the LICENSE file is MIT.
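
For anyone triaging similar reports, here is a minimal sketch of how one might list the files in a definition that carry no detected license. It works on an already-loaded definition dict and assumes the definition JSON has a `files` list whose entries have `path` and an optional `license`, plus `licensed.facets.core.discovered.expressions`. Those field names are my assumption about the schema, and the sample data is made up for illustration, not taken from the real definition.

```python
def files_missing_license(definition):
    """Return paths of file entries that have no 'license' value.

    Assumes (not confirmed in this thread) that each entry in
    definition["files"] looks like {"path": ..., "license": ...}.
    """
    return [
        entry.get("path")
        for entry in definition.get("files", [])
        if not entry.get("license")
    ]


def discovered_expressions(definition):
    """Return the 'discovered' license expressions for the core facet.

    The nesting licensed -> facets -> core -> discovered -> expressions
    is an assumed layout; missing levels simply yield an empty list.
    """
    core = definition.get("licensed", {}).get("facets", {}).get("core", {})
    return core.get("discovered", {}).get("expressions", [])


# Hypothetical sample data, shaped like the assumed schema above.
sample_definition = {
    "files": [
        {"path": "LICENSE"},  # license text present but nothing detected
        {"path": "README.md", "license": "MIT"},
    ],
    "licensed": {
        "facets": {"core": {"discovered": {"expressions": ["MIT"]}}}
    },
}

print(files_missing_license(sample_definition))
print(discovered_expressions(sample_definition))
```

Comparing the two results makes it easier to see whether a file that visibly contains license text is missing from what the tools discovered.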

ariel11 commented 3 years ago

Here's another example, where the license info in setup.py was not detected:

setup.py: https://clearlydefined.io/file/bdc6a2cd7c7d2450efacf5b0f3124118e7f8de6671bde4ae7d4ae2654a270858
Definition: https://clearlydefined.io/definitions/pypi/pypi/-/astroid/2.3.3

image

ariel11 commented 3 years ago

Another example where license info on files is not being detected. Adding @pombredanne for thoughts on scancode.

FYI @nellshamrell

nellshamrell commented 3 years ago

Queued up a pypi/pypi/-/astroid harvest in my local environment. It looks like it's encountering an error:

service_1                    | GET /definitions/pypi/pypi/-/astroid/2.3.3?expand=prs&matchCasing=false 200 605.385 ms - 471
crawler_1                    | POST /requests 201 0.910 ms - 7
crawler_1                    | [I] Traversed component@cd:/pypi/pypi/-/astroid/2.3.3  {"loopName":"0","cid":"9o","root":"self","outcome":"Traversed","time":2,"crawlerId":"da62d530-3d93-4de8-936b-098a9b84bf2e","buildNumber":"0"}
crawler_1                    | [I] Traversed package@cd:/pypi/pypi/-/astroid/2.3.3  {"loopName":"0","cid":"9p","root":"component@cd:/pypi/pypi/-/astroid/2.3.3","outcome":"Traversed","time":0,"crawlerId":"da62d530-3d93-4de8-936b-098a9b84bf2e","buildNumber":"0"}
crawler_1                    | SourceDiscovery provider could not be found for https://pypi.org/project/astroid/
crawler_1                    | [I] Processed pypi@cd:/pypi/pypi/-/astroid/2.3.3  {"loopName":"0","cid":"9q","root":"component@cd:/pypi/pypi/-/astroid/2.3.3","k":1657,"count":228,"write":7,"outcome":"Processed","time":2075,"crawlerId":"da62d530-3d93-4de8-936b-098a9b84bf2e","buildNumber":"0"}
crawler_1                    | [I] Processed licensee@cd:/pypi/pypi/-/astroid/2.3.3  {"loopName":"0","cid":"9r","root":"component@cd:/pypi/pypi/-/astroid/2.3.3","write":2,"outcome":"Processed","time":3086,"crawlerId":"da62d530-3d93-4de8-936b-098a9b84bf2e","buildNumber":"0"}
crawler_1                    | [I] Analyzing scancode@cd:/pypi/pypi/-/astroid/2.3.3 using ScanCode. input: /tmp/cd-lXAshv output: /tmp/cd-nviXNU {"crawlerId":"da62d530-3d93-4de8-936b-098a9b84bf2e","buildNumber":"0"}
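
A rough sketch of how the log above could be checked mechanically: it records the last verb logged for each tool and reports the tools that never reached "Traversed" or "Processed". The regular expression and helper names are my own, written against the line shapes in this excerpt only; the sample lines are copied from the log with the trailing JSON payloads trimmed. An excerpt that simply stops early would look the same as a stalled scan, so treat the result as a hint rather than proof.

```python
import re

# Matches lines such as "[I] Processed licensee@cd:/pypi/pypi/-/astroid/2.3.3".
# Pattern inferred from the excerpt above, not from the crawler's source.
LINE_RE = re.compile(r"\[I\]\s+(?P<verb>\w+)\s+(?P<tool>\w+)@cd:/(?P<coords>\S+)")


def last_status_by_tool(log_text):
    """Map each tool name to the last verb logged for it."""
    status = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            status[match.group("tool")] = match.group("verb")
    return status


def unfinished_tools(log_text, done_verbs=("Traversed", "Processed")):
    """Return tools whose last logged verb is not a completion verb."""
    statuses = last_status_by_tool(log_text)
    return sorted(tool for tool, verb in statuses.items() if verb not in done_verbs)


# Lines from the excerpt above, JSON payloads trimmed for brevity.
SAMPLE_LOG = """\
crawler_1 | [I] Traversed component@cd:/pypi/pypi/-/astroid/2.3.3
crawler_1 | [I] Traversed package@cd:/pypi/pypi/-/astroid/2.3.3
crawler_1 | SourceDiscovery provider could not be found for https://pypi.org/project/astroid/
crawler_1 | [I] Processed pypi@cd:/pypi/pypi/-/astroid/2.3.3
crawler_1 | [I] Processed licensee@cd:/pypi/pypi/-/astroid/2.3.3
crawler_1 | [I] Analyzing scancode@cd:/pypi/pypi/-/astroid/2.3.3 using ScanCode. input: /tmp/cd-lXAshv output: /tmp/cd-nviXNU
"""

print(unfinished_tools(SAMPLE_LOG))
```

On these lines the only tool without a completion entry is scancode, which matches what the excerpt shows: an "Analyzing" line with no "Processed" line after it.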

nellshamrell commented 3 years ago

I didn't get that error when running a local harvest of nuget/nuget/-/CsvHelper/2.16.0. However, I was able to replicate the behavior @ariel11 was seeing. Declaring this reproducible.