eclipse-dash / dash-licenses

Extract license information from content.
http://projects.eclipse.org/projects/technology.dash
Eclipse Public License 2.0

ERROR ClearlyDefined data search time out; maybe decrease batch size #128

Open phkrief opened 2 years ago

phkrief commented 2 years ago

Hi, on some repos the analysis reports the following error:

[main] INFO Querying Eclipse Foundation for license data for 1000 items.
[main] INFO Found 84 items.
[main] INFO Querying Eclipse Foundation for license data for 663 items.
[main] INFO Found 57 items.
[main] INFO Querying ClearlyDefined for license data for 1000 items.
[main] ERROR ClearlyDefined data search time out; maybe decrease batch size.
[main] INFO Querying ClearlyDefined for license data for 522 items.
[main] ERROR ClearlyDefined data search time out; maybe decrease batch size.
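
The log suggests that items are sent to each service in chunks of at most 1000. The Eclipse Foundation receives 1000 + 663 = 1663 items, and after 84 + 57 = 141 matches, the remaining 1522 go to ClearlyDefined as 1000 + 522. The Java sketch below reproduces that chunking. It assumes a 1000-item default batch size inferred from the log, and `BatchSketch`, `partition` and `batchSizes` are hypothetical names, not dash-licenses code.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    // Hypothetical helper: split items into consecutive chunks of at most batchSize.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(items.subList(start, end));
        }
        return batches;
    }

    // Sizes of the chunks produced for itemCount placeholder items.
    static List<Integer> batchSizes(int itemCount, int batchSize) {
        List<Integer> placeholders = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) {
            placeholders.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        for (List<Integer> batch : partition(placeholders, batchSize)) {
            sizes.add(batch.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Eclipse Foundation queries: 1663 items in total.
        System.out.println(batchSizes(1663, 1000)); // [1000, 663]
        // ClearlyDefined queries: 1663 - 84 - 57 = 1522 items left.
        System.out.println(batchSizes(1522, 1000)); // [1000, 522]
    }
}
```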

This error can be reproduced with the following repos:

I tried playing with the -timeout option, without success. Any help is welcome.

Thx a lot

phkrief commented 2 years ago

I just noticed that the issue happens only with yarn.lock files. Here is how I call Dash for Yarn files:

java -jar org.eclipse.dash.licenses-0.0.1-SNAPSHOT.jar yarn.lock -summary $ANALYSIS_RESULT_FILE

Thx

phkrief commented 2 years ago

I found how to fix it: I used the -batch option and set it to 500, and it worked great! Sorry for the trouble.
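
The error message's advice ("maybe decrease batch size") could in principle be automated by the client. The following is a minimal sketch, assuming a hypothetical `BatchQuery` interface that throws `TimeoutException` when a request takes too long. It is not a dash-licenses or ClearlyDefined API. On a timeout, the batch is split in half and each half is retried, down to single items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

public class AdaptiveBatchSketch {
    // Hypothetical stand-in for one remote lookup; not a real dash-licenses API.
    interface BatchQuery<T, R> {
        List<R> run(List<T> batch) throws TimeoutException;
    }

    // On a timeout, split the batch in half and query each half separately.
    static <T, R> List<R> queryWithBackoff(List<T> batch, BatchQuery<T, R> query)
            throws TimeoutException {
        try {
            return query.run(batch);
        } catch (TimeoutException e) {
            if (batch.size() <= 1) {
                throw e; // cannot shrink further, so surface the real failure
            }
            int mid = batch.size() / 2;
            List<R> results = new ArrayList<>(queryWithBackoff(batch.subList(0, mid), query));
            results.addAll(queryWithBackoff(batch.subList(mid, batch.size()), query));
            return results;
        }
    }

    public static void main(String[] args) throws TimeoutException {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add(i);
        }
        // Simulated service that times out on requests larger than 500 items.
        BatchQuery<Integer, Integer> simulated = batch -> {
            if (batch.size() > 500) {
                throw new TimeoutException("too many items");
            }
            return new ArrayList<>(batch);
        };
        System.out.println(queryWithBackoff(items, simulated).size()); // 1000
    }
}
```

Here the simulated 1000-item request times out and is retried as two 500-item requests, with the result order preserved. Halving trades one large request for several smaller ones, each of which counts against any API rate limit.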

waynebeaton commented 2 years ago

Note that there is a rate limit on calls to the ClearlyDefined API, and you can be rejected if you call it too many times. But you'd have to call it hundreds of times an hour. Could this be the problem?

I have noticed that API calls to ClearlyDefined sometimes just fail. I'll see whether we can sort out the actual cause and do a better job of reporting the underlying problem.

marcdumais-work commented 2 years ago

FWIW, while experimenting with a dash-licenses run as part of Theia's CI, we did encounter this issue. It's possible that we had gone over the hourly limit; we're not sure. We ended up using -batch 50, which seems to work consistently.
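
Smaller batches avoid the timeout but multiply the number of calls. The sketch below assumes that each batch becomes one ClearlyDefined request, which is inferred from the log lines in the original report and not from the dash-licenses source. Under that assumption, it counts the requests needed for the 1522 items that were left for ClearlyDefined. `RequestCountSketch` and `requestCount` are hypothetical names.

```java
public class RequestCountSketch {
    // Requests needed for itemCount items if each batch is one request (assumption).
    static int requestCount(int itemCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (itemCount + batchSize - 1) / batchSize; // integer ceiling division
    }

    public static void main(String[] args) {
        int items = 1522; // items left for ClearlyDefined in the original log
        for (int batch : new int[] {1000, 500, 50}) {
            System.out.println("-batch " + batch + ": " + requestCount(items, batch) + " requests");
        }
        // Prints 2, 4 and 31 requests respectively.
    }
}
```

Under this assumption, a CI job using -batch 50 makes roughly eight times as many ClearlyDefined calls per run as one using -batch 500. This matters if the job runs many times an hour.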

phkrief commented 2 years ago

@waynebeaton That's true: I test my scripts, so I run them several times, and I have almost certainly called the ClearlyDefined API too many times. But I can't tell whether that's the cause. In any case, since I set the batch option to 500, I no longer have this problem.