clearlydefined / service

The service side of clearlydefined.io
MIT License
45 stars 40 forks source link

When requesting curations for a component that does not have them, receive an empty JSON response, rather than 404 #795

Closed nellshamrell closed 2 years ago

nellshamrell commented 3 years ago

A community member reported this in the ClearlyDefined Discord:

Hello ! Since friday, we have several errors with the Clearly Defined curation provider running in our tool, ORT.

08:27:43.098 [main] WARN  org.ossreviewtoolkit.analyzer.curation.ClearlyDefinedPackageCurationProvider - Getting curations for 'NPM:@webassemblyjs:wast-printer:1.8.5' failed with: HttpException: HTTP 404 
08:27:44.282 [main] WARN  org.ossreviewtoolkit.analyzer.curation.ClearlyDefinedPackageCurationProvider - Getting curations for 'NPM:@xtuc:ieee754:1.2.0' failed with: HttpException: HTTP 404 
08:27:45.395 [main] WARN  org.ossreviewtoolkit.analyzer.curation.ClearlyDefinedPackageCurationProvider - Getting curations for 'NPM:@xtuc:long:4.2.2' failed with: HttpException: HTTP 404 
08:27:46.598 [main] WARN  org.ossreviewtoolkit.analyzer.curation.ClearlyDefinedPackageCurationProvider - Getting curations for 'NPM:@yarnpkg:lockfile:1.1.0' failed with: HttpException: HTTP 404 
Found 1 project(s) in total.

I did some test with the Swagger page and I also get 404... Has something changed ? Shall I create a ticket ? Thanks a lot in advance

nellshamrell commented 3 years ago

I just did some queries in Azure App Insights to gain some more information.

It does look like 404s dramatically increased on Mon, February 8, UTC

requests
| where success == false and resultCode == 404
| order by timestamp asc
January 24 - 30 2021

1/24/2021, 12:00:00.000 AM  13  
1/25/2021, 12:00:00.000 AM  1,230   
1/26/2021, 12:00:00.000 AM  1,607   
1/27/2021, 12:00:00.000 AM  1,827   
1/28/2021, 12:00:00.000 AM  1,400   
1/29/2021, 12:00:00.000 AM  3,329   
1/30/2021, 12:00:00.000 AM  1,105

January 31 - February 7 2021

1/31/2021, 12:00:00.000 AM  23  
2/1/2021, 12:00:00.000 AM   3,889   
2/2/2021, 12:00:00.000 AM   1,523   
2/3/2021, 12:00:00.000 AM   5,132   
2/4/2021, 12:00:00.000 AM   3,128   
2/5/2021, 12:00:00.000 AM   2,393   
2/6/2021, 12:00:00.000 AM   1,432

February 7 - 8

2/8/2021, 12:00:00.000 AM   7,815   
2/7/2021, 12:00:00.000 AM   57
nellshamrell commented 3 years ago

Here are some of the top 404 requests (by number of calls/failures) in the last 24 hours:

(Query was run in Azure App Insights)

requests
| where success == false and resultCode == 404
| order by timestamp asc
| summarize failedCount=sum(itemCount), impactedUsers=dcount(user_Id) by operation_Name
| order by failedCount desc
  | GET /robots.txt | 10 | 1 |  
  | GET /curations/npm/npmjs/-/require-directory/2.1.1 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/which-module/2.0.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/set-blocking/2.0.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/string_decoder/0.10.31 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/chalk/2.4.2 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/asynckit/0.4.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/debug/2.6.9 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/isarray/0.0.1 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/combined-stream/1.0.8 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/verror/1.10.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/punycode/2.1.1 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/repeat-string/1.6.1 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/safer-buffer/2.1.2 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/readable-stream/1.1.14 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/is-typedarray/1.0.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/fs.realpath/1.0.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/color-convert/1.9.3 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/safe-buffer/5.1.2 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/balanced-match/1.0.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/qs/6.5.2 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/safe-buffer/5.2.1 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/isstream/0.1.2 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/supports-color/5.5.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/tunnel-agent/0.6.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/tweetnacl/0.14.5 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/which/1.3.1 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/lodash/4.17.20 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/has-flag/3.0.0 | 6 | 1 |  
  | GET /curations/npm/npmjs/-/decamelize/1.2.0

Investigating further...

nellshamrell commented 3 years ago

Did some investigating of some of the top requests returning 404s. Thing that is consistent - the API call is doing a GET /curations and none of these components have curations.

/curations/npm/npmjs/-/require-directory/2.1.1

curl https://api.clearlydefined.io/curations/npm/npmjs/-/require-directory/2.1.1

Not Found

/curations/npm/npmjs/-/which-module/2.0.0

curl https://api.clearlydefined.io/curations/npm/npmjs/-/which-module/2.0.0

Not Found

/curations/npm/npmjs/-/set-blocking/2.0.0

curl https://api.clearlydefined.io/curations/npm/npmjs/-/set-blocking/2.0.0

Not Found

/curations/npm/npmjs/-/string_decoder/0.10.31

curl https://api.clearlydefined.io/curations/npm/npmjs/-/string_decoder/0.10.31

Not Found
nellshamrell commented 3 years ago

As a point of comparison, I tried GET /curations with a component that does have curations:

curl https://api.clearlydefined.io/curations/nuget/nuget/-/JsonDiffPatch/2.0.49
{"licensed":{"declared":"Apache-2.0"}}

I don't know if anything changed, but I'm guessing that a GET /curations for a component that does not have curations should return an empty JSON response, rather than a 404. It feels like that would be less confusing.

sschuberth commented 3 years ago

Is there any consensus yet whether you're planning to continue using 404s from now, or whether you're switching back to using an empty JSON response (like it was before AFAIK)? Because depending on that we might need to make changes in ORT or not...

sschuberth commented 3 years ago

It does look like 404s dramatically increased on Mon, February 8, UTC

So, does that actually confirm that previously the behavior was different, and CD did return an empty JSON for non-existing curations? Or has CD always returned 404s for non-existing curations, and just the absolute number of requests to non-existing curations raised?

nellshamrell commented 3 years ago

I don't believe there were any changes to that area of the API - I think that, for some reason, we were getting a lot more requests for curations that did not exist on Monday. I believe it was that the absolute number of requests to non-existing curations raised on that day.

sschuberth commented 3 years ago

I see. Then feel free to close this issue. On the ORT side, we have adjusted our code to not warn about a 404 for non-existing curation data anymore, which seemingly we should have done right from the start. Thanks!