clearlydefined / service

The service side of clearlydefined.io
MIT License

Failed to fetch harvest #688

Open pombredanne opened 4 years ago

pombredanne commented 4 years ago

Starting from https://api.clearlydefined.io/harvest/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054/scancode I can then go to https://api.clearlydefined.io/harvest/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054/scancode/3.2.2 and it either never completes or ends with:

Error 524 Ray ID: 5437e5ec6b61c867 • 2019-12-11 13:46:57 UTC A timeout occurred

geneh commented 4 years ago

A couple of observations: the blob size is 18 MB (not sure if that's a problem), and the logs show a "summarizeService.summarize is not a function" error:

TypeError:
   at get (/opt/service/routes/harvest.js:26:45)
   at processTicksAndRejections (internal/process/task_queues.js:93:5)
   at async /opt/service/middleware/asyncMiddleware.js:6:5

message: SvcRequestFailure: /harvest/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054/scancode/3.2.2

pombredanne commented 4 years ago

@geneh that's also the case for any harvest even small ones such as https://api.clearlydefined.io/harvest/maven/mavencentral/io.dropwizard/dropwizard-jersey/2.0.0-rc13/clearlydefined/1.5.0

geneh commented 4 years ago

Yes, the size is only 102 KB, so the blob size isn't a problem. Also getting: message | SvcRequestFailure: /harvest/maven/mavencentral/io.dropwizard/dropwizard-jersey/2.0.0-rc13/clearlydefined/1.5.0 summarizeService.summarize is not a function

TypeError:
   at get (/opt/service/routes/harvest.js:26:45)
   at processTicksAndRejections (internal/process/task_queues.js:93:5)
   at async /opt/service/middleware/asyncMiddleware.js:6:5

geneh commented 4 years ago

@tmarble Could you please take a look at this one? It may be easier to point your environment at the dev environment's definitions blob storage (via DEFINITION_AZBLOB_CONNECTION_STRING) and set DEFINITION_STORE_PROVIDER to azure.
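As a rough aid for that setup, here is a minimal sketch of a pre-flight check for the two variables named above. The function name `checkDefinitionStoreEnv` and its return shape are hypothetical and not part of the service; only the variable names and the `azure` value come from the comment.

```javascript
// Hypothetical helper, not part of the service: checks that the two
// environment variables mentioned above are set before starting locally.
function checkDefinitionStoreEnv(env = process.env) {
  const problems = []
  if (env.DEFINITION_STORE_PROVIDER !== 'azure') {
    problems.push('DEFINITION_STORE_PROVIDER should be "azure"')
  }
  if (!env.DEFINITION_AZBLOB_CONNECTION_STRING) {
    problems.push('DEFINITION_AZBLOB_CONNECTION_STRING is not set')
  }
  // ok is true only when both settings look usable.
  return { ok: problems.length === 0, problems }
}
```

Calling `checkDefinitionStoreEnv()` with no argument inspects `process.env`; passing a plain object makes it easy to try different combinations.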

pombredanne commented 4 years ago

I have also noticed that this works: https://api.clearlydefined.io/harvest/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054/scancode/3.2.2?form=raw (form=raw is what the website side always uses)

tmarble commented 4 years ago

@geneh I have found some things while debugging this...

  1. The summarizer is not properly initialized at https://github.com/clearlydefined/service/blob/master/routes/harvest.js#L107 because summarizeService is an object, but summarizeService.summarize is not a function???
  2. If I override this and set summarizeService = require('../providers/summary/scancode')() then summarizeService.summarize is a function, but then the error changes:
tmarble@avenir 259 :) curl -X GET "http://localhost:4000/harvest/git/github/zeit/next.js/ecf61f65a86c811efe5f2fded37889a9f2b5de96/scancode/3.2.2?form=summary" -H "accept: */*"
{"error":{"code":"500","message":"An error has occurred",
          "innererror":{"name":"Error","message":"Not valid ScanCode data",
                        "stack":"Error: Not valid ScanCode data\n    at ScanCodeSummarizer.summarize (/home/tmarble/src/github/clearlydefined/service/providers/summary/scancode.js:32:33)\n    at get (/home/tmarble/src/github/clearlydefined/service/routes/harvest.js:27:45)\n    at async /home/tmarble/src/github/clearlydefined/service/middleware/asyncMiddleware.js:6:5"}}}
tmarble@avenir 260 :)

The call path that sets up routes/harvest.js is not evident; however, it likely depends on the configuration. I'm wondering if the object being initialized is not, in fact, a ScanCodeSummarizer but a different object that does not have a summarize() function?
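To make that hypothesis concrete, here is a minimal sketch of how such a mismatch produces the "is not a function" error. The class names `ToolSummarizer` and `AggregateSummarizer` and the guard `requireSummarize` are hypothetical stand-ins for illustration, not the service's actual code.

```javascript
// Hypothetical stand-ins for illustration; not the service's actual classes.

// A tool-specific summarizer that does expose summarize().
class ToolSummarizer {
  summarize(coordinates, harvested) {
    if (!harvested || !harvested.content) throw new Error('Not valid data')
    return { coordinates, fileCount: harvested.content.files.length }
  }
}

// An aggregating wrapper whose entry point has a different name.
// Calling .summarize() on it fails with "... is not a function".
class AggregateSummarizer {
  constructor(summarizers) {
    this.summarizers = summarizers
  }
  summarizeAll(coordinates, harvestedByTool) {
    const result = {}
    for (const [tool, harvested] of Object.entries(harvestedByTool)) {
      result[tool] = this.summarizers[tool].summarize(coordinates, harvested)
    }
    return result
  }
}

// Fail fast at wiring time instead of on the first request.
function requireSummarize(candidate) {
  if (!candidate || typeof candidate.summarize !== 'function') {
    const kind = candidate && candidate.constructor ? candidate.constructor.name : String(candidate)
    throw new TypeError(`expected an object with summarize(), got ${kind}`)
  }
  return candidate
}
```

If the route were handed something shaped like `AggregateSummarizer`, a guard such as `requireSummarize(summarizeService)` at setup would name the offending object immediately rather than surfacing as a 500 on each request.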

geneh commented 4 years ago

@tmarble Any idea why it works fine with ?form=raw parameter?
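One possible explanation, sketched under assumptions: if the handler dispatches on the form parameter and only the summary branch touches the summarizer, then a misconfigured summarizer would break the default form while leaving form=raw intact. The function `getHarvest` and the injected `store` and `summarizer` objects below are hypothetical, and the default of `summary` is an assumption rather than something confirmed in this thread.

```javascript
// Hypothetical handler for illustration; `store` and `summarizer` are
// injected stand-ins, and the 'summary' default is an assumption.
async function getHarvest(query, store, summarizer) {
  const form = (query.form || 'summary').toLowerCase()
  const raw = await store.get()
  switch (form) {
    case 'raw':
      // Returned as stored; the summarizer is never touched.
      return raw
    case 'summary':
      // Rejects with a TypeError if summarizer has no summarize method.
      return summarizer.summarize(raw)
    default:
      throw new Error(`unknown form: ${form}`)
  }
}
```

With `summarizer` set to an object that lacks `summarize`, `getHarvest({ form: 'raw' }, store, summarizer)` still resolves, while `getHarvest({}, store, summarizer)` rejects, which matches the behaviour reported here.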

pombredanne commented 4 years ago

See also the possibly related #749

pombredanne commented 4 years ago

@geneh https://api.clearlydefined.io/harvest/git/github/zeit/next.js/ecf61f65a86c811efe5f2fded37889a9f2b5de96/scancode/3.2.2?form=raw works alright FYI

nellshamrell commented 3 years ago

Confirmed in prod for aports:

Works

$ curl https://api.clearlydefined.io/harvest/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054/scancode/3.2.2?form=raw

Returns error

$ curl https://api.clearlydefined.io/harvest/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054/scancode/3.2.2
{"error":{"code":"500","message":"An error has occurred","innererror":{}}}

qtomlinson commented 1 year ago

In the current production server, the reported cases seem to have been fixed:

https://api.clearlydefined.io/harvest/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054/scancode/3.2.2
https://api.clearlydefined.io/harvest/maven/mavencentral/io.dropwizard/dropwizard-jersey/2.0.0-rc13/clearlydefined/1.5.0
https://api.clearlydefined.io/definitions/git/github/alpinelinux/aports/b8229586a20eea5fc0f782a383ef666b6a60e054