Open iamwillbar opened 5 years ago
We will check curations for sourceLocations when a user's contribution has been "sync'd" (after pressing the "Contribute" button at least once).
Presumably the website client will call this endpoint upon clicking 'Sync': https://github.com/clearlydefined/service/blob/master/routes/curations.js#L92
That route calls the function syncAllContributions in
https://github.com/clearlydefined/service/blob/master/providers/curation/github.js#L45
which in turn calls _processContributions(prs) in github.js at
https://github.com/clearlydefined/service/blob/master/providers/curation/github.js#L62
This is the call site where we can inspect each storedContribution
to see if it specifies a sourceLocation, and, if so, add it to the harvest queue.
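The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the shape of the contribution (a `revisions` map whose entries may carry `described.sourceLocation`) is guessed from the curation YAML fixture, and `collectSourceLocations`, `queueSourceLocations` and the `harvester.harvest` call are hypothetical names, not existing functions in the service.

```javascript
// ASSUMPTION: a stored contribution resembles the curation YAML, i.e.
// { revisions: { '<revision>': { described: { sourceLocation: {...} } } } }.
// The real structure of storedContribution is not confirmed.
function collectSourceLocations(contribution) {
  const revisions = (contribution && contribution.revisions) || {}
  const locations = []
  for (const revision of Object.keys(revisions)) {
    const entry = revisions[revision] || {}
    const described = entry.described
    if (described && described.sourceLocation) locations.push(described.sourceLocation)
  }
  return locations
}

// ASSUMPTION: `harvester` is any object exposing an async harvest(spec) method.
// This is a stand-in for whatever the service's harvest provider really exposes.
async function queueSourceLocations(contributions, harvester) {
  let queued = 0
  for (const contribution of contributions) {
    for (const location of collectSourceLocations(contribution)) {
      await harvester.harvest(location)
      queued++
    }
  }
  return queued
}
```

Something like `queueSourceLocations` could be called from inside _processContributions once the stored contributions are available. Contributions without a sourceLocation are skipped.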
NOTE: The current test coverage does not appear to include an example that actually calls syncAllContributions; it is simply mocked in
https://github.com/clearlydefined/service/blob/master/test/providers/curation/processTest.js#L22
In particular, the expected structure of storedContribution is not evident, and it cannot be deduced from the test examples.
Assuming we did have a sourceLocation example, like the one at https://github.com/clearlydefined/service/blob/master/test/fixtures/curation-valid.1.yaml#L24, the proper call path to enqueue the sourceLocation is still not evident. Presumably, based on configuration, this would ultimately call an Azure queuing function: https://github.com/clearlydefined/service/blob/master/providers/queueing/azureStorageQueue.js#L27 or a memory queuing function: https://github.com/clearlydefined/service/blob/master/providers/queueing/memoryQueue.js#L21
It is possible that the common call location for enqueuing is (via the harvest function): https://github.com/clearlydefined/service/blob/master/providers/harvest/crawlerQueue.js#L13
The crawler is constructed here: https://github.com/clearlydefined/service/blob/master/providers/harvest/crawlerQueueConfig.js#L24
..from here: https://github.com/clearlydefined/service/blob/master/providers/harvest/crawlerConfig.js#L7
..from here: https://github.com/clearlydefined/service/blob/master/providers/index.js#L46
..from here: https://github.com/clearlydefined/service/blob/master/bin/config.js#L50
Thus it is not evident which is the proper namespace to require at the top of github.js to invoke the harvest function to enqueue the new sourceLocation.
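One way to sidestep the question of which module to require would be to hand the harvest service to the curation provider when it is constructed, so github.js never requires a harvest module directly. The sketch below is a hypothetical illustration of that idea only: `GitHubCurationSketch`, its constructor arguments and `queueForHarvest` are invented names, and the only thing assumed about the harvest service is that it exposes a `harvest(spec)` method.

```javascript
// HYPOTHETICAL: a stripped-down stand-in for the curation provider in github.js.
// The real constructor signature is not shown in this issue and may differ.
class GitHubCurationSketch {
  constructor(options, harvestService) {
    if (!harvestService || typeof harvestService.harvest !== 'function') {
      throw new Error('harvestService must expose a harvest(spec) method')
    }
    this.options = options
    // Injected rather than required, so the configured provider
    // (crawler queue, memory queue, ...) is chosen by the caller.
    this.harvestService = harvestService
  }

  // Forward a single sourceLocation to whatever harvest service was injected.
  async queueForHarvest(sourceLocation) {
    return this.harvestService.harvest(sourceLocation)
  }
}
```

With this arrangement a test can pass in a simple in-memory object with a `harvest` method, while the running service passes in whatever the configuration builds.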
I'm not sure if we do this today, but if we don't, we should :). When someone submits a curation that specifies a source location, we should automatically queue that location for harvesting.