Open dabutvin opened 5 years ago
Love the intent, but I'd be a little careful here: we'll end up with scenarios where we bulk recompute definitions, and that would trigger massive queuing for harvesting. That in turn would cause the crawlers to ping repository APIs repeatedly to get the latest version of a given package.
That's a good callout for sure - maybe this should happen on the crawler side only and leave definition recompute out of it?
What would trigger the queuing of the latest in that case?
I was thinking that when we crawl an older version, we then check for and queue the latest version.
Ah, yeah, could do that. It would be a little awkward to figure out where to put it, though. The Crawler really only does `fetch` and `process`. We could put it in the ClearlyDefined processor, but `process` generally does not reach out anywhere to get stuff. `fetch` could attempt to get the latest for the given coordinates, but then it would do that for every tool request (not just the ClearlyDefined tool, unless we coded that in).
In the end, perhaps we should wait to see if this is actually an issue (i.e., we're lacking new versions of things). We could rely on our monitoring of the ecosystems to detect new versions; that will happen almost for free. If that is failing somehow, then perhaps we add this complexity?
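To make the crawler-side idea concrete, here's a minimal sketch of the check. The helper names (`getLatestVersion`, `queueHarvest`) are hypothetical stand-ins, not actual Crawler APIs, and a real implementation would want a proper semver comparison rather than plain inequality:

```javascript
// Hedged sketch: after crawling a given coordinate, decide whether the
// latest version of the same package should also be queued for harvest.
// `getLatestVersion` and `queueHarvest` are illustrative callbacks, not
// real ClearlyDefined crawler functions.
function maybeQueueLatest(coordinates, getLatestVersion, queueHarvest) {
  const latest = getLatestVersion(coordinates)
  // Only queue when the registry reports a different (newer) version than
  // the one just crawled; crawling the latest itself becomes a no-op.
  if (latest && latest !== coordinates.revision) {
    queueHarvest({ ...coordinates, revision: latest })
    return true
  }
  return false
}
```

This keeps the registry lookup in one place, so it could be invoked from whichever stage (fetch or a tool-specific processor) we settle on.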
In the `definitionService`, after compute and store, we should tell the `harvestService` to queue the latest version of the package.
For any user that is 'using' ClearlyDefined for their current package lists, chances are they will eventually upgrade. Much of the time this harvest will be a no-op because we already have the latest version, but it will keep us ahead of package upgrades before they come.
fyi @iamwillbar
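A rough sketch of the proposed flow, for discussion. The `definitionService`, `harvestService`, and `registry` objects here are illustrative stand-ins, not the actual ClearlyDefined interfaces, and the real calls would be async; synchronous stand-ins keep the sketch short:

```javascript
// Hedged sketch: after a definition is computed and stored, ask the
// harvest service to queue the latest version of the same package.
// All service shapes below are assumptions for illustration only.
function computeStoreAndQueueLatest(coordinates, { definitionService, harvestService, registry }) {
  // Serve the user's request first: compute and store the definition
  // for the revision they actually asked about.
  const definition = definitionService.computeAndStore(coordinates)
  // Then look up the latest revision of the same package.
  const latest = registry.getLatestVersion(coordinates)
  // Frequently a no-op: if the latest matches what we just computed,
  // there is nothing extra to queue.
  if (latest && latest !== coordinates.revision) {
    harvestService.queue({ ...coordinates, revision: latest })
  }
  return definition
}
```

The key design point is that the latest-version queueing is fire-and-forget relative to the original request, so a slow or failing registry lookup shouldn't block returning the definition.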