Currently the link checker has to make two requests to CKAN for each link that it checks:
1. The link checker posts to get_resource_ids_to_check and gets a list of resources to check.
2. It iterates over them and, for each resource:
   - posts to resource_show to get the URL of the resource,
   - checks the URL,
   - posts to save_link_checker_result.
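
The loop above can be sketched roughly as follows. This is only an illustration, not the link checker's actual code: the `post` helper, the `check_url` callable, and the payload and response shapes are all assumptions. Only the three action names come from the description above.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: post(action_name, payload) returns the decoded result.
Post = Callable[[str, Dict[str, Any]], Any]
# Hypothetical checker: check_url(url) returns a dict describing the outcome.
CheckUrl = Callable[[str], Dict[str, Any]]


def check_links_one_by_one(post: Post, check_url: CheckUrl) -> int:
    """Run the current protocol and return the number of CKAN requests made."""
    # One request up front to learn which resources need checking.
    resource_ids: List[str] = post("get_resource_ids_to_check", {})
    requests_made = 1

    for resource_id in resource_ids:
        # First per-link request: fetch the resource to get its current URL.
        resource = post("resource_show", {"id": resource_id})
        result = check_url(resource["url"])
        # Second per-link request: save the result (payload shape is assumed).
        post("save_link_checker_result", {"resource_id": resource_id, **result})
        requests_made += 2

    return requests_made
```

For N resources this makes 1 + 2N requests to CKAN, which is where the two requests per link come from.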
I think this is fine and appropriate for the 90% use-case: the link checker is running as a periodic background task, slowly checking links.
But there are some use-cases where you might want to support checking a lot of links as fast as possible:
- You have a website with a lot of datasets, you have just installed the link checker, and you want to check them all for the first time.
- A user clicks a "re-check all of this resource / dataset / organization / group's links now" button, and you want to give the user the results as soon as possible.
A faster protocol would be:
- CKAN returns the URLs along with the resource IDs in the first place, so the link checker doesn't need to call resource_show for each one. Calling resource_show right before checking each link means the checker gets the resource's current URL (which may have changed), so it is more accurate, but slower.
- The link checker posts all the results back to CKAN at once, instead of one post per result.
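
A rough sketch of what the faster protocol could look like on the link checker side. The `include_urls` parameter and the `save_link_checker_results` bulk action are hypothetical names made up for this example; neither exists yet. The `post` helper, the `check_url` callable, and the response shapes are assumptions as well.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: post(action_name, payload) returns the decoded result.
Post = Callable[[str, Dict[str, Any]], Any]
# Hypothetical checker: check_url(url) returns a dict describing the outcome.
CheckUrl = Callable[[str], Dict[str, Any]]


def check_links_batched(post: Post, check_url: CheckUrl) -> int:
    """Run the proposed batched protocol and return the number of CKAN requests."""
    # Request 1: IDs and URLs together (assumed shape: [{"id": ..., "url": ...}]).
    # "include_urls" is a made-up parameter name for this sketch.
    resources: List[Dict[str, Any]] = post(
        "get_resource_ids_to_check", {"include_urls": True}
    )

    # Check every link without contacting CKAN again. The URLs may be
    # slightly stale compared with calling resource_show per link.
    results = [
        {"resource_id": resource["id"], **check_url(resource["url"])}
        for resource in resources
    ]

    # Request 2: post all results at once. The bulk action name is made up.
    post("save_link_checker_results", {"results": results})
    return 2
```

With N resources this makes 2 requests instead of 1 + 2N, at the cost of possibly checking a URL that changed after the list was fetched.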