valentinedwv closed this issue 5 years ago.
David,
I tried it myself by harvesting the IOOS site into a local folder and into an instance of the Geoportal catalog. I got a solid 300 records/min for the folder and 200 records/min for the catalog. Not bad.
At this moment, without solid evidence, and short of spending some time profiling this endpoint, I can only conclude that the problem is NOT on the harvester side.
If you leave "ignore robots.txt" unchecked, you get a 10-second delay in CrawlLocker.
Yep, check out http://search.geothermaldata.org/robots.txt and https://data.ioos.us/robots.txt.
They list the crawl delay. We respect the robots.txt settings by default (Geoportal is a good bot).
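A rough way to see how much a crawl delay matters: if the harvester waits the full delay between requests and fetches one record per request (both assumptions, not confirmed here), a 10-second delay caps throughput at 60 / 10 = 6 records per minute, which is consistent with the 6 records a minute reported for https://data.ioos.us/csw in this thread. The sketch below uses Python's standard `urllib.robotparser` to read a `Crawl-delay` value; `SAMPLE_ROBOTS` is hypothetical content, not the actual file at either URL, and the helper names are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; the real files at
# data.ioos.us and search.geothermaldata.org may differ.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_seconds(robots_txt: str, user_agent: str = "*") -> float:
    """Return the Crawl-delay (in seconds) that applies to user_agent, or 0.0."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    return float(delay) if delay is not None else 0.0


def max_records_per_minute(delay_s: float, records_per_request: int = 1) -> float:
    """Upper bound on throughput if the crawler sleeps delay_s between requests.

    Assumes (hypothetically) one request per records_per_request records and
    ignores the time the server itself takes to respond.
    """
    if delay_s <= 0:
        return float("inf")
    return 60.0 / delay_s * records_per_request
```

For example, `max_records_per_minute(crawl_delay_seconds(SAMPLE_ROBOTS))` evaluates to `6.0`. Against a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches the real file instead of parsing a string.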
Harvesting https://data.ioos.us/csw gets about 6 records a minute, yet a request returning 10 records at a time comes back in about 3 seconds (tried with start positions 1 and 1001).
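Ten records in about 3 seconds works out to roughly 10 / 3 × 60 ≈ 200 records per minute, far above the 6 per minute seen when harvesting, which is why the per-request time is worth checking directly. The sketch below builds a CSW 2.0.2 `GetRecords` request in KVP form and times it; the parameter set is a common minimal one and is an assumption, since individual servers (pycsw included) may require or reject other parameters, and `getrecords_url` / `time_request` are hypothetical helper names.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def getrecords_url(endpoint: str, start_position: int, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords KVP URL for one page of results.

    The parameter choices below are an assumption; check the server's
    GetCapabilities response for what it actually supports.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "startPosition": start_position,  # 1-based index of the first record
        "maxRecords": max_records,        # page size
    }
    return f"{endpoint}?{urlencode(params)}"


def time_request(url: str, timeout: float = 60.0) -> float:
    """Fetch url, read the whole body, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start
```

For example, `time_request(getrecords_url("https://data.ioos.us/csw", 1001))` times a single page starting at record 1001. If that stays near 3 seconds while harvesting still crawls along at 6 records a minute, the gap is more likely in the delay between requests than in the server's response time.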
Saw this before with http://search.geothermaldata.org/csw and attributed it to some server/pycsw issue.
Now it's feeling like it might be something on the harvester side.
IOOS had an issue and has now fixed it (it turned out to be a Python 3 string issue).