Verify periodic checks in registry branch

ingenieroariel commented 8 years ago

We want to make sure that check counts are being incremented, the last check date is updated, and services turn green or red when they should.

panchicore commented 8 years ago

There is a hard time limit (10 s) on hypermap.aggregator.tasks.check_layer, added by @capooti. I don't know the reasoning behind that number, but it would be good to understand it, since slow services will make the Celery tasks crash while services/layers are being checked.

ingenieroariel commented 8 years ago

But that is for a single layer, isn't it? That should be much faster than 10 seconds, I believe (more like 0.5 to 5 seconds).

ingenieroariel commented 8 years ago

What value do you suggest, @panchicore?

panchicore commented 8 years ago

I will add a soft time limit instead of a hard one, so we can catch it and record a failed layer check instead of raising an exception. That failed check feeds into the reliability warning shown to users, telling them this service takes more than 10 seconds to check. If we don't do this, the check counter stops incrementing, and users' trust drops as well.
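In Celery, a task's `soft_time_limit` raises `SoftTimeLimitExceeded` inside the task, which the task can catch, whereas the hard `time_limit` kills the worker child process. A minimal sketch of the "catch it and record a failed check" idea follows. It uses only the standard library, with `concurrent.futures` standing in for Celery's soft limit; `LayerCheck`, `check_layer_with_limit`, and the green/red rule are hypothetical, not HHypermap's actual model or task.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional

SOFT_TIME_LIMIT = 10.0  # seconds, the value discussed in this thread


@dataclass
class LayerCheck:
    """Hypothetical per-layer check state (not HHypermap's actual model)."""
    checks: int = 0
    failures: int = 0
    last_check: Optional[datetime] = None
    last_error: str = ""

    @property
    def is_green(self) -> bool:
        # Hypothetical rule: green when the most recent check succeeded.
        return self.last_check is not None and not self.last_error


# Shared pool; a timed-out probe keeps running here but no longer blocks us.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def check_layer_with_limit(state: LayerCheck, probe: Callable[[], None],
                           limit: float = SOFT_TIME_LIMIT) -> LayerCheck:
    """Run probe(); record a failed check on timeout or error instead of raising."""
    future = _EXECUTOR.submit(probe)
    try:
        future.result(timeout=limit)
        state.last_error = ""
    except concurrent.futures.TimeoutError:
        state.failures += 1
        state.last_error = f"check took longer than {limit} s"
    except Exception as exc:  # e.g. a ConnectionError from the service
        state.failures += 1
        state.last_error = repr(exc)
    finally:
        # The counter and the date advance on every check, pass or fail.
        state.checks += 1
        state.last_check = datetime.now(timezone.utc)
    return state


if __name__ == "__main__":
    demo = LayerCheck()
    check_layer_with_limit(demo, lambda: None, limit=1.0)
    check_layer_with_limit(demo, lambda: time.sleep(0.3), limit=0.05)
    print(demo.checks, demo.failures, demo.is_green)  # 2 1 False
```

Unlike Celery's limits, a Python thread cannot be interrupted, so the slow probe keeps running in the background; the point of the sketch is only that a timeout becomes a recorded failure and the counter keeps moving.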

panchicore commented 8 years ago

https://github.com/cga-harvard/HHypermap/issues/156

panchicore commented 8 years ago

FYI, endpoint_lists/fire_data.txt does not create any services:

ERROR! Could not detect service type for endpoint http://activefiremaps.fs.fed.us/cgi-bin/mapserv.exe?map=conus_fire_2001-2009.map& SERVICE=WMS&VERSION=1.1.1&REQUEST=GetCapabilities&version=1.0.0 or already existing. messages=(Document is XML, but not CSW-ish;'NoneType' object has no attribute 'attrib';Expected TileMapService tag, got WMT_MS_Capabilities;'NoneType' object has no attribute 'find')
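The message "Expected TileMapService tag, got WMT_MS_Capabilities" shows that the response's root element was `WMT_MS_Capabilities`, the root element of a WMS 1.1.x capabilities document, so this endpoint appears to serve WMS even though detection failed. A minimal standard-library sketch of root-tag-based detection that would classify it; `detect_by_root_tag` and the lookup table are hypothetical helpers, not the aggregator's actual detection code.

```python
import xml.etree.ElementTree as ET

# Root elements of some common capabilities documents.
ROOT_TAGS = {
    "WMT_MS_Capabilities": "WMS 1.1.x",
    "WMS_Capabilities": "WMS 1.3.0",
    "TileMapService": "TMS",
}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def detect_by_root_tag(xml_text: str) -> str:
    """Classify a capabilities document by its root element only."""
    root = ET.fromstring(xml_text)
    name = local_name(root.tag)
    return ROOT_TAGS.get(name, f"unknown ({name})")


if __name__ == "__main__":
    print(detect_by_root_tag('<WMT_MS_Capabilities version="1.1.1"/>'))  # WMS 1.1.x
```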

panchicore commented 8 years ago

FYI, some endpoints pass successfully but create zero services: http://d.pr/i/1dAzm

panchicore commented 8 years ago

Consider bringing back the Celery chains: some endpoints, like https://ngamaps.geointapps.org/arcgis/rest/services/Ivory_Coast/Ivory_Coast_Education_Areas/MapServer, do not handle concurrent requests well or become slow under them.

ingenieroariel commented 8 years ago

Good idea, but those chains were creating infinite loops or simply became too slow when services had thousands of layers. How can we limit a chain to at most 50 tasks, or some other small number?

(Email reply to Luis Pallares, Sat, Aug 13, 2016 at 9:58 PM.)

panchicore commented 8 years ago

Well, I don't really know why they are failing to download:

Updating layers for service id 141
This ESRI REST endpoint has an WMS interface to process: https://ngamaps.geointapps.org/arcgis/services/Ivory_Coast/Ivory_Coast_Medical_Health_Areas/MapServer/WMSServer?

ConnectionError: ('Connection aborted.', BadStatusLine("''",))

What do you think?

Anyway, we have more visibility now:

http://d.pr/i/No2L

panchicore commented 8 years ago

Re: chains: a chain is simply a list of tasks executed one after the other (think of callbacks), so what you want to do is check the size of the list to decide how many tasks to group together before sending them to a worker. For example, if we create 2 chains of 50 tasks, then 2 workers will be hitting the server concurrently.
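The arithmetic behind this: with N layers and a chain size of k you get ceil(N / k) chains, and each chain is one sequential stream of requests against the server. A minimal sketch of the chunking step, assuming a hypothetical list of layer ids; in Celery each chunk could then be turned into a `celery.chain` of immutable `check_layer.si(layer_id)` signatures, which is an assumption about how the code would be wired, not the project's implementation.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    layer_ids = list(range(100))           # hypothetical layer ids of one service
    chains = list(chunked(layer_ids, 50))  # each chunk would become one chain
    print(len(chains), [len(c) for c in chains])  # 2 [50, 50]
```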

ingenieroariel commented 8 years ago

It is fine to have a small number of workers (max 8) hitting the server at the same time, and we should cap the number of items per chain (100?). How can we make it work like that?

(Email reply to Luis Pallares, Sat, Aug 13, 2016 at 10:27 PM.)
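A minimal sketch of those two caps (at most 8 chains in flight, at most 100 tasks per chain), using the standard library's `ThreadPoolExecutor` as a stand-in for Celery workers. This is an assumption for illustration: in a real deployment the concurrency cap would come from the worker pool (e.g. `celery worker --concurrency`), and that caps all in-flight checks, not checks per server. `run_chunked` and `PeakCounter` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

MAX_CONCURRENT = 8   # max chains running at once
MAX_PER_CHAIN = 100  # max tasks per chain


def run_chunked(ids: Sequence[int], check: Callable[[int], None],
                max_concurrent: int = MAX_CONCURRENT,
                max_per_chain: int = MAX_PER_CHAIN) -> int:
    """Run check(id) for every id: sequentially inside a chunk, with at most
    max_concurrent chunks in flight. Returns the number of chunks."""
    chunks: List[Sequence[int]] = [ids[i:i + max_per_chain]
                                   for i in range(0, len(ids), max_per_chain)]

    def run_chain(chunk: Sequence[int]) -> None:
        for layer_id in chunk:  # one after the other, like a chain
            check(layer_id)

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        list(pool.map(run_chain, chunks))  # list() re-raises chain errors
    return len(chunks)


class PeakCounter:
    """Demo check that records how many calls were active at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0
        self.calls = 0

    def __call__(self, layer_id: int) -> None:
        with self._lock:
            self.active += 1
            self.calls += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.001)  # stand-in for the HTTP request to the service
        with self._lock:
            self.active -= 1


if __name__ == "__main__":
    counter = PeakCounter()
    n = run_chunked(list(range(1000)), counter)
    print(n, counter.calls, counter.peak <= MAX_CONCURRENT)  # 10 1000 True
```

Because each chunk runs sequentially, the number of simultaneous requests never exceeds the number of chunks in flight, which is exactly the property the thread is after.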

capooti commented 8 years ago

@panchicore Regarding the hard timeout: it was added because many layers took a very long time to respond (minutes in some cases), often ending in an error, which made the check operations very slow.

capooti commented 8 years ago

@ingenieroariel Yes, we need to figure out an elegant way to cap the number of items per chain. In my experience, chains with a large number of tasks fail in Celery. How about starting to use batch tasks? Is that possible in ES? We could then have chains made up of a number of batch tasks.

ingenieroariel commented 8 years ago

Yes, batching is possible in ES too, and I agree with the batching approach. @panchicore can help us implement it, as he has experience with that part of the code.

(Email reply to Paolo Corti, Mon, Aug 15, 2016 at 10:57 AM.)
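For reference, Elasticsearch's `_bulk` endpoint takes newline-delimited JSON: one action line followed by one source line per document, with every line (including the last) terminated by a newline. A minimal standard-library sketch of building such a body for one batch of layers; the index name `hypermap` and the document fields are hypothetical, and sending it (a POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out. Elasticsearch versions of that era may also need a `_type` in each action line.

```python
import json
from typing import Any, Dict, Iterable


def bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Build an Elasticsearch _bulk body: one action line plus one source
    line per document. Each doc needs an 'id' key, used as the _id."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    batch = [{"id": 1, "title": "layer-1"}, {"id": 2, "title": "layer-2"}]
    print(bulk_body("hypermap", batch), end="")
```

One batch task per body keeps each request bounded, which is the same idea as capping the number of items per chain.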

capooti commented 8 years ago

@ingenieroariel @panchicore Can we discuss this today on the Slack channel? I need to find an immediate solution, as our instance needs to harvest and index everything safely and without problems.

panchicore commented 8 years ago

@capooti @ingenieroariel Moving check_service -> check_layer to chunked chains improves performance as well.

The app ran with 4 workers. Results:

all async: took 5.8 mins

chunked chains: took 4.2 mins (partly, I think, because worldmap.harvard.edu responded faster)

There were no big differences in RAM or CPU load.

panchicore commented 8 years ago

The next step is to remove search indexing from that pipeline and delegate it to a periodic task that looks for recently updated layers and sends them to the search engine in bulk. We expect a big reduction in time from this as well.
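A minimal sketch of the selection step such a periodic task might perform, assuming each layer carries a last-updated timestamp and the task remembers when it last ran. `Layer`, `layers_to_index`, and the batch size of 500 are hypothetical; in the real app the filter would be a database query and each batch would go to the search engine's bulk API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterator, List, Sequence


@dataclass
class Layer:
    """Hypothetical stand-in for a Layer row."""
    id: int
    updated: datetime


def layers_to_index(layers: Sequence[Layer], since: datetime,
                    batch_size: int = 500) -> Iterator[List[Layer]]:
    """Yield batches of layers updated after `since`, oldest first,
    so each batch can be sent to the search engine in one bulk request."""
    recent = sorted((layer for layer in layers if layer.updated > since),
                    key=lambda layer: layer.updated)
    for start in range(0, len(recent), batch_size):
        yield recent[start:start + batch_size]


if __name__ == "__main__":
    base = datetime(2016, 8, 15, tzinfo=timezone.utc)
    demo = [Layer(i, base - timedelta(minutes=i)) for i in range(10)]
    batches = layers_to_index(demo, since=base - timedelta(minutes=5), batch_size=2)
    print([[layer.id for layer in b] for b in batches])  # [[4, 3], [2, 1], [0]]
```

A design note: the task should record the time at which a run *started* as the next `since`, so layers updated while a run is indexing are picked up by the following run rather than skipped.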

capooti commented 8 years ago

Thanks a lot @panchicore

cga-harvard / Hypermap-Registry

Verify periodic checks in registry branch #154
