AllenInstitute / datacube


Incorrect mouse_ccf call (possibly other services as well) can crash datacube server #68

Closed · shus2018 closed this issue 6 years ago

shus2018 commented 6 years ago

Seeing an incorrect call to tdatacube; the server crashed and restarted. Here is a possible repro from the log:

{'field': 'color', 'select': {'anterior_posterior': None, 'superior_inferior': None, 'left_right': 260}, 'dim_order': None, 'image_format': 'png', 'name': 'mouse_ccf'}

NileGraddis commented 6 years ago

What procedure? Are there args?

NileGraddis commented 6 years ago

On axon, this just tells me that the request is invalid.

chrisbarber commented 6 years ago

Same for me on devdatacube. I just talked to Shu and we think this request might have been a coincidence. Still don't know why this is happening, though:

2018-06-06T14:13:59-0700 [Guest       11985] /local1/apps/datacube-builds/DataCube--206/services/pandas/run.sh: line 6:  3255 Killed                  python server.py $@

2018-06-06T14:13:59-0700 [Guest       11985] Service crashed with exit code 137.  Respawning...
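For context: an exit status of 137 is 128 + 9, i.e. the process was terminated by SIGKILL, which is what the kernel OOM killer sends. A minimal sketch for decoding such an exit status (assuming Python 3.5+, not part of the datacube code):

import signal

exit_code = 137  # exit status reported for the crashed server process

if exit_code > 128:
    # exit codes above 128 encode the fatal signal number
    sig = signal.Signals(exit_code - 128)
    print("process killed by", sig.name)  # -> "process killed by SIGKILL"
else:
    print("process exited with status", exit_code)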

Edit: here's the full request for posterity:

curl -H "Content-Type:application/json" -d '{"procedure": "org.brain-map.api.datacube.image.mouse_ccf", "args": [], "kwargs": {"field": "color", "select": {"anterior_posterior": null, "superior_inferior": null, "left_right": 260}, "dim_order": null, "image_format": "png"}}' http://devdatacube:8080/call

shus2018 commented 6 years ago

Thanks, Chris. The good news is that the server auto-restarted and everything is working as expected again. :-)

Possibly storage-related, like yesterday's legacy route issue. This is the lowest-priority issue for now.

chrisbarber commented 6 years ago

@shus2018, I have a theory about what might be causing the datacube to get killed. In #55 we were seeing MemoryErrors caught at the Python level, but it's possible the same problem could also manifest as the process getting OOM-killed by the kernel. That was fixed in #65, though, so if you deploy the latest release build to tdatacube, it's possible we won't see this happen anymore.
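To illustrate the distinction (a minimal sketch, not the actual server code): a failed allocation inside Python raises MemoryError, which can be caught, but if the kernel's OOM killer intervenes the process receives SIGKILL and no Python-level handler ever runs:

import numpy as np

def load_slice(shape):
    # A failed allocation raises MemoryError, which Python code can handle
    # gracefully (roughly the failure mode discussed in #55)...
    try:
        return np.zeros(shape, dtype=np.float64)
    except MemoryError:
        return None

# ...but if the allocation succeeds and the system later runs out of physical
# memory, the kernel OOM killer sends SIGKILL (exit code 137) and the except
# block never gets a chance to run -- consistent with the
# "Killed ... exit code 137" lines in the log above.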

shus2018 commented 6 years ago

Latest build deployed. I also manually cleaned the existing virtualenv and .local left over from a few previous deployments. tdatacube looks like it's performing much better; so far no crashes or timeouts. Only one regression, tracked in issue #69. This issue can be closed for now; we will open new issues if it happens again.