Closed by ryandawsonuk 3 years ago
Deploy 1.2.0 was actually released using a PR version of the alibi-detect image, but it seems safe to use 1.7.0.
Now blocked on https://access.redhat.com/support/cases/#/case/02949240 - unable to get batch proc image to pass scans...
But also deploy and deploy-operator scans are failing in the same way. Looking likely that all ubi8 images will have to be updated.
The fact that deploy-operator is failing means we have to update operator-sdk, as it uses the quay.io/operator-framework/helm-operator:v1.3.0 base image. Could just update the base image, but then we can't really be sure it will work. Probably need both the latest base image and the latest SDK.
Have now updated images and operator-sdk. Getting close now.
Remaining to do: 1) Updating the docs. 2) Getting everyone access to the docs. 3) Problem with outlier detector (see below).
Request logging works, but the outlier score on cifar10 records is not being logged. It is getting calculated, but the logger can't handle it (not seen this one before):
upserted to doc inference-log-seldon-seldon-cifar10-default/inferencerequest/8e08a64f-7f88-4edf-b94a-90c372881b08 adding response
UNKNOWN REQUEST TYPE FOR 8e08a64f-7f88-4edf-b94a-90c372881b08 - NOT PROCESSING
unexpected data format
{'data': {'is_drift': 1, 'distance': [0.5329999923706055, 0.6503999829292297, 0.9413999915122986, 0.9958000183105469, 0.7712000012397766, 0.9197999835014343, 0.6651999950408936, 0.7480000257492065, 0.8999999761581421, 0.9039999842643738, 0.7972000241279602, 0.9757999777793884, 0.9991999864578247, 0.991599977016449, 0.90420001745224, 0.7390000224113464, 0.8456000089645386, 0.6940000057220459, 0.9926000237464905, 0.9125999808311462, 0.9074000120162964, 0.823199987411499, 0.9408000111579895, 0.5228000283241272, 0.8722000122070312, 0.9598000049591064, 0.6381999850273132, 0.6262000203132629, 0.9449999928474426, 0.9089999794960022, 0.5730000138282776, 0.7271999716758728], 'p_val': [0.4361779987812042, 0.24444031715393066, 0.00686792004853487, 3.5280001611681655e-05, 0.10469888150691986, 0.012864080257713795, 0.22418208420276642, 0.12700800597667694, 0.019999999552965164, 0.018432000651955605, 0.08225567638874054, 0.0011712800478562713, 1.2799999922208372e-06, 0.00014112000644672662, 0.01835528016090393, 0.13624200224876404, 0.04767872020602226, 0.18727199733257294, 0.00010951999865937978, 0.015277519822120667, 0.01714951917529106, 0.0625164806842804, 0.007009279914200306, 0.4554396867752075, 0.0326656810939312, 0.0032320800237357616, 0.2617984712123871, 0.2794528901576996, 0.006049999967217445, 0.016561999917030334, 0.364657998085022, 0.14883968234062195], 'threshold': 0.0015625}, 'meta': {'name': 'KSDrift', 'detector_type': 'offline', 'data_type': None}, 'ce-source': 'io.seldon.serving.seldon-seldondeployment-cifar10-drift'}
The above suggests that something is going wrong with the headers sent through from the detector. So I built a request logger that prints the headers and did a full uninstall and reinstall. This time the outlier detector itself fails:
[I 210526 14:16:38 od_model:85] PROCESSING EVENT.
[I 210526 14:16:38 od_model:86] {'Host': 'seldon-seldondeployment-cifar10-outlier.seldon.svc.cluster.local', 'User-Agent': 'Go-http-client/1.1', 'Content-Length': '57889', 'Accept-Encoding': 'gzip', 'Ce-Endpoint': 'default', 'Ce-Id': '124afc94-2da5-4902-86f8-c963771f7618', 'Ce-Inferenceservicename': 'cifar10', 'Ce-Knativearrivaltime': '2021-05-26T14:16:27.248635583Z', 'Ce-Modelid': 'cifar10-container', 'Ce-Namespace': 'seldon', 'Ce-Requestid': '9f57ebfe-2680-4f02-9d09-d04da6a0ea56', 'Ce-Source': 'http://:8000/', 'Ce-Specversion': '1.0', 'Ce-Time': '2021-05-26T14:16:27.238532688Z', 'Ce-Traceparent': '00-6a63fc2c5023c46cf77bf475e2bf1b2b-829b41d0ce2c33c4-00', 'Ce-Type': 'io.seldon.serving.inference.request', 'Content-Type': 'application/json', 'Forwarded': 'for=10.130.3.231;proto=http', 'K-Proxy-Request': 'activator', 'Traceparent': '00-6a63fc2c5023c46cf77bf475e2bf1b2b-76538c8fa493d0d3-00', 'X-Forwarded-For': '10.130.3.231, 10.129.3.0', 'X-Forwarded-Proto': 'http', 'X-Request-Id': '086da02b-e186-447d-b9b9-3bac4eebfb18'}
[I 210526 14:16:38 od_model:87] ----
[E 210526 14:16:38 web:1793] Uncaught exception POST / (127.0.0.1)
HTTPServerRequest(protocol='http', host='seldon-seldondeployment-cifar10-outlier.seldon.svc.cluster.local', method='POST', uri='/', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tornado/web.py", line 1702, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "/microservice/adserver/server.py", line 245, in post
    response = self.model.process_event(request, headers)
  File "/microservice/adserver/od_model.py", line 132, in process_event
    return_instance_score=ret_instance_score,
TypeError: predict() got an unexpected keyword argument 'outlier_type'
[E 210526 14:16:38 web:2243] 500 POST / (127.0.0.1) 3822.03ms
Not sure why. This component wasn't erroring in the previous install.
Also strange that it does spin up the pod for the kservice. So the trigger is getting the event through.
Seems the outlier detector looks for a header called "Alibi-Detect-Outlier-Type". My best ideas at the moment are 1) Maybe there's a header going missing. 2) Maybe the problem is actually the 1.7.0 version of the alibi-detect server. That wasn't strictly the version used in the deploy 1.2.0 release.
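To make idea 2 concrete, here is a rough sketch of how a server/library version mismatch could produce the `TypeError` in the traceback. Only the header name "Alibi-Detect-Outlier-Type" and the keyword names `outlier_type` and `return_instance_score` come from this thread. The helper functions and the two `predict` stand-ins are hypothetical and are not the actual adserver or alibi-detect code. Whether the real server passes a default when the header is absent is also not confirmed here.

```python
import inspect

# Header name as quoted in this thread; everything else below is hypothetical.
OUTLIER_TYPE_HEADER = "Alibi-Detect-Outlier-Type"


def build_predict_kwargs(headers):
    """Translate request headers into keyword arguments for predict()."""
    kwargs = {}
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    outlier_type = lowered.get(OUTLIER_TYPE_HEADER.lower())
    if outlier_type is not None:
        kwargs["outlier_type"] = outlier_type
    return kwargs


def unsupported_kwargs(predict_fn, kwargs):
    """Return the names in kwargs that predict_fn cannot accept as keywords."""
    params = inspect.signature(predict_fn).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unsupported.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    keyword_ok = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(name for name in kwargs if name not in keyword_ok)


# Stand-ins for two detector versions (hypothetical signatures).
def predict_without_outlier_type(X, return_instance_score=True):
    return {"data": {}, "meta": {}}


def predict_with_outlier_type(X, outlier_type="instance",
                              return_instance_score=True):
    return {"data": {}, "meta": {}}


if __name__ == "__main__":
    kwargs = build_predict_kwargs({"alibi-detect-outlier-type": "instance"})
    # Calling a predict() that lacks the parameter would raise the same
    # kind of TypeError as in the traceback, so check before calling.
    print(unsupported_kwargs(predict_without_outlier_type, kwargs))
    print(unsupported_kwargs(predict_with_outlier_type, kwargs))
```

A check like `unsupported_kwargs` run at startup would surface this kind of mismatch before the first event arrives, instead of as a 500 on each POST.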
Plugged in the 1.8.0 alibi-detect image and everything worked. Hmm.
fixes https://github.com/SeldonIO/seldon-deploy-operator/issues/43 also bumping resources - fixes https://github.com/SeldonIO/seldon-deploy-operator/issues/41