SeldonIO / seldon-deploy-operator

Seldon Deploy installation

release 1.2.0 #42

Closed ryandawsonuk closed 3 years ago

ryandawsonuk commented 3 years ago

Fixes https://github.com/SeldonIO/seldon-deploy-operator/issues/43 and also bumps resources, which fixes https://github.com/SeldonIO/seldon-deploy-operator/issues/41

ryandawsonuk commented 3 years ago

1.2.0 was actually released using a PR build of the alibi-detect image, but 1.7.0 seems safe to use.

ryandawsonuk commented 3 years ago

Now blocked on https://access.redhat.com/support/cases/#/case/02949240 - unable to get batch proc image to pass scans...

The deploy and deploy-operator scans are also failing in the same way, so it looks likely that all ubi8 images will have to be updated.

ryandawsonuk commented 3 years ago

The fact that deploy-operator is failing means we have to update operator-sdk, as it uses the quay.io/operator-framework/helm-operator:v1.3.0 base image. We could update just the base image, but then we can't really be sure it will work. We probably need both the latest base image and the latest SDK.

ryandawsonuk commented 3 years ago

Have now updated images and operator-sdk. Getting close now.

Remaining to do: 1) Updating the docs. 2) Getting everyone access to the docs. 3) Problem with outlier detector (see below).

Request logging works, but the outlier score on cifar10 records is not being logged. It is being calculated, but the logger can't handle it (not seen this one before):

upserted to doc inference-log-seldon-seldon-cifar10-default/inferencerequest/8e08a64f-7f88-4edf-b94a-90c372881b08 adding response
UNKNOWN REQUEST TYPE FOR 8e08a64f-7f88-4edf-b94a-90c372881b08 - NOT PROCESSING
unexpected data format
{'data': {'is_drift': 1, 'distance': [0.5329999923706055, 0.6503999829292297, 0.9413999915122986, 0.9958000183105469, 0.7712000012397766, 0.9197999835014343, 0.6651999950408936, 0.7480000257492065, 0.8999999761581421, 0.9039999842643738, 0.7972000241279602, 0.9757999777793884, 0.9991999864578247, 0.991599977016449, 0.90420001745224, 0.7390000224113464, 0.8456000089645386, 0.6940000057220459, 0.9926000237464905, 0.9125999808311462, 0.9074000120162964, 0.823199987411499, 0.9408000111579895, 0.5228000283241272, 0.8722000122070312, 0.9598000049591064, 0.6381999850273132, 0.6262000203132629, 0.9449999928474426, 0.9089999794960022, 0.5730000138282776, 0.7271999716758728], 'p_val': [0.4361779987812042, 0.24444031715393066, 0.00686792004853487, 3.5280001611681655e-05, 0.10469888150691986, 0.012864080257713795, 0.22418208420276642, 0.12700800597667694, 0.019999999552965164, 0.018432000651955605, 0.08225567638874054, 0.0011712800478562713, 1.2799999922208372e-06, 0.00014112000644672662, 0.01835528016090393, 0.13624200224876404, 0.04767872020602226, 0.18727199733257294, 0.00010951999865937978, 0.015277519822120667, 0.01714951917529106, 0.0625164806842804, 0.007009279914200306, 0.4554396867752075, 0.0326656810939312, 0.0032320800237357616, 0.2617984712123871, 0.2794528901576996, 0.006049999967217445, 0.016561999917030334, 0.364657998085022, 0.14883968234062195], 'threshold': 0.0015625}, 'meta': {'name': 'KSDrift', 'detector_type': 'offline', 'data_type': None}, 'ce-source': 'io.seldon.serving.seldon-seldondeployment-cifar10-drift'}
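
A minimal sketch of why a payload like the one above could end up as "UNKNOWN REQUEST TYPE", assuming the logger picks a handler from the CloudEvents `ce-type` header. The mapping and the `classify_event` helper are hypothetical, not the request logger's actual code. Only the `io.seldon.serving.inference.request` type appears in this thread; the other type strings are assumptions.

```python
# Hypothetical mapping from CloudEvents type to a logger category.
# Only the "request" type string is taken from the logs in this thread;
# the remaining keys are assumed for illustration.
KNOWN_TYPES = {
    "io.seldon.serving.inference.request": "request",
    "io.seldon.serving.inference.response": "response",  # assumed
    "io.seldon.serving.inference.outlier": "outlier",    # assumed
    "io.seldon.serving.inference.drift": "drift",        # assumed
}


def classify_event(headers):
    """Return a category for an event, or "unknown" if ce-type is missing or unrecognised."""
    # Header names are case-insensitive, so normalise the keys before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    return KNOWN_TYPES.get(lowered.get("ce-type"), "unknown")
```

With only a `ce-source` present, as in the payload above, this sketch returns `"unknown"`, which is consistent with a type header going missing somewhere between the detector and the logger.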

ryandawsonuk commented 3 years ago

The above suggests that something was going wrong with the headers sent through from the detector. So I built a request logger that prints the headers and did a full uninstall and reinstall. This time the outlier detector itself fails:

[I 210526 14:16:38 od_model:85] PROCESSING EVENT.
[I 210526 14:16:38 od_model:86] {'Host': 'seldon-seldondeployment-cifar10-outlier.seldon.svc.cluster.local', 'User-Agent': 'Go-http-client/1.1', 'Content-Length': '57889', 'Accept-Encoding': 'gzip', 'Ce-Endpoint': 'default', 'Ce-Id': '124afc94-2da5-4902-86f8-c963771f7618', 'Ce-Inferenceservicename': 'cifar10', 'Ce-Knativearrivaltime': '2021-05-26T14:16:27.248635583Z', 'Ce-Modelid': 'cifar10-container', 'Ce-Namespace': 'seldon', 'Ce-Requestid': '9f57ebfe-2680-4f02-9d09-d04da6a0ea56', 'Ce-Source': 'http://:8000/', 'Ce-Specversion': '1.0', 'Ce-Time': '2021-05-26T14:16:27.238532688Z', 'Ce-Traceparent': '00-6a63fc2c5023c46cf77bf475e2bf1b2b-829b41d0ce2c33c4-00', 'Ce-Type': 'io.seldon.serving.inference.request', 'Content-Type': 'application/json', 'Forwarded': 'for=10.130.3.231;proto=http', 'K-Proxy-Request': 'activator', 'Traceparent': '00-6a63fc2c5023c46cf77bf475e2bf1b2b-76538c8fa493d0d3-00', 'X-Forwarded-For': '10.130.3.231, 10.129.3.0', 'X-Forwarded-Proto': 'http', 'X-Request-Id': '086da02b-e186-447d-b9b9-3bac4eebfb18'}
[I 210526 14:16:38 od_model:87] ----
[E 210526 14:16:38 web:1793] Uncaught exception POST / (127.0.0.1)
    HTTPServerRequest(protocol='http', host='seldon-seldondeployment-cifar10-outlier.seldon.svc.cluster.local', method='POST', uri='/', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.7/site-packages/tornado/web.py", line 1702, in _execute
        result = method(*self.path_args, **self.path_kwargs)
      File "/microservice/adserver/server.py", line 245, in post
        response = self.model.process_event(request, headers)
      File "/microservice/adserver/od_model.py", line 132, in process_event
        return_instance_score=ret_instance_score,
    TypeError: predict() got an unexpected keyword argument 'outlier_type'
[E 210526 14:16:38 web:2243] 500 POST / (127.0.0.1) 3822.03ms

Not sure why. This component wasn't erroring in the previous install.

Also strange that it does spin up the pod for the kservice. So the trigger is getting the event through.
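
A `TypeError` of this shape usually means the caller passes a keyword that the installed `predict()` does not accept, for example when the server code and the detector library come from different versions. The sketch below reproduces the mechanism with a stand-in class. `OldDetector`, `reproduces_type_error` and `supported_kwargs` are hypothetical names for illustration and are not part of alibi-detect or the adserver code.

```python
import inspect


class OldDetector:
    """Stand-in for a detector whose predict() predates the outlier_type keyword (hypothetical)."""

    def predict(self, X, return_instance_score=True):
        return {"data": {"is_outlier": [0 for _ in X]}}


def reproduces_type_error(detector):
    """Call predict() the way a newer caller might and report whether it raises TypeError."""
    try:
        detector.predict([[0.0]], outlier_type="instance", return_instance_score=True)
    except TypeError:
        return True
    return False


def supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that func's signature accepts."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}
```

Checking `inspect.signature(detector.predict)` inside the running container is a quick way to see which keywords the installed version actually accepts.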

ryandawsonuk commented 3 years ago

Seems the outlier detector looks for a header called "Alibi-Detect-Outlier-Type". My best ideas at the moment are 1) Maybe there's a header going missing. 2) Maybe the problem is actually the 1.7.0 version of the alibi-detect server. That wasn't strictly the version used in the deploy 1.2.0 release.
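
To check idea 1 against a dumped header dict, a small helper like the one below can be used. It is only a sketch: the header name comes from the comment above, while `get_outlier_type` and its fallback behaviour are assumptions, not the adserver's actual logic.

```python
OUTLIER_TYPE_HEADER = "Alibi-Detect-Outlier-Type"


def get_outlier_type(headers, default=None):
    """Look up the outlier-type header case-insensitively; return default if it is absent."""
    wanted = OUTLIER_TYPE_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return default
```

Run against the headers printed in the log above, this returns `None`, since none of the logged keys match that name.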

ryandawsonuk commented 3 years ago

Plugged in the 1.8.0 alibi-detect image and everything worked. Hmm.