Hydrospheredata / hydro-serving

MLOps Platform
http://docs.hydrosphere.io
Apache License 2.0
271 stars, 41 forks

Encountered model file not found error during endpoint access #284

Closed nsubram closed 3 years ago

nsubram commented 5 years ago

Runtime error

  1. Model container runtime and perhaps manager runtime
  2. Attached logs

Description

We are experimenting with Hydrosphere Serving by following the instructions in the Quickstart tutorial of the documentation. However, we get a "model not found" error when trying to access the endpoint/model. The model build and the application creation worked fine, as seen in the screenshots. However, model.h5 is not present in either the model container file system or the manager container file system. The project structure and file contents are exactly the same as in the tutorial. Could you please take a look?

(Attached screenshots: application-create-success, application-creation, hydrosphere-build-information, manager-log, managerui-log, model-build-success, model-container-startup-log, model-h5-file-not-seen-in-manager-container, model-h5-file-not-seen-in-model-container, query-endpoint-failure)


KineticCookie commented 5 years ago

Hi. The "Payload" section of the CLI output lists only the src folder and requirements.txt. Could you please show the model folder structure and the payload section of your serving.yaml file?

KineticCookie commented 5 years ago

For instance, in this quickstart we define the payload with our model file:

payload:
  - "src/"
  - "requirements.txt"
  - "linear_regression/model.h5"

I'm not sure which quickstart tutorial you are using, so could you please send me the link? That way I can review it and check for possible errors. Thanks.

nsubram commented 5 years ago

Hi, please find attached the project/folder structure and the serving.yaml file contents. I am following this tutorial: https://hydrosphere.io/serving-docs/latest/tutorials/index.html

(Attached screenshots: Screenshot from 2019-07-18 15-30-24, Screenshot from 2019-07-18 15-30-40)

KineticCookie commented 5 years ago

Yes, I see it now. It's a typo in the docs. Your payload lists linear_regression/model.h5, but the actual file path is just model.h5. I will fix it ASAP.

However, it's very strange that the CLI didn't throw an error about this missing file. Could you also post the output of the hs --version command here? Thanks.

nsubram commented 5 years ago

Thanks for the update. I will change serving.yaml by removing the "linear_regression/" prefix from the model file entry in the payload. I will also post the CLI output after re-running with the faulty path.

nsubram commented 5 years ago

Please find attached a screenshot of the faulty run. The hs tool seems to silently skip a file that is listed in the payload but cannot be found, and it carries on without reporting an error. Thanks once again for your help in spotting the issue.

(Attached screenshot: Screenshot from 2019-07-17 11-21-47)
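In this thread, the hs CLI silently skipped a payload entry that did not exist. A quick pre-flight check could catch this kind of typo before uploading. The following is a minimal sketch and is not a feature of hs. find_missing_payload is a hypothetical helper. It assumes that payload paths are resolved relative to the model folder that contains serving.yaml.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_payload(model_dir: str, payload: Iterable[str]) -> List[str]:
    """Return the payload entries that do not exist under model_dir.

    Hypothetical pre-flight check, not part of the hs CLI. Assumes each entry
    is a path relative to the model folder (the folder holding serving.yaml).
    """
    base = Path(model_dir)
    missing = []
    for entry in payload:
        # Directory entries such as "src/" work too: Path drops the trailing slash.
        if not (base / entry).exists():
            missing.append(entry)
    return missing
```

Consider a model folder where model.h5 sits at the top level. Running find_missing_payload(".", ["src/", "requirements.txt", "linear_regression/model.h5"]) from that folder would report the linear_regression/model.h5 entry, which is the typo discussed above.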

nsubram commented 5 years ago

After correcting the payload path of the model file in serving.yaml, I am now running into the following issue. Could you please take a look?

(Attached screenshots: Screenshot from 2019-07-18 19-23-48, Screenshot from 2019-07-18 19-37-56, Screenshot from 2019-07-18 19-39-42, Screenshot from 2019-07-18 19-43-26)

nsubram commented 5 years ago

Hi,

1) I did some trial and error on this issue. The problem appears to be the Keras version. With the following versions it works:
   - Keras 2.2.4 instead of the recommended 2.2.0
   - TensorFlow 1.11.0 instead of the recommended 1.8.0

2) The increment_app example in the documentation has a typo, "requets_number". The line should read: response_number = request_number + 1

3) The increment_app example also needed one more change to work (see the attached screenshot of the issue):
   From: response_tensor_shape = [hs.TensorShapeProto.Dim(size=dim) for dim in number.tensor_shape.dim]
   To:   response_tensor_shape = [hs.TensorShapeProto.Dim(size=-1) for dim in number.tensor_shape.dim]
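Fixes 2 and 3 above can be illustrated in plain Python. This sketch deliberately avoids the Hydrosphere gRPC package:
   - Dim is a hypothetical stand-in for hs.TensorShapeProto.Dim that models only the size field.
   - increment takes a plain list of numbers instead of the request tensor.

It shows only the gist of the two fixes: consistent request_number naming, and a size of -1 for every output dimension. In TensorFlow-style shape protos, -1 is the usual convention for an unknown dimension size. This is not the actual increment_app signature function.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dim:
    """Hypothetical stand-in for hs.TensorShapeProto.Dim (only `size` is modeled)."""
    size: int


def increment(values: List[int], input_dims: List[Dim]) -> Tuple[List[int], List[Dim]]:
    # Fix 2: use the correctly spelled request_number name throughout.
    response = [request_number + 1 for request_number in values]
    # Fix 3: mark every output dimension as dynamic (-1) instead of copying input dims.
    response_shape = [Dim(size=-1) for _ in input_dims]
    return response, response_shape
```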

(Attached screenshots: Screenshot from 2019-07-19 08-56-55, Screenshot from 2019-07-19 09-09-18, Screenshot from 2019-07-19 10-38-06, Screenshot from 2019-07-19 10-59-34)

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.