Hydrospheredata / hydro-serving

MLOps Platform
http://docs.hydrosphere.io
Apache License 2.0

Server error #292

Closed GurovNik closed 3 years ago

GurovNik commented 5 years ago

Issues

Infrastructure issue

  1. Cluster buildinfo

    url                    name             gitHeadCommit                             gitCurrentTags  gitCurrentBranch  scalaVersion  version                                   sbtVersion
    ---------------------  ---------------  ----------------------------------------  --------------  ----------------  ------------  ----------------------------------------  ----------
    /api/buildinfo         serving-manager  fe83c6ae3a0fe87883aadd9d1250766c4766645c  []              master            2.12.8        fe83c6ae3a0fe87883aadd9d1250766c4766645c  1.2.8
    /gateway/buildinfo     serving-gateway  8634cd7dc2d555625ec570478c29dc06ebe4d419  []              master            2.12.8        8634cd7dc2d555625ec570478c29dc06ebe4d419  1.2.8
    /monitoring/buildinfo  sonar            f539687ab75a475b0c90d2fdc1e757616725be6d  []              master            2.12.7        f539687ab75a475b0c90d2fdc1e757616725be6d  1.2.7

  2. Logs of faulty service.

$ hs -v upload
debug: Current cluster: {'cluster': {'server': 'https://dev.k8s.hydrosphere.io'}, 'name': 'Nikita_2708'}
debug: Payload src/ is resolved as /Users/nikitagurov/hydrospere/models/activity_recognition/model/src
debug: Payload requirements.txt is resolved as /Users/nikitagurov/hydrospere/models/activity_recognition/model/requirements.txt
debug: Payload model.h5 is resolved as /Users/nikitagurov/hydrospere/models/activity_recognition/model/model.h5
debug: Popen(['git', 'version'], cwd=/Users/nikitagurov/hydrospere/models/activity_recognition/model, universal_newlines=False, shell=None, istream=None)
debug: Error while extracting .git metadata: Failed to initialize: Cmd('git') failed due to: exit code(1)
debug: cmdline: git version
debug: stderr: 'xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun'
debug: Can't extract DVC metadata: No module named 'dvc'
Model definition composed: {'contract': {'modelName': 'model', 'predict': {'inputs': [{'dtype': 'DT_DOUBLE', 'name': 'x', 'profile': 'NUMERICAL', 'shape': {'dim': [{'name': '', 'size': 1}, {'name': '', 'size': 300}, {'name': '', 'size': 3}, {'name': '', 'size': 3}], 'unknownRank': False}}], 'outputs': [{'dtype': 'DT_DOUBLE', 'name': 'y', 'profile': 'NUMERICAL', 'shape': {'dim': [{'name': '', 'size': 5}], 'unknownRank': False}}], 'signatureName': 'infer'}}, 'host_selector': None, 'install_command': 'pip install -r requirements.txt', 'metadata': {}, 'monitoring': None, 'name': 'activity', 'payload': ['/Users/nikitagurov/hydrospere/models/activity_recognition/model/src', '/Users/nikitagurov/hydrospere/models/activity_recognition/model/requirements.txt', '/Users/nikitagurov/hydrospere/models/activity_recognition/model/model.h5'], 'runtime': {'name': 'hydrosphere/serving-runtime-python-3.6', 'tag': 'dev'}, 'training_data_file': None}
debug: Creating archive: /Users/nikitagurov/hydrospere/models/activity_recognition/model/.hs/activity/activity.tar.gz
debug: Archiving /Users/nikitagurov/hydrospere/models/activity_recognition/model/src as src
debug: Archiving /Users/nikitagurov/hydrospere/models/activity_recognition/model/requirements.txt as requirements.txt
debug: Archiving /Users/nikitagurov/hydrospere/models/activity_recognition/model/model.h5 as model.h5
debug: Uploading model to https://dev.k8s.hydrosphere.io
debug: MULTIPART POST: https://dev.k8s.hydrosphere.io/api/v2/model/upload. Parts: {'metadata': '{"contract": {"modelName": "model", "predict": {"signatureName": "infer", "inputs": [{"name": "x", "profile": "NUMERICAL", "shape": {"dim": [{"size": 1, "name": ""}, {"size": 300, "name": ""}, {"size": 3, "name": ""}, {"size": 3, "name": ""}], "unknownRank": false}, "dtype": "DT_DOUBLE"}], "outputs": [{"name": "y", "profile": "NUMERICAL", "shape": {"dim": [{"size": 5, "name": ""}], "unknownRank": false}, "dtype": "DT_DOUBLE"}]}}, "hostSelectorName": null, "runtime": {"tag": "dev", "name": "hydrosphere/serving-runtime-python-3.6"}, "name": "activity", "installCommand": "pip install -r requirements.txt", "metadata": {}}', 'payload': ('filename', <_io.BufferedReader name='/Users/nikitagurov/hydrospere/models/activity_recognition/model/.hs/activity/activity.tar.gz'>)}
debug: Starting new HTTPS connection (1): dev.k8s.hydrosphere.io:443
debug: https://dev.k8s.hydrosphere.io:443 "POST /api/v2/model/upload HTTP/1.1" 200 None
Waiting for a model build to complete...
debug: Starting new HTTPS connection (1): dev.k8s.hydrosphere.io:443
debug: https://dev.k8s.hydrosphere.io:443 "GET /api/v2/model/version/activity/26 HTTP/1.1" 200 None
debug: Starting new HTTPS connection (1): dev.k8s.hydrosphere.io:443
error: Server returned an error ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
TIME: 28.08.19, 11:55 AM

Sometimes this error is followed by another error:

$ hs -v upload
debug: Current cluster: {'cluster': {'server': 'https://dev.k8s.hydrosphere.io'}, 'name': 'Nikita_2708'}
debug: Payload src/ is resolved as /Users/nikitagurov/hydrospere/models/activity_recognition/model/src
debug: Payload requirements.txt is resolved as /Users/nikitagurov/hydrospere/models/activity_recognition/model/requirements.txt
debug: Payload model.h5 is resolved as /Users/nikitagurov/hydrospere/models/activity_recognition/model/model.h5
debug: Popen(['git', 'version'], cwd=/Users/nikitagurov/hydrospere/models/activity_recognition/model, universal_newlines=False, shell=None, istream=None)
debug: Error while extracting .git metadata: Failed to initialize: Cmd('git') failed due to: exit code(1)
debug: cmdline: git version
debug: stderr: 'xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun'
debug: Can't extract DVC metadata: No module named 'dvc'
Model definition composed: {'contract': {'modelName': 'model', 'predict': {'inputs': [{'dtype': 'DT_DOUBLE', 'name': 'x', 'profile': 'NUMERICAL', 'shape': {'dim': [{'name': '', 'size': 1}, {'name': '', 'size': 300}, {'name': '', 'size': 3}, {'name': '', 'size': 3}], 'unknownRank': False}}], 'outputs': [{'dtype': 'DT_DOUBLE', 'name': 'y', 'profile': 'NUMERICAL', 'shape': {'dim': [{'name': '', 'size': 5}], 'unknownRank': False}}], 'signatureName': 'infer'}}, 'host_selector': None, 'install_command': 'pip install -r requirements.txt', 'metadata': {}, 'monitoring': None, 'name': 'activity', 'payload': ['/Users/nikitagurov/hydrospere/models/activity_recognition/model/src', '/Users/nikitagurov/hydrospere/models/activity_recognition/model/requirements.txt', '/Users/nikitagurov/hydrospere/models/activity_recognition/model/model.h5'], 'runtime': {'name': 'hydrosphere/serving-runtime-python-3.6', 'tag': 'dev'}, 'training_data_file': None}
debug: Creating archive: /Users/nikitagurov/hydrospere/models/activity_recognition/model/.hs/activity/activity.tar.gz
debug: Archiving /Users/nikitagurov/hydrospere/models/activity_recognition/model/src as src
debug: Archiving /Users/nikitagurov/hydrospere/models/activity_recognition/model/requirements.txt as requirements.txt
debug: Archiving /Users/nikitagurov/hydrospere/models/activity_recognition/model/model.h5 as model.h5
debug: Uploading model to https://dev.k8s.hydrosphere.io
debug: MULTIPART POST: https://dev.k8s.hydrosphere.io/api/v2/model/upload. Parts: {'metadata': '{"contract": {"modelName": "model", "predict": {"signatureName": "infer", "inputs": [{"name": "x", "profile": "NUMERICAL", "shape": {"dim": [{"size": 1, "name": ""}, {"size": 300, "name": ""}, {"size": 3, "name": ""}, {"size": 3, "name": ""}], "unknownRank": false}, "dtype": "DT_DOUBLE"}], "outputs": [{"name": "y", "profile": "NUMERICAL", "shape": {"dim": [{"size": 5, "name": ""}], "unknownRank": false}, "dtype": "DT_DOUBLE"}]}}, "hostSelectorName": null, "runtime": {"tag": "dev", "name": "hydrosphere/serving-runtime-python-3.6"}, "name": "activity", "installCommand": "pip install -r requirements.txt", "metadata": {}}', 'payload': ('filename', <_io.BufferedReader name='/Users/nikitagurov/hydrospere/models/activity_recognition/model/.hs/activity/activity.tar.gz'>)}
debug: Starting new HTTPS connection (1): dev.k8s.hydrosphere.io:443
debug: https://dev.k8s.hydrosphere.io:443 "POST /api/v2/model/upload HTTP/1.1" 200 None
Waiting for a model build to complete...
debug: Starting new HTTPS connection (1): dev.k8s.hydrosphere.io:443
error: Server returned an error HTTPSConnectionPool(host='dev.k8s.hydrosphere.io', port=443): Max retries exceeded with url: /api/v2/model/version/activity/14 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x10896c390>: Failed to establish a new connection: [Errno 60] Operation timed out'))

Feature

Uploading a model with '$ hs upload' periodically fails with the errors above. It often fails at the start of work with the model: over the last few days, for example, my first 'hs upload' of the day failed. The same behaviour occurred on different laptops. After the failure, the model is not uploaded to serving. Repeating the '$ hs upload' command solves the problem, and the model uploads successfully.
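
Since simply repeating the command works, a possible stopgap is to wrap the upload in a small retry loop. The snippet below is only a sketch of that workaround, not a fix for the underlying server problem. The helper name `run_with_retries` and its parameters are made up for illustration, and it assumes that `hs` is on the PATH and exits with a non-zero code when the upload fails, which I have not verified.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=5.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with code 0 or the attempts are exhausted.

    Returns the number of attempts used; raises RuntimeError if all fail.
    `runner` and `sleep` are injectable so the loop can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Linear backoff: wait a little longer after each failure.
            sleep(delay * attempt)
    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")


# Hypothetical usage (assumes `hs` signals failure via its exit code):
# run_with_retries(["hs", "upload"], attempts=3, delay=10.0)
```

The backoff is there because the second error above is a connection timeout, so retrying immediately may hit the same unavailable endpoint.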

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.