canonical / charmed-kubeflow-workflows

Repository that contains GitHub workflows and shareable configs for Charmed Kubeflow

`integration-test-rock.yaml` fails with "write /dev/stdout: no space left on device" #39

Closed orfeas-k closed 8 months ago

orfeas-k commented 8 months ago

Bug Description

As seen in this mlserver-huggingface run, the action fails with `write /dev/stdout: no space left on device`. The built rock image is large (3.7GB), so my understanding is that the action fails during this part of the command:

docker save $DOCKER_IMAGE > $DOCKER_IMAGE.tar && sudo microk8s ctr image import $DOCKER_IMAGE.tar --digests=true

since `docker save` needs roughly as much additional disk space as the .rock file itself.
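To make the space problem visible before the job dies mid-write, a pre-check along the following lines could run just before `docker save`. This is only a sketch: `free_kib` and `has_room` are hypothetical helper names, the 1 GiB margin in the commented wiring is an arbitrary assumption, and the size reported by `docker image inspect` is only an approximation of the size of the tarball that `docker save` writes.

```shell
#!/usr/bin/env bash
# Sketch: check free disk space before `docker save` writes its tarball.

# Print the free space, in KiB, of the filesystem that holds the given path.
free_kib() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeed only if avail_kib covers needed_kib plus a safety margin (all in KiB).
has_room() {
  local avail_kib=$1 needed_kib=$2 margin_kib=${3:-0}
  [ "$avail_kib" -ge $((needed_kib + margin_kib)) ]
}

# Example wiring (commented out so the block runs without Docker).
# The 1048576 KiB (1 GiB) margin is an assumption, not a measured value.
# image_kib=$(( $(docker image inspect --format '{{.Size}}' "$DOCKER_IMAGE") / 1024 ))
# has_room "$(free_kib .)" "$image_kib" 1048576 \
#   || { echo "not enough space for docker save" >&2; exit 1; }
```

Failing early with an explicit message would not fix the shortage, but it would make the cause clearer than the bare `write /dev/stdout` error.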

Solution

We could:

  1. use the easimon/maximize-build-space@v7 action, as we do in scan-rock.yaml
  2. delete the .rock file after the step that exports the rock to Docker, since it is no longer needed
  3. do both of the above.
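Option 2 could look roughly like the sketch below. `remove_rock_files` is a hypothetical helper name, and the assumption that the .rock archive sits in the job's working directory is mine, not something the workflow confirms. The final `rm` of the tarball after import is an extra step not proposed above, included only to show where more space could be reclaimed.

```shell
#!/usr/bin/env bash
# Sketch: delete .rock archives once the image has been exported to Docker.

# Remove every *.rock file directly inside the given directory (default: current).
remove_rock_files() {
  local dir=${1:-.}
  find "$dir" -maxdepth 1 -type f -name '*.rock' -exec rm -f -- {} +
}

# Example wiring (commented out so the block runs without Docker or MicroK8s).
# Assumes the rock was already exported to the local Docker daemon.
# remove_rock_files .
# docker save "$DOCKER_IMAGE" > "$DOCKER_IMAGE.tar"
# sudo microk8s ctr image import "$DOCKER_IMAGE.tar" --digests=true
# rm -f -- "$DOCKER_IMAGE.tar"   # extra cleanup, not part of the proposal above
```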

To Reproduce

Rerun https://github.com/canonical/seldonio-rocks/actions/runs/7210894339/job/19680829633#step:7:20.

Environment

GitHub runner

Relevant Log Output

Run if [[ "3.8" ]]
  if [[ "3.8" ]]
  then
    python3.8 -m tox -e integration
  else
    tox -e integration
  fi
  shell: /usr/bin/bash -e {0}
  env:
    CONTROLLER_NAME: github-pr-2db3c-microk8s
integration: install_deps> python -I -m pip install 'juju<4.0' ops pytest pytest-operator
integration: freeze> python -m pip freeze --all
integration: asttokens==2.4.1,backcall==0.2.0,bcrypt==4.1.1,cachetools==5.3.2,certifi==2023.11.17,cffi==1.16.0,charset-normalizer==3.3.2,cryptography==41.0.7,decorator==5.1.1,exceptiongroup==1.2.0,executing==2.0.1,google-auth==2.25.2,hvac==2.0.0,idna==3.6,iniconfig==2.0.0,ipdb==0.13.13,ipython==8.12.3,jedi==0.19.1,Jinja2==3.1.2,juju==3.3.0.0,kubernetes==28.1.0,macaroonbakery==1.3.4,MarkupSafe==2.1.3,matplotlib-inline==0.1.6,mypy-extensions==1.0.0,oauthlib==3.2.2,ops==2.9.0,packaging==23.2,paramiko==2.12.0,parso==0.8.3,pexpect==4.9.0,pickleshare==0.7.5,pip==23.3.1,pluggy==1.3.0,prompt-toolkit==3.0.43,protobuf==4.25.1,ptyprocess==0.7.0,pure-eval==0.2.2,pyasn1==0.5.1,pyasn1-modules==0.3.0,pycparser==2.21,Pygments==2.17.2,pymacaroons==0.13.0,PyNaCl==1.5.0,pyRFC3339==1.1,pytest==7.4.3,pytest-asyncio==0.21.1,pytest-operator==0.31.1,python-dateutil==2.8.2,pytz==2023.3.post1,PyYAML==6.0.1,requests==2.31.0,requests-oauthlib==1.3.1,rsa==4.9,setuptools==69.0.2,six==1.16.0,stack-data==0.6.3,tomli==2.0.1,toposort==1.10,traitlets==5.14.0,typing-inspect==0.9.0,typing_extensions==4.9.0,urllib3==1.26.18,wcwidth==0.2.12,websocket-client==1.7.0,websockets==8.1,wheel==0.42.0
integration: commands[0]> rm -rf charm_repo
integration: commands[1]> git clone --branch main https://github.com/canonical/seldon-core-operator.git charm_repo
Cloning into 'charm_repo'...
integration: commands[2]> sed -i 's/namespace: {{ namespace }}/namespace: YQ_SAFE/' charm_repo/src/templates/configmap.yaml.j2
integration: commands[3]> bash -c 'NAME=$(yq eval .name rockcraft.yaml) && VERSION=$(yq eval .version rockcraft.yaml) && DOCKER_IMAGE=$NAME:$VERSION && \docker save $DOCKER_IMAGE > $DOCKER_IMAGE.tar && sudo microk8s ctr image import $DOCKER_IMAGE.tar --digests=true && predictor_servers=$(yq e ".data.predictor_servers" charm_repo/src/templates/configmap.yaml.j2) && predictor_servers=$(jq --arg jq_name $NAME -r '"'"'.HUGGINGFACE_SERVER.protocols.v2.image=$jq_name'"'"' <<< $predictor_servers) && predictor_servers=$(jq --arg jq_version $VERSION -r '"'"'.HUGGINGFACE_SERVER.protocols.v2.defaultImageVersion=$jq_version'"'"' <<< $predictor_servers) yq e -i ".data.predictor_servers=strenv(predictor_servers)" charm_repo/src/templates/configmap.yaml.j2'
write /dev/stdout: no space left on device
integration: exit 1 (94.88 seconds) /home/runner/work/seldonio-rocks/seldonio-rocks/mlserver-huggingface> bash -c 'NAME=$(yq eval .name rockcraft.yaml) && VERSION=$(yq eval .version rockcraft.yaml) && DOCKER_IMAGE=$NAME:$VERSION && \docker save $DOCKER_IMAGE > $DOCKER_IMAGE.tar && sudo microk8s ctr image import $DOCKER_IMAGE.tar --digests=true && predictor_servers=$(yq e ".data.predictor_servers" charm_repo/src/templates/configmap.yaml.j2) && predictor_servers=$(jq --arg jq_name $NAME -r '"'"'.HUGGINGFACE_SERVER.protocols.v2.image=$jq_name'"'"' <<< $predictor_servers) && predictor_servers=$(jq --arg jq_version $VERSION -r '"'"'.HUGGINGFACE_SERVER.protocols.v2.defaultImageVersion=$jq_version'"'"' <<< $predictor_servers) yq e -i ".data.predictor_servers=strenv(predictor_servers)" charm_repo/src/templates/configmap.yaml.j2' pid=18791
  integration: FAIL code 1 (106.85=setup[11.66]+cmd[0.00,0.30,0.00,94.88] seconds)
  evaluation failed :( (106.98 seconds)
Error: Process completed with exit code 1.

Additional Context

No response

syncronize-issues-to-jira[bot] commented 8 months ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5180.

This message was autogenerated

orfeas-k commented 8 months ago

I tried implementing only option (2), but it didn't change much for mlserver-huggingface: https://github.com/canonical/seldonio-rocks/actions/runs/7459902457/job/20297948981?pr=66#step:6:2. It still failed with "write /dev/stdout: no space left on device".

orfeas-k commented 8 months ago

Implementing both seems to work, as seen in https://github.com/canonical/seldonio-rocks/actions/runs/7460461633/job/20299668263?pr=66#step:9:20.