NVIDIA / earth2studio

Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.
https://nvidia.github.io/earth2studio/
Apache License 2.0

🐛[BUG]: Pytest getting killed on SFNO package test with wheel install #68

Closed by NickGeneva 2 months ago

NickGeneva commented 2 months ago

Version

main

On which installation method(s) does this occur?

Pip

Describe the issue

Getting the classic uninformative error:

test/models/px/test_sfno.py::test_sfno_package[cpu] make: *** [Makefile:40: pytest] Killed

Likely a memory issue. Some research suggests that pytest's memory usage can compound between tests. One potential solution is to move model loading into a fixture with an appropriate scope, so each model is loaded once and released when that scope ends. It seems like a tricky problem with models this large.
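A minimal sketch of the scoped-fixture idea, not the change that was actually made in the repository. `DummyModel`, `load_model_once`, and the test name are hypothetical stand-ins; a real test would load the SFNO package instead.

```python
import gc

import pytest


class DummyModel:
    """Hypothetical stand-in for a large model such as SFNO."""

    def __init__(self, n_bytes: int = 1024) -> None:
        self.weights = bytearray(n_bytes)


def load_model_once():
    """Yield a model, then drop this function's reference to it."""
    model = DummyModel()
    yield model
    # Teardown: release our reference and ask the collector to run, so the
    # weights can be freed once no test still holds the object.
    del model
    gc.collect()


@pytest.fixture(scope="module")
def model():
    # scope="module" means one load per test file instead of one per test.
    yield from load_model_once()


def test_weights_allocated(model):
    assert len(model.weights) == 1024
```

The fixture only controls when the reference is dropped. Whether memory is actually returned to the operating system depends on the allocator and on nothing else keeping the model alive.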

NickGeneva commented 2 months ago

Fixed by adding an explicit memory request to the Helm chart of the test container:

resources:
  requests:
    nvidia.com/gpu: 1
    memory: "64Gi"
  limits:
    nvidia.com/gpu: 1
    memory: "64Gi"
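To confirm which limit the test container actually received, the cgroup memory files can be read from inside it. This is a rough sketch assuming a Linux container with the standard cgroup v2 or v1 mount points; the function names are hypothetical and not part of earth2studio.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations: v2 first, then v1.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when cgroup v2 reports 'max'."""
    text = raw.strip()
    return None if text == "max" else int(text)


def container_memory_limit() -> Optional[int]:
    """Return the first readable cgroup memory limit, or None if none is found."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            # File missing, unreadable, or not a number: try the next one.
            continue
    return None


if __name__ == "__main__":
    limit = container_memory_limit()
    print("no limit found" if limit is None else f"{limit / 2**30:.1f} GiB")
```

On cgroup v1 an unlimited container reports a very large number rather than `max`, so an implausibly large value should be read as "no limit".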