-
Hi, I followed the instructions from this [post](https://docs.habana.ai/en/latest/AWS_EC2_DL1_and_PyTorch_Quick_Start/AWS_EC2_DL1_and_PyTorch_Quick_Start.html#start-training-a-pytorch-model-on-gaudi). …
-
**What happened**:
Starting with 0.48.1, I sometimes get `watchdog: BUG: soft lockup` right after VM creation.
I ran a couple of tests: 0.47.1 works well, and the failures start from 0.48.1.
My susp…
-
### Bug description
The code runs fine on a single HPU, but it fails when I use more than one HPU.
### How to reproduce the bug
```python
import torch
import torchvision
import t…
```
-
## ❓ Questions and Help
I get this error after following the instructions for installing Habana for PyTorch and running my script.
- OS: Ubuntu 20.04
- Packaging: Pip
Does anyone have an idea on…
-
We run a gang-scheduling job with high priority, but we don't see the default-priority jobs releasing their resources when there aren't enough resources.
Expected:
We expect that in such cases lower-priority jobs are…
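To make the expected behavior concrete, here is a minimal Python sketch of priority-based preemption for an all-or-nothing (gang) job. It is not the scheduler's actual algorithm: the `Job` class, the `select_victims` function, the single integer device count, and the rule "evict strictly lower-priority jobs, lowest first, and evict nothing if the gang still can't fit" are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    priority: int  # assumption: higher value means higher priority
    devices: int   # devices held (running job) or requested (pending job)


def select_victims(capacity: int, running: List[Job], pending: Job) -> Optional[List[Job]]:
    """Return the jobs to preempt so the gang job `pending` can start as a whole.

    Returns [] if it already fits, or None if it cannot fit even after
    evicting every lower-priority job (in which case nothing is preempted).
    """
    free = capacity - sum(j.devices for j in running)
    if free >= pending.devices:
        return []  # enough free devices, no preemption needed

    # Hypothetical policy: only strictly lower-priority jobs are candidates,
    # evicted lowest priority first (sorted() is stable for equal priorities).
    candidates = sorted(
        (j for j in running if j.priority < pending.priority),
        key=lambda j: j.priority,
    )
    victims: List[Job] = []
    for job in candidates:
        victims.append(job)
        free += job.devices
        if free >= pending.devices:
            return victims
    return None  # gang cannot be placed all-or-nothing, so evict nothing


running = [Job("default-a", 0, 4), Job("default-b", 0, 2), Job("high-x", 10, 2)]
victims = select_victims(8, running, Job("gang", 10, 4))
print([j.name for j in victims])  # → ['default-a']
```

In this sketch the cluster is full (8 of 8 devices used), so the high-priority gang job only starts after the default-priority job `default-a` releases its 4 devices, which is the kind of release the issue reports not seeing.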
-
Hi,
I got new servers, and VM creation seems to fail, but I can't understand the problem from the log. I have ~300 servers and everything works well on all of them. This is not something that I got before s…
-
I get an import error when initializing `GaudiTrainingArguments` in `text-classification.ipynb`.
-
# Summary
Add support for at least inference on Greco:
https://habana.ai/inference/greco/
# Problem statement
As of today, there doesn't seem to be any connection or wrapper between the SynapseAI SDK…
-
I use a batch size of 32 or 64, and the Gaudi memory is fully occupied (32 GB/32 GB). Is this just how Gaudi and the Habana software work?
So the 32 GB usage figure doesn't mean anything?
-
- I set world size = 4, and Habana automatically selects the last four devices.
- But then I cannot use Gaudi indices 0 to 3 in another process.
![image](https://user-images.githubusercontent.com/81377071/163561386-…